Commit 9ccb8108 authored by: J jiangjiajun

first commit

Parent 01d41a4d
- repo: local
  hooks:
    - id: yapf
      name: yapf
      entry: yapf
      language: system
      args: [-i, --style, .style.yapf]
      files: \.py$
- repo: https://github.com/pre-commit/pre-commit-hooks
  sha: a11d9314b22d8f8c7556443875b731ef05965464
  hooks:
    - id: check-merge-conflict
    - id: check-symlinks
    - id: end-of-file-fixer
    - id: trailing-whitespace
    - id: detect-private-key
    - id: check-added-large-files
- repo: local
  hooks:
    - id: copyright_checker
      name: copyright_checker
      entry: python ./.copyright.hook
      language: system
      files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
      exclude: (?!.*third_party)^.*$
## 1. Preparing data for an "image classification" task
### Data structure for image classification
```
data/mydataset/
|-- class1
    |-- 0001.jpg
    |-- 0002.jpg
    |-- ...
|-- class2
    |-- 0001.jpg
    |-- 0002.jpg
    |-- ...
```
The class1 and class2 folders must be named after the classes you want to recognize. Names are limited to English characters and may not contain spaces, Chinese characters, or special characters.
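As a reference, the sketch below shows how such a directory can be scanned into a list of (relative path, class id) pairs. It is not part of PaddleX; the helper name `build_file_list` and the extension filter are assumptions for illustration.

```python
import os


def build_file_list(data_dir):
    """Scan a data/mydataset/-style tree into (relative_path, class_id) pairs.
    Each top-level folder is treated as one class."""
    classes = sorted(
        d for d in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, d)))
    samples = []
    for class_id, class_name in enumerate(classes):
        class_dir = os.path.join(data_dir, class_name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp')):
                samples.append((os.path.join(class_name, fname), class_id))
    return classes, samples


classes, samples = build_file_list('data/mydataset')
print(classes)       # e.g. ['class1', 'class2']
print(samples[:2])   # e.g. [('class1/0001.jpg', 0), ('class1/0002.jpg', 0)]
```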
## 2. Annotating "object detection" and "instance segmentation" data with LabelMe
### 2.1 Preparation
* **2.1.1** Create a folder matching the image folder, which will hold the annotation JSON files.
* **2.1.2** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" panel then shows the absolute path of every image.
### 2.2 Object detection annotation
* **2.2.1** Open the rectangle annotation tool, as shown below:
<div align=center><img width="800" height="450" src="./pics/detection1.png"/></div>
* **2.2.2** Drag to mark each target object, and type the corresponding label in the pop-up dialog (if the label already exists, just click it), as shown below:
<div align=center><img width="800" height="450" src="./pics/detection3.png"/></div>
If a box is wrong, click "Edit Polygons" on the right and then the box to adjust it by dragging, or click "Delete Polygon" to remove it.
* **2.2.3** Click "Save" on the right to store the annotation results in the folder created in ***2.1.1***.

Note: models such as Mask R-CNN are object detection models but require instance segmentation information; see ***2.3***.
### 2.3 Instance segmentation annotation
* **2.3.1** Click "Create Polygons" on the right, outline the target point by point, and type the corresponding label in the pop-up dialog (if the label already exists, just click it), as shown below:
<div align=center><img width="800" height="450" src="./pics/detection2.png"/></div>
If a polygon is wrong, click "Edit Polygons" on the right and then the polygon to adjust it by dragging, or click "Delete Polygon" to remove it.
* **2.3.2** Click "Save" on the right to store the annotation results in the folder created in ***2.1.1***.

Note: when naming labels in ***2.2.2*** and ***2.3.1***, use English names and append "_0" or "_1" to indicate whether the target is a single object (iscrowd=0) or a group of objects (iscrowd=1). A single object is one that can be represented by a single rectangle (or one or more polygons); when several polygons are needed for one object, append the same digit after "_0" to mark them as the same object. A group of objects means the instances of one class are packed so tightly that they can only be circled together with a single rectangle (or polygon). For example, in the detection annotation below, the pile of fruit is hard to split into individual fruits, so it is labeled as one group with label "fruit_1"; in the instance segmentation annotation, the decoration cannot be covered by a single polygon, so three polygons all labeled "decoration_00" are used. (A sketch of this naming convention follows the figure.)
<div align=center><img width="800" height="450" src="./pics/detection4.png"/></div>
### 2.4 Data directory structure for object detection
```
data/mydataset/
|-- JPEGImages
    |-- 1.jpg
    |-- 2.jpg
|-- Annotations
    |-- 1.xml
    |-- 2.xml
```
The Annotations folder holds the annotation files and the JPEGImages folder holds the image files.
### 2.5 Data directory structure for instance segmentation
```
data/mydataset/
|-- JPEGImages
    |-- 1.jpg
    |-- 2.jpg
|-- annotations.json
```
`annotations.json` is the annotation file and the JPEGImages folder holds the image files.
## 3. Annotating "semantic segmentation" data with LabelMe
Common semantic segmentation datasets are Cityscapes and COCO. This section covers annotating Cityscapes-style data with LabelMe; for the COCO style, see the Mask R-CNN notes in section 2.3.
### 3.1 Preparation
* **3.1.1** Create a folder matching the image folder, which will hold the annotation JSON files.
* **3.1.2** Click the "Open Dir" button and open the folder containing the images to annotate; the "File List" panel then shows the absolute path of every image.
### 3.2 Annotation
* **3.2.1** Click "Create Polygons" on the right, outline the target point by point, and type the corresponding label in the pop-up dialog (if the label already exists, just click it), as shown below:
<div align=center><img width="800" height="450" src="./pics/detection5.png"/></div>
If a polygon is wrong, click "Edit Polygons" on the right and then the polygon to adjust it by dragging, or click "Delete Polygon" to remove it.
* **3.2.2** Click "Save" on the right to store the annotation results in the folder created in ***3.1.1***.
### 3.3 Data directory structure for semantic segmentation
```
data/mydataset/
|-- JPEGImages
    |-- 1.jpg
    |-- 2.jpg
    |-- 3.jpg
|-- Annotations
    |-- 1.png
    |-- 2.png
    |-- 3.png
|-- label.txt (optional)
```
JPEGImages is the image folder and Annotations is the label folder. You may provide a file named "label.txt" listing all label names, so that class names can be displayed directly, mapping label indices such as "1", "2", "3" to names such as "air conditioner", "table", "vase". (An example follows.)
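For instance, a hypothetical label.txt for the mapping above, with one label name per line in annotation-index order, could look like this:

```
air_conditioner
table
vase
```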
### LabelMe
LabelMe is a widely used data annotation tool. You may also use another annotation tool, as long as the annotation file format matches one supported by PaddleX.
LabelMe on GitHub: https://github.com/wkentaro/labelme
#### Installing LabelMe
***Note: for a consistent environment, this document describes installing and using LabelMe inside Anaconda; you may adopt another setup to suit your situation and needs.***
Windows: see [[标注工具安装和使用/1_Windows/1_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/1_Windows/1_3_LabelMe安装.md)
Ubuntu: see [[标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md)
MacOS: see [[标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md)
#### Annotating your dataset with LabelMe
See [[AnnotationNote]](../DataAnnotation/AnnotationNote)
## 2.1.1.1 Download Anaconda
On the Anaconda website [(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/), choose "Windows" and download the Anaconda build matching the Python version you need (PaddlePaddle requires the 64-bit Anaconda).
## 2.1.1.2 Install Anaconda
Open the downloaded installer (with the .exe suffix) and follow the wizard; you can change the install path during installation, as shown below:
<div align=center><img width="580" height="400" src="./pics/anaconda1.png"/></div>
Note: by default it installs under the current user's home directory on Windows.
## 2.1.1.3 Use Anaconda
- Click the Windows icon at the bottom left and open: All Programs -> Anaconda3/2 (64-bit) -> Anaconda Prompt
- Run the following commands:
```cmd
# Create an environment named mypaddle with Python 3.5
conda create -n mypaddle python=3.5
# Once created, activate the environment
conda activate mypaddle
python --version
# If the output mentions Anaconda, the installation succeeded
# Leave the environment
conda deactivate
```
## 1. Install PaddlePaddle
PaddlePaddle runs on 64-bit Windows 7, Windows 8, Windows 10 Enterprise, and Windows 10 Professional, with Python 2 (>=2.7.15) and Python 3 (>=3.5.1); pip must be newer than 9.0.1. Both the CPU and GPU builds of PaddlePaddle are supported on Windows. For the GPU build, see the NVIDIA documentation [(https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and the PaddlePaddle release table [(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release) for installing CUDA and cuDNN. NCCL, distributed training, and related features are currently not supported on Windows.
- Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
conda activate mypaddle
# (Option 1) install the CPU build of PaddlePaddle
pip install -U paddlepaddle
# (Option 2) install the GPU build of PaddlePaddle
pip install -U paddlepaddle-gpu
```
Note: the default packages require a CPU with AVX and MKL support; if your machine lacks them, download an openblas build from the PaddlePaddle website [(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release).
- After installation, open a Python shell and run the following to test it:
```python
import paddle.fluid as fluid
fluid.install_check.run_check()
# The message "Your Paddle Fluid is installed successfully!" means the install succeeded
```
## 2.1.3.1 Install LabelMe
Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
conda activate mypaddle
# (Option 1) for Python 2.x
conda install pyqt
# (Option 2) for Python 3.x
pip install pyqt5
# Install LabelMe
pip install labelme
```
## 2.1.3.2 Use LabelMe
Run the following commands to launch the annotation tool:
```cmd
# Activate the Anaconda environment created earlier
conda activate mypaddle
# Launch LabelMe
labelme
```
The main LabelMe interface looks like this:
<div align=center><img width="800" height="450" src="./pics/labelme1.png"/></div>
LabelMe annotates polygons by default; right-click inside the image to choose other annotation shapes, as shown below:
<div align=center><img width="800" height="450" src="./pics/labelme2.png"/></div>
## 2.2.1.1 Download Anaconda
From the Ubuntu desktop: on the Anaconda website [(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/), choose "Linux" and download the Anaconda build matching the Python version you need.
From the Ubuntu command line: download with "wget":
```cmd
# Anaconda2
wget https://repo.anaconda.com/archive/Anaconda2-2019.07-Linux-x86_64.sh --no-check-certificate
# Anaconda3
wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh --no-check-certificate
```
## 2.2.1.2 Install Anaconda
***Step 1: install***
In the directory containing the installer, run:
```cmd
# Run the downloaded Anaconda installer, e.g.:
bash ./Anaconda3-2019.07-Linux-x86_64.sh
```
Note: keep pressing Enter during installation; when prompted for a path you may change the install location, otherwise it defaults to the current user's home directory.
***Step 2: set environment variables***
Run the following commands:
```cmd
# Add Anaconda's bin directory to PATH
# Adjust "~/anaconda3/bin" to match your install path
echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
# Reload .bashrc so the change takes effect immediately
source ~/.bashrc
```
## 2.2.1.3 Use Anaconda
Run the following commands:
```cmd
# Create an environment named mypaddle with Python 3.5
conda create -n mypaddle python=3.5
# Once created, activate the environment
source activate mypaddle
python --version
# If the output mentions Anaconda, the installation succeeded
# Leave the environment
source deactivate
```
## 2.2.2.1 Install PaddlePaddle
PaddlePaddle runs on 64-bit Ubuntu 14.04 (CUDA 8, CUDA 10), Ubuntu 16.04 (CUDA 8, CUDA 9, CUDA 10), and Ubuntu 18.04 (CUDA 10), with Python 2 (>=2.7.15) and Python 3 (>=3.5.1); pip must be newer than 9.0.1. Both the CPU and GPU builds of PaddlePaddle are supported on Linux. For the GPU build, see the NVIDIA documentation [(https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and the PaddlePaddle release table [(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release) for installing CUDA and cuDNN.
- Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# (Option 1) install the CPU build of PaddlePaddle
pip install -U paddlepaddle
# (Option 2) install the GPU build of PaddlePaddle
pip install -U paddlepaddle-gpu
# (Option 3) install a specific PaddlePaddle version
pip install -U paddlepaddle-gpu==[version]
pip install -U paddlepaddle==[version]
```
Note: version numbers are listed on PyPI [(https://pypi.org/project/paddlepaddle-gpu/#history)](https://pypi.org/project/paddlepaddle-gpu/#history).
- After installation, open a Python shell and run the following to test it:
```python
import paddle.fluid as fluid
fluid.install_check.run_check()
# The message "Your Paddle Fluid is installed successfully!" means the install succeeded
```
## 2.2.3.1 Install LabelMe
Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# (Option 1) for Python 2.x
conda install pyqt
# (Option 2) for Python 3.x
pip install pyqt5
# Install LabelMe
pip install labelme
```
## 2.2.3.2 Use LabelMe
Run the following commands to launch the annotation tool:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# Launch LabelMe
labelme
```
The main LabelMe interface looks like this:
<div align=center><img width="800" height="450" src="./pics/labelme1.png"/></div>
LabelMe annotates polygons by default; right-click inside the image to choose other annotation shapes, as shown below:
<div align=center><img width="800" height="450" src="./pics/labelme2.png"/></div>
## 2.3.1.1 Download Anaconda
On the Anaconda website [(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/), choose "MacOS" and download the Anaconda build matching the Python version you need.
## 2.3.1.2 Install Anaconda
***Step 1: install***
Open the downloaded installer (with the .pkg suffix) and follow the wizard; you can change the install path during installation, as shown below:
<div align=center><img width="600" height="400" src="./pics/anaconda1.png"/></div>
Note: by default it installs under the current user's home directory on MacOS.
***Step 2: set environment variables***
Run the following commands:
```cmd
# Add Anaconda's bin directory to PATH
# Adjust "~/anaconda3/bin" to match your install path
echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bash_profile
# Reload .bash_profile so the change takes effect immediately
source ~/.bash_profile
```
## 2.3.1.3 Use Anaconda
Run the following commands:
```cmd
# Create an environment named mypaddle with Python 3.5
conda create -n mypaddle python=3.5
# Once created, activate the environment
source activate mypaddle
python --version
# If the output mentions Anaconda, the installation succeeded
# Leave the environment
source deactivate
```
## 2.3.2.1 Install PaddlePaddle
PaddlePaddle runs on 64-bit MacOS 10.11, 10.12, 10.13, and 10.14, with Python 2 (>=2.7.15) and Python 3 (>=3.5.1); pip must be newer than 9.0.1. Only the CPU build of PaddlePaddle is currently supported on MacOS.
- Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# (Option 1) install the CPU build of PaddlePaddle
pip install -U paddlepaddle
# (Option 2) install a specific PaddlePaddle version
pip install -U paddlepaddle==[version]
```
- After installation, open a Python shell and run the following to test it:
```python
import paddle.fluid as fluid
fluid.install_check.run_check()
# The message "Your Paddle Fluid is installed successfully!" means the install succeeded
```
## 2.3.3.1 Install LabelMe
Run the following commands:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# (Option 1) for Python 2.x
conda install pyqt
# (Option 2) for Python 3.x
pip install pyqt5
# Install LabelMe
pip install labelme
```
## 2.3.3.2 Use LabelMe
Run the following commands to launch the annotation tool:
```cmd
# Activate the Anaconda environment created earlier
source activate mypaddle
# Launch LabelMe
labelme
```
The main LabelMe interface looks like this:
<div align=center><img width="800" height="450" src="./pics/labelme1.png"/></div>
LabelMe annotates polygons by default; right-click inside the image to choose other annotation shapes, as shown below:
<div align=center><img width="800" height="450" src="./pics/labelme2.png"/></div>
<img src="./paddlex.png" width = "200" height = "31" alt="PaddleX" align=center />
PaddleX是基于飞桨技术生态的全流程深度学习模型开发工具。具备易集成,易使用,全流程等特点。PaddleX作为深度学习开发工具,不仅提供了开源的内核代码,可供用户灵活使用或集成,同时也提供了配套的前端可视化客户端套件,让用户以可视化地方式进行模型开发,访问[PaddleX官网](https://www.paddlepaddle.org.cn/paddlex/download)获取更多相关细节。
## 安装
查看[PaddleX安装文档](docs/install.md)
<div align=center> ## 文档
<br/><img src="./images/paddlexlogo.png" width = "450" height = "69" alt="PaddleX" align=center /> 推荐访问PaddleX在线使用文档,快速查阅读使用教程和API文档说明。
</br> 常用文档
- [10分钟快速上手PaddleX模型训练](docs/quick_start.md)
- [PaddleX使用教程](docs/tutorials)
- [PaddleX模型库](docs/model_zoo.md)
</div>
## 反馈
- 项目官网: https://www.paddlepaddle.org.cn/paddlex
飞桨全流程开发工具,集飞桨核心框架、模型库、工具及组件等深度学习开发所需全部能力于一身,打通深度学习开发全流程,并提供简明易懂的Python API,方便用户根据实际生产需求进行直接调用或二次开发,为开发者提供飞桨全流程开发的最佳实践。 - PaddleX用户QQ群: 1045148026 (手机QQ扫描如下二维码快速加入)
<img src="./QQGroup.jpeg" width="195" height="300" alt="QQGroup" align=center />
为了帮助开发者更好的了解飞桨的开发步骤以及所涉及的模块组件,进一步提升项目开发效率,我们还为开发者提供了基于PaddleX实现的图形化开发界即可面示例,用户可以基于该界面示例进行改造,开发符合自己习惯的操作界面。开发者可以根据实际业务需求,直接调用、改造PaddleX后端技术内核来开发项目,也可使用图形化开发界面快速体验飞桨模型开发全流程。
我们诚挚地邀请您前往 [官网](https://www.paddlepaddle.org.cn/paddle/paddlex)下载试用PaddleX可视化前端,并提出您的宝贵意见或贡献项目。PaddleX代码将在5月初随正式版本发布时,全部开源。
## Contents
* <a href="#1">**Product features**</a>
* <a href="#2">**PaddleX visual client**</a>
    1. <a href="#a">Download the client</a>
    2. <a href="#b">Prepare data</a>
    3. <a href="#c">Import a dataset</a>
    4. <a href="#d">Create a project</a>
    5. <a href="#e">Develop the project</a>
* <a href="#3">**PaddleX backend kernel**</a>
* <a href="#4">**FAQ**</a>
## <a name="1">产品特性</a>
1. **全流程打通**
将深度学习开发从数据接入、模型训练、参数调优、模型评估、预测部署全流程打通,并提供可视化的使用界面,省去了对各环节间串连的代码开发与脚本调用,极大地提升了开发效率。
2. **开源技术内核**
集成PaddleCV领先的视觉算法和面向任务的开发套件、预训练模型应用工具PaddleHub、训练可视化工具VisualDL、模型压缩工具库PaddleSlim等技术能力于一身,并提供简明易懂的Python API,完全开源开放,易于集成和二次开发,为您的业务实践全程助力。
3. **本地一键安装**
高度兼容Windows、Mac、Linux系统,同时支持NVIDIA GPU加速深度学习训练。本地开发、保证数据安全,高度符合产业应用的实际需求。
4. **教程与服务**
从数据集准备到上线部署,为您提供业务开发全流程的文档说明及技术服务。开发者可以通过QQ群、微信群、GitHub社区等多种形式与飞桨团队及同业合作伙伴交流沟通。
## <a name="2">PaddleX 可视化前端</a>
**<a name="a">第一步:下载可视化前端</a>**
您需要前往 [官网](https://www.paddlepaddle.org.cn/paddle/paddlex)填写基本信息后下载试用PaddleX可视化前端
**<a name="b">第二步:准备数据**</a>
在开始模型训练前,您需要根据不同的任务类型,将数据标注为相应的格式。目前PaddleX支持【图像分类】、【目标检测】、【语义分割】、【实例分割】四种任务类型。不同类型任务的数据处理方式可查看[数据标注方式](https://github.com/PaddlePaddle/PaddleX/tree/master/DataAnnotation/AnnotationNote)
**<a name="c">第三步:导入我的数据集</a>**
①数据标注完成后,您需要根据不同的任务,将数据和标注文件,按照客户端提示更名并保存到正确的文件中。
②在客户端新建数据集,选择与数据集匹配的任务类型,并选择数据集对应的路径,将数据集导入。
<img src="./images/00%E6%95%B0%E6%8D%AE%E9%9B%86%E5%AF%BC%E5%85%A5%E8%AF%B4%E6%98%8E.png" width = "500" height = "350" alt="00数据集导入说明" align=center />
③选定导入数据集后,客户端会自动校验数据及标注文件是否合规,校验成功后,您可根据实际需求,将数据集按比例划分为训练集、验证集、测试集。
④您可在「数据分析」模块按规则预览您标注的数据集,双击单张图片可放大查看。
<img src="./images/01%E6%95%B0%E6%8D%AE%E5%88%87%E5%88%86%E5%8F%8A%E9%A2%84%E8%A7%88.png" width = "500" height = "300" alt="01数据切分及预览" align=center />
**<a name="d">第四步:创建项目</a>**
① 在完成数据导入后,您可以点击「新建项目」创建一个项目。
② 您可根据实际任务需求选择项目的任务类型,需要注意项目所采用的数据集也带有任务类型属性,两者需要进行匹配。
<img src="./images/02%E5%88%9B%E5%BB%BA%E9%A1%B9%E7%9B%AE.png" width = "500" height = "300" alt="02创建项目" align=center />
<a name="e">**第五步:项目开发**</a>
**数据选择**:项目创建完成后,您需要选择已载入客户端并校验后的数据集,并点击下一步,进入参数配置页面。
<img src="./images/03%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E9%9B%86.png" width = "400" height = "200" alt="03选择数据集" align=center />
**参数配置**:主要分为**模型参数****训练参数****优化策略**三部分。您可根据实际需求选择模型结构及对应的训练参数、优化策略,使得任务效果最佳。
<img src="./images/04参数配置-2.png" width = "500" height = "500" alt="04参数配置-2" align=center />
参数配置完成后,点击启动训练,模型开始训练并进行效果评估。
**训练可视化**
在训练过程中,您可通过VisualDL查看模型训练过程时的参数变化、日志详情,及当前最优的训练集和验证集训练指标。模型在训练过程中通过点击"终止训练"随时终止训练过程。
<img src="./images/05%E8%AE%AD%E7%BB%83%E5%8F%AF%E8%A7%86%E5%8C%96.png" width = "500" height = "350" alt="05训练可视化" align=center />
<img src="./images/06VisualDL.png" width = "500" height = "300" alt="06VisualDL" align=center />
模型训练结束后,点击”下一步“,进入模型评估页面。
**模型评估**
在模型评估页面,您可将训练后的模型应用在切分时留出的「验证数据集」以测试模型在验证集上的效果。评估方法包括混淆矩阵、精度、召回率等。在这个页面,您也可以直接查看模型在测试数据集上的预测效果。
根据评估结果,您可决定进入模型发布页面,或返回先前步骤调整参数配置重新进行训练。
<img src="./images/07%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0.jpg" width = "500" height = "550" alt="07模型评估" align=center />
**模型发布**
当模型效果满意后,您可根据实际的生产环境需求,选择将模型发布为需要的版本。
<img src="./images/08%E6%A8%A1%E5%9E%8B%E5%8F%91%E5%B8%83.png" width = "450" height = "350" alt="08模型发布" align=center />
## <a name="3">PaddleX 技术内核</a>
将于2020年5月下旬全面开源,敬请期待
## <a name="4">FAQ</a>
1. **为什么我的数据集没办法切分?**
如果您的数据集已经被一个或多个项目引用,数据集将无法切分,您可以额外新建一个数据集,并引用同一批数据,再选择不同的切分比例。
2. **任务和项目的区别是什么?**
一个项目可以包含多条任务,一个项目拥有唯一的数据集,但采用不同的参数配置启动训练会创建多条任务,方便您对比采用不同参数配置的训练效果,并管理多个任务。
3. **为什么训练速度这么慢?**
PaddleX完全采用您本地的硬件进行计算,深度学习任务确实对算力的要求比较高,为了使您能快速体验应用PaddleX进行开发,我们适配了CPU硬件,但强烈建议您使用GPU以提升训练速度和开发体验。
4. **我可以在服务器或云平台上部署PaddleX么?**
当前PaddleX 可视化前端是一个适配本地单机安装的Client,无法在服务器上直接进行部署,您可以直接使用PaddleX Core后端技术内核,或采用飞桨核心框架进行服务器上的部署。如果您希望使用公有算力,强烈建议您尝试飞桨产品系列中的 [EasyDL](https://ai.baidu.com/easydl/)[AI Studio](https://aistudio.baidu.com/aistudio/index)进行开发。
5. **为什么我的安装总是报错?**
PaddleX的安装包中打包了PaddlePaddle全流程开发所需的所有依赖,理论上不需要您额外安装CUDA等ToolKit (如使用NVIDIA GPU), 但对操作系统版本、处理器架构、驱动版本等有一定要求,如安装发生错误,建议您先检查开发环境是否与PaddleX推荐环境匹配。
**We recommend installing and using PaddleX in the following environment:**
* **Operating system:**
  * Windows 7/8/10 (Windows 10 recommended);
  * Mac OS 10.13+;
  * Ubuntu 18.04+;
***Note: the processor must be x86_64 and support MKL.***
* **Training hardware:**
  * **GPU** (Windows and Linux):
    An NVIDIA card with CUDA support is recommended, e.g. a GTX 1070 or better;
    Windows x86_64 driver version >= 411.31;
    Linux x86_64 driver version >= 410.48;
    8 GB of GPU memory or more;
  * **CPU**:
    PaddleX currently supports training on a local CPU, but a GPU is recommended for a better development experience.
  * **Memory**: 8 GB or more recommended
  * **Disk space**: 1 TB of free SSD space recommended (not required)
***Note: PaddleX supports only single-GPU mode on Windows and Mac. NCCL is not yet supported on Windows.***
**For more questions or suggestions, open an issue or join the official PaddleX QQ group (1045148026) to give us your feedback directly.**
<div align=center>
<img src="./images/09qq%E7%BE%A4%E4%BA%8C%E7%BB%B4%E7%A0%81.png" alt="09qq群二维码" align=center />
</div>
## PaddlePaddle ecosystem
- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
- [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
- [PaddleHub](https://github.com/PaddlePaddle/PaddleHub)
- [PaddleLite](https://github.com/PaddlePaddle/Paddle-Lite)
- [VisualDL](https://github.com/PaddlePaddle/VisualDL)
path=$(cd `dirname $0`; pwd)
cd $path
pip install pre-commit
pip install yapf
pre-commit install
# FAQ
## 1. Training fails because GPU memory runs out
> Use `nvidia-smi` in a terminal to check whether other jobs occupy the GPU, and clear them if possible;
> Lower the `batch_size` used for training to reduce memory pressure; remember to scale down `learning_rate` and related parameters proportionally;
> Choose a smaller model or backbone.
## 2. Are there smaller models that run on lower-end devices?
> Yes, via model pruning; see the [model pruning tutorial](slim/prune.md). The pruning parameters control the size of the pruned model. In our experiments, e.g. on VOC detection data with yolov3-mobilenet, the model shrank from XXM to XX M with essentially unchanged accuracy.
## 3. How do I configure the number of GPUs used for training?
> Export an environment variable in the terminal, or set it in Python code; see [CPU/multi-GPU training](gpu_configure.md).
## 4. How do I continue training from previously trained weights?
> When calling the `train` interface, set `pretrain_weights` to the path where the earlier model was saved.
## 5. PaddleX saves models from normal training, pruning training, deployment export, and quantization; how do they differ and how do I tell them apart?
**Functional differences**
>1. Saved during normal training
>
>>Model directories saved every n epochs during normal training; usable as pre-trained weights, and loadable by PaddleX for prediction or for deployment export.
>2. Saved during pruning training
>
>>Model directories saved every n epochs during pruning training; not usable as pre-trained weights, but loadable by PaddleX for prediction or for deployment export.
>3. Exported deployment model
>
>>A model directory exported for server-side deployment; not usable as pre-trained weights, but loadable by PaddleX for prediction.
>4. Quantized model
>
>>A model directory whose weights were quantized to speed up prediction; not usable as pre-trained weights, but loadable by PaddleX for prediction.
**How to tell them apart**
>> The `status` field in the model directory's model.yml distinguishes the model types: 'Normal', 'Prune', 'Infer', and 'Quant' mean saved from normal training, saved from pruning training, exported deployment model, and quantized model respectively.
## 6. Training takes too long, or is too slow; how do I speed it up?
> 1. Training speed depends on the chosen model's size and on the configured `batch_size`; model sizes are listed in the [model zoo](model_zoo.md). Generally, the larger the model, the slower the training;
> 2. Beyond per-step speed, total training time also depends on the `num_epochs` you set. You can watch the model's validation metrics and decide whether to end training early (set `save_interval_epochs` so that the model is evaluated on the validation set and saved every that many epochs).
## 7. How do I choose the number of epochs?
> 1. If you are unsure, set the epoch count high and also set `save_interval_epochs`, so the model is evaluated on the validation set and saved every that many epochs. From the validation metrics across epochs you can judge whether the model has converged, and end training yourself once it has.
>
## 8. With only a CPU and no GPU, how do I speed up training?
> Without a GPU, you can train on multiple CPUs depending on your machine's configuration; see [multi-CPU/GPU training](gpu_configure.md).
>
## 9. The machine is offline and training fails while downloading the pre-trained model; what do I do?
> Obtain the pre-trained model in advance by other means, then point `pretrain_weights` at it during training; see [training without network access](how_to_offline_run.md).
## 10. Each new training downloads the pre-trained model again; how do I download it only once?
> 1. Solve it the same way as question 9.
> 2. Set `paddlex.pretrain_dir` before each training run, e.g. `paddlex.pretrain_dir='/usrname/paddlex'`; downloaded pre-trained models are then stored under `/usrname/paddlex`, and a model already present in that directory will not be downloaded again.
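As a minimal sketch of tip 2 above (the directory path follows the example in the answer; adjust it to your own machine):

```python
import paddlex

# Cache pre-trained weights in a fixed directory; models already present
# there will not be downloaded again on later training runs
paddlex.pretrain_dir = '/usrname/paddlex'
```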
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
# PaddleX documentation
All PaddleX documentation lives under this directory. The docs are organized in the Read the Docs style; you can consult the [online documentation](https://www.baidu.com) directly.
## Building the docs
Build the documentation in this directory as follows:
- Install the dependencies: `pip install -r requirements.txt`
- Build: `make html`
The build output is placed under `_build`; open `_build/html/index.html` to browse it.
# Datasets - datasets
## ImageNet class
```
paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
```
Reads a classification dataset in ImageNet format and applies the configured processing to each sample. The ImageNet dataset format is described in [dataset format notes](../datasets.md).
Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/classification/mobilenetv2.py#L25)
### Parameters
> * **data_dir** (str): directory containing the dataset.
> * **file_list** (str): path of the file listing the dataset's image files and class ids (each path in the file is relative to `data_dir`).
> * **label_list** (str): path of the file listing the dataset's classes.
> * **transforms** (paddlex.cls.transforms): preprocessing/augmentation operators applied to every sample; see [paddlex.cls.transforms](./transforms/cls_transforms.md).
> * **num_workers** (int|str): number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the CPU core count exceeds 8, `num_workers` is 8, otherwise half the core count.
> * **buffer_size** (int): queue length, in samples, used during preprocessing. Defaults to 100.
> * **parallel_method** (str): parallelization method for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac 'thread' is forced and this parameter is ignored).
> * **shuffle** (bool): whether to shuffle the samples. Defaults to False.
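As a hedged construction sketch (the dataset paths are assumptions for illustration; the transform operators are documented in paddlex.cls.transforms):

```python
import paddlex as pdx
from paddlex.cls import transforms

# Training-time preprocessing pipeline built from documented operators
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
# Assumed dataset layout: images under data/mydataset with a file list
# and label list as described above
train_dataset = pdx.datasets.ImageNet(
    data_dir='data/mydataset',
    file_list='data/mydataset/train_list.txt',
    label_list='data/mydataset/labels.txt',
    transforms=train_transforms,
    shuffle=True)
```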
## VOCDetection class
```
paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
```
Reads a detection dataset in PascalVOC format and applies the configured processing to each sample. The PascalVOC dataset format is described in [dataset format notes](../datasets.md).
Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py#L29)
### Parameters
> * **data_dir** (str): directory containing the dataset.
> * **file_list** (str): path of the file listing the dataset's image files and matching annotation files (each path in the file is relative to `data_dir`).
> * **label_list** (str): path of the file listing the dataset's classes.
> * **transforms** (paddlex.det.transforms): preprocessing/augmentation operators applied to every sample; see [paddlex.det.transforms](./transforms/det_transforms.md).
> * **num_workers** (int|str): number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the CPU core count exceeds 8, `num_workers` is 8, otherwise half the core count.
> * **buffer_size** (int): queue length, in samples, used during preprocessing. Defaults to 100.
> * **parallel_method** (str): parallelization method for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac 'thread' is forced and this parameter is ignored).
> * **shuffle** (bool): whether to shuffle the samples. Defaults to False.
## COCODetection class
```
paddlex.datasets.COCODetection(data_dir, ann_file, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
```
Reads a detection dataset in MSCOCO format and applies the configured processing to each sample; datasets in this format can also be used to train instance segmentation models. The MSCOCO dataset format is described in [dataset format notes](../datasets.md).
Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/mask_rcnn_r50_fpn.py#L27)
### Parameters
> * **data_dir** (str): directory containing the dataset.
> * **ann_file** (str): the dataset's annotation file, a standalone JSON file.
> * **transforms** (paddlex.det.transforms): preprocessing/augmentation operators applied to every sample; see [paddlex.det.transforms](./transforms/det_transforms.md).
> * **num_workers** (int|str): number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the CPU core count exceeds 8, `num_workers` is 8, otherwise half the core count.
> * **buffer_size** (int): queue length, in samples, used during preprocessing. Defaults to 100.
> * **parallel_method** (str): parallelization method for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac 'thread' is forced and this parameter is ignored).
> * **shuffle** (bool): whether to shuffle the samples. Defaults to False.
## SegDataset class
```
paddlex.datasets.SegDataset(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
```
Reads a semantic segmentation dataset and applies the configured processing to each sample. The semantic segmentation dataset format is described in [dataset format notes](../datasets.md).
Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py#L27)
### Parameters
> * **data_dir** (str): directory containing the dataset.
> * **file_list** (str): path of the file listing the dataset's image files and matching annotation files (each path in the file is relative to `data_dir`).
> * **label_list** (str): path of the file listing the dataset's classes.
> * **transforms** (paddlex.seg.transforms): preprocessing/augmentation operators applied to every sample; see [paddlex.seg.transforms](./transforms/seg_transforms.md).
> * **num_workers** (int|str): number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the CPU core count exceeds 8, `num_workers` is 8, otherwise half the core count.
> * **buffer_size** (int): queue length, in samples, used during preprocessing. Defaults to 100.
> * **parallel_method** (str): parallelization method for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac 'thread' is forced and this parameter is ignored).
> * **shuffle** (bool): whether to shuffle the samples. Defaults to False.
# paddlex.deploy
Prediction deployment based on AnalysisPredictor.
## create_predictor
```
paddlex.deploy.create_predictor(model_dir, use_gpu=False)
```
#### Args
* **model_dir**: path of the model saved during training
* **use_gpu**: whether to predict on the GPU
#### Returns
* **Predictor**: paddlex.deploy.predictor.Predictor
### Example
```
import paddlex
# model_dir is the path of the model saved during training
predictor = paddlex.deploy.create_predictor(model_dir, use_gpu=True)
```
## ClassifyPredictor
Inherits from paddlex.deploy.predictor.Predictor; used when the model described by model_dir/model.yml is a classification model.
```
paddlex.deploy.create_predictor(model_dir, use_gpu=False)
```
#### Args
* **model_dir**: path of the model saved during training
* **use_gpu**: whether to predict on the GPU
#### Returns
* **Predictor**: paddlex.deploy.predictor.Predictor
### Example
```
import paddlex
# model_dir is the path of the model saved during training
predictor = paddlex.deploy.create_predictor(model_dir, use_gpu=True)
```
API reference
============================

.. toctree::
   :maxdepth: 2

   transforms/index.rst
   datasets.md
   models.md
   slim.md
   load_model.md
   visualize.md
# Model loading - load_model
PaddleX provides a unified model-loading interface that loads models saved by PaddleX for evaluation on a validation set or for prediction on test images.
## Interface
```
paddlex.load_model(model_dir)
```
### Parameters
* **model_dir**: path of the model saved during training
### Returns
* **paddlex.cv.models**, the model class.
### Example
> 1. [Download](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz) the Mask R-CNN model PaddleX trained on the garbage-sorting data
> 2. [Download](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz) the garbage-sorting dataset
```
import paddlex as pdx
model_dir = './garbage_epoch_12'
data_dir = './garbage_ins_det/JPEGImages'
ann_file = './garbage_ins_det/val.json'
# Load the garbage-sorting model
model = pdx.load_model(model_dir)
# Predict on a single image
pred_result = model.predict('./garbage_ins_det/JPEGImages/000114.bmp')
# Evaluate on the validation set
eval_reader = pdx.datasets.COCODetection(data_dir=data_dir,
                                         ann_file=ann_file,
                                         transforms=model.eval_transforms)
eval_result = model.evaluate(eval_reader, batch_size=1)
```
# Models - models
## Classification models
### ResNet50 class
```python
paddlex.cls.ResNet50(num_classes=1000)
```
Builds a ResNet50 classifier and implements its training, evaluation, and prediction.
#### **Parameters:**
> - **num_classes** (int): number of classes. Defaults to 1000.
#### Classifier training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=64, eval_dataset=None, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.025, lr_decay_epochs=[30, 60, 90], lr_decay_gamma=0.1, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
> ```
>
> **Parameters:**
>
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size, also used as the validation batch size. Defaults to 64.
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 1.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 2.
> > - **save_dir** (str): model save path.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'IMAGENET', automatically downloads weights pre-trained on ImageNet; if None, no pre-trained weights are used. Defaults to 'IMAGENET'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with the fluid.layers.piecewise_decay schedule.
> > - **learning_rate** (float): initial learning rate of the default optimizer. Defaults to 0.025.
> > - **lr_decay_epochs** (list): epochs at which the default optimizer's learning rate decays. Defaults to [30, 60, 90].
> > - **lr_decay_gamma** (float): learning-rate decay factor of the default optimizer. Defaults to 0.1.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
> > - **sensitivities_file** (str): if a path, loads sensitivity information from it for pruning; if 'DEFAULT', automatically downloads sensitivities computed on ImageNet; if None, no pruning is performed. Defaults to None.
> > - **eval_metric_loss** (float): tolerable accuracy loss. Defaults to 0.05.
#### Classifier evaluation interface
> ```python
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False)
> ```
>
> **Parameters:**
>
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **batch_size** (int): validation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
>
> **Returns:**
>
> > - **dict**: when return_details is False, returns a dict with keys 'acc1' and 'acc5', the top-1 and top-5 accuracy respectively.
> > - **tuple** (metrics, eval_details): when `return_details` is True, additionally returns a dict with keys 'true_labels' and 'pred_scores', the ground-truth class ids and the per-class prediction scores.
#### Classifier prediction interface
> ```python
> predict(self, img_file, transforms=None, topk=5)
> ```
>
> **Parameters:**
>
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.cls.transforms): data preprocessing operators.
> > - **topk** (int): number of top-scoring predictions to return.
> **Returns:**
>
> > - **list**: a list of dicts with keys 'category_id', 'category', and 'score',
> > the predicted class id, class label, and score respectively.
### Other classifier classes
Besides `ResNet50`, `paddlex.cls` also provides `ResNet18`, `ResNet34`, `ResNet101`, `ResNet50_vd`, `ResNet101_vd`, `DarkNet53`, `MobileNetV1`, `MobileNetV2`, `MobileNetV3_small`, `MobileNetV3_large`, `Xception41`, `Xception65`, `Xception71`, and `ShuffleNetV2`. Their usage (interfaces and parameters) matches `ResNet50`; see the [model zoo](../model_zoo.md) for each model's performance. A training sketch follows.
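The sketch below ties the interfaces above together into a minimal training run. The dataset paths and the class count are assumptions for illustration; any of the classifier classes listed above can stand in for `ResNet50`.

```python
import paddlex as pdx
from paddlex.cls import transforms

# Assumed dataset layout; see the paddlex.datasets documentation
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.Normalize()
])
train_dataset = pdx.datasets.ImageNet(
    data_dir='data/mydataset',
    file_list='data/mydataset/train_list.txt',
    label_list='data/mydataset/labels.txt',
    transforms=train_transforms)

model = pdx.cls.ResNet50(num_classes=4)  # assumed 4-class dataset
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    save_dir='output/resnet50')
# Predict on a single image with the trained model
result = model.predict('data/mydataset/class1/0001.jpg')
```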
## Detection models
### YOLOv3 class
```python
paddlex.det.YOLOv3(num_classes=80, backbone='MobileNetV1', anchors=None, anchor_masks=None, ignore_threshold=0.7, nms_score_threshold=0.01, nms_topk=1000, nms_keep_topk=100, nms_iou_threshold=0.45, label_smooth=False, train_random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608])
```
Builds a YOLOv3 detector and implements its training, evaluation, and prediction.
**Parameters:**
> - **num_classes** (int): number of classes. Defaults to 80.
> - **backbone** (str): YOLOv3 backbone network, one of ['DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large']. Defaults to 'MobileNetV1'.
> - **anchors** (list|tuple): anchor widths and heights; None means the default
> [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
> [59, 119], [116, 90], [156, 198], [373, 326]].
> - **anchor_masks** (list|tuple): anchor mask indices used when computing the YOLOv3 loss; None means the default
> [[6, 7, 8], [3, 4, 5], [0, 1, 2]].
> - **ignore_threshold** (float): when computing the YOLOv3 loss, predicted boxes whose IoU exceeds `ignore_threshold` have their confidence loss ignored. Defaults to 0.7.
> - **nms_score_threshold** (float): confidence threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
> - **nms_topk** (int): maximum number of boxes kept, by confidence, before NMS. Defaults to 1000.
> - **nms_keep_topk** (int): total number of boxes kept per image after NMS. Defaults to 100.
> - **nms_iou_threshold** (float): IoU threshold used to suppress boxes during NMS. Defaults to 0.45.
> - **label_smooth** (bool): whether to use label smoothing. Defaults to False.
> - **train_random_shapes** (list|tuple): image sizes randomly sampled from during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
#### YOLOv3 training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=8, eval_dataset=None, save_interval_epochs=20, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/8000, warmup_steps=1000, warmup_start_lr=0.0, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1, metric=None, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
> ```
>
> **Parameters:**
>
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 8.
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 20.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 2.
> > - **save_dir** (str): model save path. Defaults to 'output'.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'IMAGENET', automatically downloads weights pre-trained on ImageNet; if None, no pre-trained weights are used. Defaults to 'IMAGENET'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with the fluid.layers.piecewise_decay schedule.
> > - **learning_rate** (float): learning rate of the default optimizer. Defaults to 1.0/8000.
> > - **warmup_steps** (int): number of warmup steps of the default optimizer. Defaults to 1000.
> > - **warmup_start_lr** (float): starting warmup learning rate of the default optimizer. Defaults to 0.0.
> > - **lr_decay_epochs** (list): epochs at which the default optimizer's learning rate decays. Defaults to [213, 240].
> > - **lr_decay_gamma** (float): learning-rate decay factor of the default optimizer. Defaults to 0.1.
> > - **metric** (str): evaluation method used during training, one of ['COCO', 'VOC']. Defaults to None.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
> > - **sensitivities_file** (str): if a path, loads sensitivity information from it for pruning; if 'DEFAULT', automatically downloads sensitivities computed on PascalVOC; if None, no pruning is performed. Defaults to None.
> > - **eval_metric_loss** (float): tolerable accuracy loss. Defaults to 0.05.
#### YOLOv3 evaluation interface
> ```python
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
> ```
>
> **Parameters:**
>
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **batch_size** (int): validation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **metric** (str): evaluation method, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen from the dataset type passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
> >
> **Returns:**
>
> > - **tuple** (metrics, eval_details) | **dict** (metrics): returns (metrics, eval_details) when `return_details` is True, otherwise just metrics. metrics is a dict with key 'bbox_mmap' or 'bbox_map', the mean average precision averaged over IoU thresholds (mmAP) or the mean average precision (mAP) respectively. eval_details is a dict with key 'bbox', a list of predictions, each made up of the image id, predicted class id, box coordinates, and box score, and key 'gt', the ground-truth annotation information.
#### YOLOv3 prediction interface
> ```python
> predict(self, img_file, transforms=None)
> ```
>
> **Parameters:**
>
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.det.transforms): data preprocessing operators.
>
> **Returns:**
>
> > - **list**: a list of predictions. Each element is a dict with keys 'bbox', 'category', 'category_id', and 'score', the box coordinates, class, class id, and confidence of one predicted target; the box coordinates are [xmin, ymin, w, h], i.e. the top-left x and y plus the box width and height. (A post-processing sketch follows.)
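As a rough illustration of the return format above, the sketch below filters predictions by score and converts the [xmin, ymin, w, h] boxes to corner coordinates. The helper name `filter_boxes` and the threshold are assumptions, not part of the PaddleX API:

```python
def filter_boxes(results, score_thresh=0.5):
    """Keep predictions scoring at least score_thresh and convert
    [xmin, ymin, w, h] boxes (as returned by predict) to
    [xmin, ymin, xmax, ymax]."""
    kept = []
    for r in results:
        if r['score'] < score_thresh:
            continue
        xmin, ymin, w, h = r['bbox']
        kept.append({
            'category': r['category'],
            'score': r['score'],
            'bbox': [xmin, ymin, xmin + w, ymin + h],
        })
    return kept
```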
### FasterRCNN class
```python
paddlex.det.FasterRCNN(num_classes=81, backbone='ResNet50', with_fpn=True, aspect_ratios=[0.5, 1.0, 2.0], anchor_sizes=[32, 64, 128, 256, 512])
```
Builds a FasterRCNN detector and implements its training, evaluation, and prediction.
**Parameters:**
> - **num_classes** (int): number of classes, including the background class. Defaults to 81.
> - **backbone** (str): FasterRCNN backbone network, one of ['ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd']. Defaults to 'ResNet50'.
> - **with_fpn** (bool): whether to use an FPN. Defaults to True.
> - **aspect_ratios** (list): candidate anchor aspect ratios. Defaults to [0.5, 1.0, 2.0].
> - **anchor_sizes** (list): candidate anchor sizes. Defaults to [32, 64, 128, 256, 512].
#### FasterRCNN training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.0025, warmup_steps=500, warmup_start_lr=1.0/1200, lr_decay_epochs=[8, 11], lr_decay_gamma=0.1, metric=None, use_vdl=False)
> ```
>
> **Parameters:**
>
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 2.
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 1.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 2.
> > - **save_dir** (str): model save path. Defaults to 'output'.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'IMAGENET', automatically downloads weights pre-trained on ImageNet; if None, no pre-trained weights are used. Defaults to 'IMAGENET'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with the fluid.layers.piecewise_decay schedule.
> > - **learning_rate** (float): initial learning rate of the default optimizer. Defaults to 0.0025.
> > - **warmup_steps** (int): number of warmup steps of the default optimizer. Defaults to 500.
> > - **warmup_start_lr** (float): starting warmup learning rate of the default optimizer. Defaults to 1.0/1200.
> > - **lr_decay_epochs** (list): epochs at which the default optimizer's learning rate decays. Defaults to [8, 11].
> > - **lr_decay_gamma** (float): learning-rate decay factor of the default optimizer. Defaults to 0.1.
> > - **metric** (str): evaluation method used during training, one of ['COCO', 'VOC']. Defaults to None.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
#### FasterRCNN evaluation interface
> ```python
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
> ```
>
> **Parameters:**
>
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **batch_size** (int): validation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **metric** (str): evaluation method, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen from the dataset type passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
> >
> **Returns:**
>
> > - **tuple** (metrics, eval_details) | **dict** (metrics): returns (metrics, eval_details) when `return_details` is True, otherwise just metrics. metrics is a dict with key 'bbox_mmap' or 'bbox_map', the mean average precision averaged over IoU thresholds (mmAP) or the mean average precision (mAP) respectively. eval_details is a dict with key 'bbox', a list of predictions, each made up of the image id, predicted class id, box coordinates, and box score, and key 'gt', the ground-truth annotation information.
#### FasterRCNN prediction interface
> ```python
> predict(self, img_file, transforms=None)
> ```
>
> **Parameters:**
>
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.det.transforms): data preprocessing operators.
>
> **Returns:**
>
> > - **list**: a list of predictions. Each element is a dict with keys 'bbox', 'category', 'category_id', and 'score', the box coordinates, class, class id, and confidence of one predicted target; the box coordinates are [xmin, ymin, w, h], i.e. the top-left x and y plus the box width and height.
### MaskRCNN class
```python
paddlex.det.MaskRCNN(num_classes=81, backbone='ResNet50', with_fpn=True, aspect_ratios=[0.5, 1.0, 2.0], anchor_sizes=[32, 64, 128, 256, 512])
```
Builds a MaskRCNN detector and implements its training, evaluation, and prediction.
**Parameters:**
> - **num_classes** (int): number of classes, including the background class. Defaults to 81.
> - **backbone** (str): MaskRCNN backbone network, one of ['ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd']. Defaults to 'ResNet50'.
> - **with_fpn** (bool): whether to use an FPN. Defaults to True.
> - **aspect_ratios** (list): candidate anchor aspect ratios. Defaults to [0.5, 1.0, 2.0].
> - **anchor_sizes** (list): candidate anchor sizes. Defaults to [32, 64, 128, 256, 512].
#### MaskRCNN training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=1, eval_dataset=None, save_interval_epochs=1, log_interval_steps=20, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/800, warmup_steps=500, warmup_start_lr=1.0 / 2400, lr_decay_epochs=[8, 11], lr_decay_gamma=0.1, metric=None, use_vdl=False)
> ```
>
> **Parameters:**
>
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 1.
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 1.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 20.
> > - **save_dir** (str): model save path. Defaults to 'output'.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'IMAGENET', automatically downloads weights pre-trained on ImageNet; if None, no pre-trained weights are used. Defaults to 'IMAGENET'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with the fluid.layers.piecewise_decay schedule.
> > - **learning_rate** (float): initial learning rate of the default optimizer. Defaults to 1.0/800 (0.00125).
> > - **warmup_steps** (int): number of warmup steps of the default optimizer. Defaults to 500.
> > - **warmup_start_lr** (float): starting warmup learning rate of the default optimizer. Defaults to 1.0/2400.
> > - **lr_decay_epochs** (list): epochs at which the default optimizer's learning rate decays. Defaults to [8, 11].
> > - **lr_decay_gamma** (float): learning-rate decay factor of the default optimizer. Defaults to 0.1.
> > - **metric** (str): evaluation method used during training, one of ['COCO', 'VOC']. Defaults to None.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
#### MaskRCNN evaluation interface
> ```python
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
> ```
>
> **Parameters:**
>
> > - **eval_dataset** (paddlex.datasets): validation data reader.
> > - **batch_size** (int): validation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **metric** (str): evaluation method, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen from the dataset type passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
> >
> **Returns:**
>
> > - **tuple** (metrics, eval_details) | **dict** (metrics): returns (metrics, eval_details) when `return_details` is True, otherwise just metrics. metrics is a dict with keys 'bbox_mmap' and 'segm_mmap', or 'bbox_map' and 'segm_map', the box and segmentation mean average precision averaged over IoU thresholds (mmAP), or the mean average precision (mAP), respectively. eval_details is a dict with key 'bbox', a list of box predictions, each made up of the image id, predicted class id, box coordinates, and box score; key 'mask', a list of region predictions, each made up of the image id, predicted class id, region coordinates, and region score; and key 'gt', the ground-truth box and region annotation information.
#### MaskRCNN prediction interface
> ```python
> predict(self, img_file, transforms=None)
> ```
>
> **Parameters:**
>
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.det.transforms): data preprocessing operators.
>
> **Returns:**
>
> > - **list**: a list of predictions. Each element is a dict with keys 'bbox', 'mask', 'category', 'category_id', and 'score', the box coordinates, mask, class, class id, and confidence of one predicted target; the box coordinates are [xmin, ymin, w, h], i.e. the top-left x and y plus the box width and height.
## Segmentation models
### DeepLabv3p class
```python
paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
```
Builds a DeepLabv3p segmenter and implements its training, evaluation, and prediction.
**Parameters:**
> - **num_classes** (int): number of classes.
> - **backbone** (str): DeepLabv3+ backbone network used to compute the feature maps, one of ['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0']. Defaults to 'MobileNetV2_x1.0'.
> - **output_stride** (int): downsampling factor of the backbone's output feature map relative to the input, usually 8 or 16. Defaults to 16.
> - **aspp_with_sep_conv** (bool): whether the ASPP module uses separable convolutions. Defaults to True.
> - **decoder_use_sep_conv** (bool): whether the decoder module uses separable convolutions. Defaults to True.
> - **encoder_with_aspp** (bool): whether the encoder stage uses an ASPP module. Defaults to True.
> - **enable_decoder** (bool): whether to use the decoder module. Defaults to True.
> - **use_bce_loss** (bool): whether to use BCE loss as the network loss; only for two-class segmentation, and may be combined with dice loss. Defaults to False.
> - **use_dice_loss** (bool): whether to use dice loss as the network loss; only for two-class segmentation, and may be combined with BCE loss. When both `use_bce_loss` and `use_dice_loss` are False, cross-entropy loss is used. Defaults to False.
> - **class_weight** (list/str): per-class weights of the cross-entropy loss. If a list, its length must equal `num_classes`. If a str, weight.lower() must be 'dynamic', in which case the weights are computed each epoch from the per-class pixel proportions, each class weighted as its proportion * num_classes. With the default None, all classes are weighted 1, i.e. the usual cross-entropy loss.
> - **ignore_index** (int): label value to ignore; pixels labeled `ignore_index` do not contribute to the loss. Defaults to 255.
#### DeepLabv3p training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, eval_batch_size=1, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.01, lr_decay_power=0.9, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
> ```
>
> **Parameters:**
> >
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size, also used as the validation batch size. Defaults to 2.
> > - **eval_dataset** (paddlex.datasets): evaluation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 1.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 2.
> > - **save_dir** (str): model save path. Defaults to 'output'.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'IMAGENET', automatically downloads weights pre-trained on ImageNet; if None, no pre-trained weights are used. Defaults to 'IMAGENET'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with a polynomial learning-rate decay schedule.
> > - **learning_rate** (float): initial learning rate of the default optimizer. Defaults to 0.01.
> > - **lr_decay_power** (float): learning-rate decay exponent of the default optimizer. Defaults to 0.9.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
> > - **sensitivities_file** (str): if a path, loads sensitivity information from it for pruning; if 'DEFAULT', automatically downloads sensitivities computed on ImageNet; if None, no pruning is performed. Defaults to None.
> > - **eval_metric_loss** (float): tolerable accuracy loss. Defaults to 0.05.
#### DeepLabv3p evaluation interface
> ```python
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False)
> ```
> **Parameters:**
> >
> > - **eval_dataset** (paddlex.datasets): evaluation data reader.
> > - **batch_size** (int): evaluation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
> **Returns:**
> >
> > - **dict**: when `return_details` is False, returns a dict with keys 'miou', 'category_iou', 'macc',
> > 'category_acc', and 'kappa', the mean IoU, per-class IoU, mean accuracy, per-class accuracy, and kappa coefficient respectively.
> > - **tuple** (metrics, eval_details): when `return_details` is True, additionally returns a dict (eval_details)
> > with key 'confusion_matrix', the evaluation's confusion matrix.
#### DeepLabv3p prediction interface
> ```
> predict(self, img_file, transforms=None)
> ```
> **Parameters:**
> >
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.seg.transforms): data preprocessing operators.
> **Returns:**
> >
> > - **dict**: with keys 'label_map' and 'score_map'; 'label_map' holds the predicted grayscale result, whose pixel values are class ids, and 'score_map' holds the per-class probabilities with shape (h, w, num_classes). (A sketch for saving 'label_map' follows.)
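As a rough illustration of the return format above, the sketch below saves 'label_map' as a viewable grayscale PNG. The helper name `save_label_map` is hypothetical and Pillow is assumed to be installed:

```python
import numpy as np
from PIL import Image


def save_label_map(result, save_path, num_classes=2):
    """Save the 'label_map' returned by predict() as a grayscale PNG,
    stretching class ids over 0-255 so they are visible."""
    label_map = np.asarray(result['label_map'], dtype=np.uint16)
    scale = 255 // max(num_classes - 1, 1)
    Image.fromarray((label_map * scale).astype(np.uint8)).save(save_path)
```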
### UNet class
```python
paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
```
Builds a UNet segmenter and implements its training, evaluation, and prediction.
**Parameters:**
> - **num_classes** (int): number of classes.
> - **upsample_mode** (str): upsampling method used in the UNet decoder; 'bilinear' uses bilinear interpolation, and any other value uses transposed convolution. Defaults to 'bilinear'.
> - **use_bce_loss** (bool): whether to use BCE loss as the network loss; only for two-class segmentation, and may be combined with dice loss. Defaults to False.
> - **use_dice_loss** (bool): whether to use dice loss as the network loss; only for two-class segmentation, and may be combined with BCE loss. When both use_bce_loss and use_dice_loss are False, cross-entropy loss is used. Defaults to False.
> - **class_weight** (list/str): per-class weights of the cross-entropy loss. If a list, its length must equal `num_classes`. If a str, weight.lower() must be 'dynamic', in which case the weights are computed each epoch from the per-class pixel proportions, each class weighted as its proportion * num_classes. With the default None, all classes are weighted 1, i.e. the usual cross-entropy loss.
> - **ignore_index** (int): label value to ignore; pixels labeled `ignore_index` do not contribute to the loss. Defaults to 255.
#### UNet training interface
> ```python
> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, eval_batch_size=1, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='COCO', optimizer=None, learning_rate=0.01, lr_decay_power=0.9, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
> ```
>
> **Parameters:**
> >
> > - **num_epochs** (int): number of training epochs.
> > - **train_dataset** (paddlex.datasets): training data reader.
> > - **train_batch_size** (int): training batch size, also used as the validation batch size. Defaults to 2.
> > - **eval_dataset** (paddlex.datasets): evaluation data reader.
> > - **save_interval_epochs** (int): model save interval, in epochs. Defaults to 1.
> > - **log_interval_steps** (int): training-log interval, in steps. Defaults to 2.
> > - **save_dir** (str): model save path. Defaults to 'output'.
> > - **pretrain_weights** (str): if a path, loads pre-trained weights from it; if 'COCO', automatically downloads weights pre-trained on COCO; if None, no pre-trained weights are used. Defaults to 'COCO'.
> > - **optimizer** (paddle.fluid.optimizer): optimizer. When None, the default optimizer is used: fluid.optimizer.Momentum with a polynomial learning-rate decay schedule.
> > - **learning_rate** (float): initial learning rate of the default optimizer. Defaults to 0.01.
> > - **lr_decay_power** (float): learning-rate decay exponent of the default optimizer. Defaults to 0.9.
> > - **use_vdl** (bool): whether to use VisualDL for visualization. Defaults to False.
> > - **sensitivities_file** (str): if a path, loads sensitivity information from it for pruning; if 'DEFAULT', automatically downloads sensitivities computed on ImageNet; if None, no pruning is performed. Defaults to None.
> > - **eval_metric_loss** (float): tolerable accuracy loss. Defaults to 0.05.
#### UNet evaluation interface
> ```
> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False)
> ```
> **Parameters:**
> >
> > - **eval_dataset** (paddlex.datasets): evaluation data reader.
> > - **batch_size** (int): evaluation batch size. Defaults to 1.
> > - **epoch_id** (int): epoch of the model being evaluated.
> > - **return_details** (bool): whether to return detailed information. Defaults to False.
> **Returns:**
> >
> > - **dict**: when return_details is False, returns a dict with keys 'miou', 'category_iou', 'macc',
> > 'category_acc', and 'kappa', the mean IoU, per-class IoU, mean accuracy, per-class accuracy, and kappa coefficient respectively.
> > - **tuple** (metrics, eval_details): when return_details is True, additionally returns a dict (eval_details)
> > with key 'confusion_matrix', the evaluation's confusion matrix.
#### UNet prediction interface
> ```
> predict(self, img_file, transforms=None)
> ```
> **Parameters:**
> >
> > - **img_file** (str): path of the image to predict.
> > - **transforms** (paddlex.seg.transforms): data preprocessing operators.
> **Returns:**
> >
> > - **dict**: with keys 'label_map' and 'score_map'; 'label_map' holds the predicted grayscale result, whose pixel values are class ids, and 'score_map' holds the per-class probabilities with shape (h, w, num_classes).
# Model compression - slim
## Computing parameter sensitivities
```
paddlex.slim.cal_params_sensetives(model, save_file, eval_dataset, batch_size=8)
```
Computes the sensitivity, on a validation set, of every prunable parameter in the model and stores the sensitivity information in `save_file`:
1. Collect the names of the model's prunable convolution kernels.
2. Compute each prunable kernel's sensitivity at different pruning ratios.
Note: a convolution's sensitivity is the loss in prediction accuracy on the evaluation dataset at different pruning ratios; from the computed sensitivities you can decide the final list of parameters to prune and the pruning ratio for each of them.
[See the usage example](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/compress/classification/cal_sensitivities_file.py#L33)
### Parameters
* **model** (paddlex.cls.models/paddlex.det.models/paddlex.seg.models): a model loaded by paddlex.
* **save_file** (str): path where the computed sensitivity file is stored.
* **eval_dataset** (paddlex.datasets): evaluation dataset reader.
* **batch_size** (int): batch size used during evaluation.
## Exporting a quantized model
```
paddlex.slim.export_quant_model(model, test_dataset, batch_size=2, batch_num=10, save_dir='./quant_model', cache_dir='./temp')
```
Exports a quantized model. The interface implements post-training quantization: it takes a test dataset plus `batch_size` and `batch_num`, runs `batch_num` batches of `batch_size` samples through the model, and quantizes the model using the statistics gathered from those forward passes.
### Parameters
* **model** (paddlex.cls.models/paddlex.det.models/paddlex.seg.models): a model loaded by paddlex.
* **test_dataset** (paddlex.dataset): test dataset.
* **batch_size** (int): batch size used for the forward passes.
* **batch_num** (int): number of batches used for the forward passes.
* **save_dir** (str): directory where the quantized model is saved.
* **cache_dir** (str): directory for temporary statistics produced during quantization.
### Usage example
Download the [model](https://bj.bcebos.com/paddlex/models/vegetables_mobilenet.tar.gz) and [dataset](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz) used in the example below.
```
import paddlex as pdx
model = pdx.load_model('vegetables_mobilenet')
test_dataset = pdx.datasets.ImageNet(
    data_dir='vegetables_cls',
    file_list='vegetables_cls/train_list.txt',
    label_list='vegetables_cls/labels.txt',
    transforms=model.eval_transforms)
pdx.slim.export_quant_model(model, test_dataset, save_dir='./quant_mobilenet')
```
# Classification - paddlex.cls.transforms
Operators applied to image classification data. The [Compose](#compose) class combines preprocessing/augmentation operators into a pipeline.
## Compose class
```python
paddlex.cls.transforms.Compose(transforms)
```
Applies the listed preprocessing/augmentation operators to the input data. [Usage example](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/classification/mobilenetv2.py#L13)
### Parameters
* **transforms** (list): list of preprocessing/augmentation operators.
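As a small sketch, a typical evaluation-time pipeline can be composed from the operators documented below (the exact choice of operators and sizes is an assumption for illustration):

```python
from paddlex.cls import transforms

# Resize the short side, center-crop, then standardize
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()
])
```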
## RandomCrop class
```python
paddlex.cls.transforms.RandomCrop(crop_size=224, lower_scale=0.88, lower_ratio=3. / 4, upper_ratio=4. / 3)
```
Randomly crops the image; a training-time augmentation operator.
1. Compute the random crop's height and width from lower_scale, lower_ratio, and upper_ratio.
2. Randomly choose the crop's origin for that height and width.
3. Crop the image.
4. Resize the crop to crop_size * crop_size.
### Parameters
* **crop_size** (int): target side length the random crop is resized to. Defaults to 224.
* **lower_scale** (float): lower bound on the crop area as a fraction of the original area. Defaults to 0.88.
* **lower_ratio** (float): lower bound on the aspect-ratio change. Defaults to 3. / 4.
* **upper_ratio** (float): upper bound on the aspect-ratio change. Defaults to 4. / 3.
## RandomHorizontalFlip class
```python
paddlex.cls.transforms.RandomHorizontalFlip(prob=0.5)
```
Horizontally flips the image with a given probability; a training-time augmentation operator.
### Parameters
* **prob** (float): probability of a horizontal flip. Defaults to 0.5.
## RandomVerticalFlip class
```python
paddlex.cls.transforms.RandomVerticalFlip(prob=0.5)
```
Vertically flips the image with a given probability; a training-time augmentation operator.
### Parameters
* **prob** (float): probability of a vertical flip. Defaults to 0.5.
## Normalize class
```python
paddlex.cls.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
Standardizes the image:
1. Normalize pixel values to [0.0, 1.0].
2. Subtract the mean and divide by the standard deviation.
### Parameters
* **mean** (list): dataset mean. Defaults to [0.485, 0.456, 0.406].
* **std** (list): dataset standard deviation. Defaults to [0.229, 0.224, 0.225].
## ResizeByShort class
```python
paddlex.cls.transforms.ResizeByShort(short_size=256, max_size=-1)
```
Resizes the image according to its short side:
1. Get the lengths of the image's long and short sides.
2. From the ratio of the short side to short_size, compute the target long-side length; the resize ratio for both height and width is short_size / original short side.
3. If max_size > 0, adjust the ratio: if the target long side exceeds max_size, the resize ratio becomes max_size / original long side.
4. Resize the image by the resulting ratio. (A sketch of this computation follows the parameter list.)
### Parameters
* **short_size** (int): target short-side length after resizing. Defaults to 256.
* **max_size** (int): upper bound on the target long-side length. Defaults to -1.
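The sketch below works through the ratio computation described in steps 2 and 3 above. The helper name `resize_scale` is hypothetical:

```python
def resize_scale(im_h, im_w, short_size=256, max_size=-1):
    """Scale the short side to short_size; if max_size > 0 and the resulting
    long side would exceed it, cap the ratio at max_size / long side."""
    scale = float(short_size) / min(im_h, im_w)
    if max_size > 0 and scale * max(im_h, im_w) > max_size:
        scale = float(max_size) / max(im_h, im_w)
    return scale


print(resize_scale(480, 640, short_size=256))  # 256/480, about 0.533
```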
## CenterCrop class
```python
paddlex.cls.transforms.CenterCrop(crop_size=224)
```
Crops a square of side `crop_size` centered on the image's center point:
1. Compute the crop's origin.
2. Crop the image.
### Parameters
* **crop_size** (int): target side length of the crop. Defaults to 224.
## RandomRotate class
```python
paddlex.cls.transforms.RandomRotate(rotate_range=30, prob=0.5)
```
Rotates the image, with a given probability, by a random angle in [-rotate_range, rotate_range]; a training-time augmentation operator.
### Parameters
* **rotate_range** (int): range of rotation angles. Defaults to 30.
* **prob** (float): probability of a rotation. Defaults to 0.5.
## RandomDistort class
```python
paddlex.cls.transforms.RandomDistort(brightness_range=0.9, brightness_prob=0.5, contrast_range=0.9, contrast_prob=0.5, saturation_range=0.9, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
```
Randomly perturbs the image's pixel content with given probabilities; a training-time augmentation operator.
1. Randomize the order of the perturbation operations.
2. In that order, with the given probabilities, perturb the image within [-range, range] for each operation.
### Parameters
* **brightness_range** (float): range of the brightness factor. Defaults to 0.9.
* **brightness_prob** (float): probability of a brightness adjustment. Defaults to 0.5.
* **contrast_range** (float): range of the contrast factor. Defaults to 0.9.
* **contrast_prob** (float): probability of a contrast adjustment. Defaults to 0.5.
* **saturation_range** (float): range of the saturation factor. Defaults to 0.9.
* **saturation_prob** (float): probability of a saturation adjustment. Defaults to 0.5.
* **hue_range** (int): range of the hue factor. Defaults to 18.
* **hue_prob** (float): probability of a hue adjustment. Defaults to 0.5.
# Detection - paddlex.det.transforms
Operators applied to object detection data. The [Compose](#compose) class combines preprocessing/augmentation operators into a pipeline.
## Compose class
```python
paddlex.det.transforms.Compose(transforms)
```
Applies the listed preprocessing/augmentation operators to the input data. [Usage example](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py#L13)
### Parameters
* **transforms** (list): list of preprocessing/augmentation operators.
## ResizeByShort class
```python
paddlex.det.transforms.ResizeByShort(short_size=800, max_size=1333)
```
Resizes the image according to its short side:
1. Get the lengths of the image's long and short sides.
2. From the ratio of the short side to short_size, compute the target long-side length; the resize ratio for both height and width is short_size / original short side.
3. If max_size > 0, adjust the ratio: if the target long side exceeds max_size, the resize ratio becomes max_size / original long side.
4. Resize the image by the resulting ratio.
### Parameters
* **short_size** (int): target short-side length. Defaults to 800.
* **max_size** (int): upper bound on the target long-side length. Defaults to 1333.
## Padding class
```python
paddlex.det.transforms.Padding(coarsest_stride=1)
```
Pads the image's height and width up to multiples of coarsest_stride. For example, for an input image of [300, 640] with `coarsest_stride` 32, since 300 is not a multiple of 32 the image is zero-padded on the right and bottom, producing an output image of [320, 640]:
1. If coarsest_stride is 1, return the image unchanged.
2. Compute the differences between the width and height and their nearest coarsest_stride multiples.
3. Pad the image on the right and bottom by those differences.
### Parameters
* **coarsest_stride** (int): the padded image's height and width are multiples of this value. Defaults to 1.
## Resize class
```python
paddlex.det.transforms.Resize(target_size=608, interp='LINEAR')
```
Resizes the image.
* When target_size is an int, the image is resized to [target_size, target_size] with the chosen interpolation.
* When target_size is a list or tuple, the image is resized to target_size with the chosen interpolation.
Note: when the interpolation method is "RANDOM", one interpolation method is picked at random for the resize, serving as a training-time augmentation operator.
### Parameters
* **target_size** (int/list/tuple): target size. Defaults to 608.
* **interp** (str): interpolation method, matching OpenCV's options, one of ['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']. Defaults to "LINEAR".
## RandomHorizontalFlip class
```python
paddlex.det.transforms.RandomHorizontalFlip(prob=0.5)
```
Horizontally flips the image with a given probability; a training-time augmentation operator.
### Parameters
* **prob** (float): probability of a horizontal flip. Defaults to 0.5.
## Normalize class
```python
paddlex.det.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
Standardizes the image:
1. Normalize pixel values to [0.0, 1.0].
2. Subtract the mean and divide by the standard deviation.
### Parameters
* **mean** (list): dataset mean. Defaults to [0.485, 0.456, 0.406].
* **std** (list): dataset standard deviation. Defaults to [0.229, 0.224, 0.225].
## RandomDistort class
```python
paddlex.det.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5, contrast_range=0.5, contrast_prob=0.5, saturation_range=0.5, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
```
Randomly perturbs the image's pixel content with given probabilities; a training-time augmentation operator.
1. Randomize the order of the perturbation operations.
2. In that order, with the given probabilities, perturb the image within [-range, range] for each operation.
### Parameters
* **brightness_range** (float): range of the brightness factor. Defaults to 0.5.
* **brightness_prob** (float): probability of a brightness adjustment. Defaults to 0.5.
* **contrast_range** (float): range of the contrast factor. Defaults to 0.5.
* **contrast_prob** (float): probability of a contrast adjustment. Defaults to 0.5.
* **saturation_range** (float): range of the saturation factor. Defaults to 0.5.
* **saturation_prob** (float): probability of a saturation adjustment. Defaults to 0.5.
* **hue_range** (int): range of the hue factor. Defaults to 18.
* **hue_prob** (float): probability of a hue adjustment. Defaults to 0.5.
## MixupImage class
```python
paddlex.det.transforms.MixupImage(alpha=1.5, beta=1.5, mixup_epoch=-1)
```
Applies mixup to the image; a training-time augmentation operator, currently supported only by YOLOv3.
If label_info has no mixup field, the input is returned unchanged; otherwise:
1. Draw a random factor from a beta distribution.
2. Handle the cases:
    * If factor >= 1.0, drop the mixup field from label_info and return.
    * If factor <= 0.0, return label_info's mixup field and remove it from label_info.
    * Otherwise:
        (1) Multiply the original image by factor and the mixup image by (1 - factor), and add the two results.
        (2) Concatenate the original and mixup ground-truth boxes.
        (3) Concatenate the original and mixup ground-truth box classes.
        (4) Multiply the original boxes' mixup scores by factor and the mixup boxes' scores by (1 - factor), and add the two results.
3. Update the augment_shape information in im_info. (A sketch of the blending step follows the parameter list.)
### Parameters
* **alpha** (float): first shape parameter of the beta distribution. Defaults to 1.5.
* **beta** (float): second shape parameter of the beta distribution. Defaults to 1.5.
* **mixup_epoch** (int): apply mixup during the first mixup_epoch epochs; -1 disables the policy. Defaults to -1.
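As a rough sketch of the blending arithmetic in step (1) above (the helper name `mixup_images` is hypothetical; the real operator also pads the images to a common shape and merges boxes, labels, and scores as described):

```python
import numpy as np


def mixup_images(im1, im2, factor):
    """Blend two images: result = im1 * factor + im2 * (1 - factor).
    Assumes both images already share the same shape."""
    return (im1.astype(np.float32) * factor +
            im2.astype(np.float32) * (1.0 - factor))


# Factor drawn from the beta distribution used by MixupImage
factor = np.random.beta(1.5, 1.5)
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = np.full((4, 4, 3), 255, dtype=np.uint8)
blended = mixup_images(a, b, factor)
```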
## RandomExpand class
```python
paddlex.det.transforms.RandomExpand(max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5])
```
Randomly expands the image; a training-time augmentation operator.
1. Randomly pick an expansion ratio (expansion only happens when the ratio exceeds 1).
2. Compute the expanded image size.
3. Create an image filled with the dataset mean and paste the original image onto it at a random position.
4. Recompute the ground-truth box coordinates in the expanded image from the paste position.
### Parameters
* **max_ratio** (float): maximum expansion ratio. Defaults to 4.0.
* **prob** (float): probability of an expansion. Defaults to 0.5.
* **mean** (list): dataset mean (0-255). Defaults to [127.5, 127.5, 127.5].
## RandomCrop class
```python
paddlex.det.transforms.RandomCrop(batch_sampler=None, satisfy_all=False, avoid_no_bbox=True)
```
Randomly crops the image; a training-time augmentation operator.
1. Compute candidate crop regions from batch_sampler:
    (1) Compute the crop height and width from min scale, max scale, min aspect ratio, and max aspect ratio.
    (2) Randomly choose the crop origin for that height and width.
    (3) Filter the candidate crop regions:
        * When satisfy_all is True, a candidate is kept only if every ground-truth box overlaps it sufficiently.
        * When satisfy_all is False, a candidate is kept as soon as one ground-truth box overlaps it sufficiently.
2. Iterate over the candidate crop regions:
    (1) Discard any ground-truth box that does not overlap the candidate or whose center lies outside it.
    (2) Compute the ground-truth boxes' positions relative to the candidate and keep the corresponding classes and mixup scores.
    (3) If avoid_no_bbox is False, return the current crop immediately; otherwise keep searching until a candidate containing at least one ground-truth box is found, and return that crop.
### Parameters
* **batch_sampler** (list): combinations of random-crop parameters; each combination holds 8 values:
    - max sample (int): maximum number of crop regions produced by the combination.
    - max trial (int): number of attempts for the combination.
    - min scale (float): lower bound on each side's shrink ratio relative to the original.
    - max scale (float): upper bound on each side's shrink ratio relative to the original.
    - min aspect ratio (float): lower bound on the crop's short-side scaling ratio.
    - max aspect ratio (float): upper bound on the crop's short-side scaling ratio.
    - min overlap (float): lower bound on the overlap between ground-truth boxes and the crop.
    - max overlap (float): upper bound on the overlap between ground-truth boxes and the crop.
  Defaults to None, which is treated as:
  [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
   [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
* **satisfy_all** (bool): whether every ground-truth box must meet the overlap requirement for a candidate region to be kept. Defaults to False.
* **avoid_no_bbox** (bool): whether to keep only crops that contain at least one ground-truth box. Defaults to True.
Data processing - transforms
============================

transforms provides the data preprocessing and augmentation interfaces for PaddleX model training.

.. toctree::
   :maxdepth: 1

   cls_transforms.md
   det_transforms.md
   seg_transforms.md
# 分割-paddlex.seg.transforms
对用于分割任务的数据进行操作。可以利用[Compose](#compose)类将图像预处理/增强操作进行组合。
## Compose类
```python
paddlex.seg.transforms.Compose(transforms)
```
根据数据预处理/数据增强列表对输入数据进行操作。[使用示例](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py#L13)
### 参数
* **transforms** (list): 数据预处理/数据增强列表。
## RandomHorizontalFlip类
```python
paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
```
以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机水平翻转的概率。默认值为0.5。
## RandomVerticalFlip类
```python
paddlex.seg.transforms.RandomVerticalFlip(prob=0.1)
```
以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
### 参数
* **prob** (float): 随机垂直翻转的概率。默认值为0.1。
## Resize类
```python
paddlex.seg.transforms.Resize(target_size, interp='LINEAR')
```
调整图像大小(resize)。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,将图像resize为target_size,此时target_size应为[w, h]或(w, h)。
### 参数
* **target_size** (int|list|tuple): 目标大小。
* **interp** (str): resize的插值方式,与opencv的插值方式对应,
可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
## ResizeByLong类
```python
paddlex.seg.transforms.ResizeByLong(long_size)
```
对图像长边resize到固定值,短边按比例进行缩放。
### 参数
* **long_size** (int): resize后图像的长边大小。
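其缩放比例的计算方式可参考如下示意代码(非PaddleX内部实现):
```python
import cv2

def resize_by_long(im, long_size):
    # 长边缩放到long_size,短边按同一比例缩放
    scale = float(long_size) / max(im.shape[0], im.shape[1])
    h = int(round(im.shape[0] * scale))
    w = int(round(im.shape[1] * scale))
    return cv2.resize(im, (w, h), interpolation=cv2.INTER_LINEAR)
```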
## ResizeRangeScaling类
```python
paddlex.seg.transforms.ResizeRangeScaling(min_value=400, max_value=600)
```
对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
### 参数
* **min_value** (int): 图像长边resize后的最小值。默认值400。
* **max_value** (int): 图像长边resize后的最大值。默认值600。
## ResizeStepScaling类
```python
paddlex.seg.transforms.ResizeStepScaling(min_scale_factor=0.75, max_scale_factor=1.25, scale_step_size=0.25)
```
对图像按照某一个比例resize,这个比例以scale_step_size为步长,在[min_scale_factor, max_scale_factor]随机变动,模型训练时的数据增强操作。
### 参数
* **min_scale_factor** (float): resize最小尺度。默认值0.75。
* **max_scale_factor** (float): resize最大尺度。默认值1.25。
* **scale_step_size** (float): resize尺度范围间隔。默认值0.25。
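比例因子的采样方式可参考如下示意代码(函数名为假设,非PaddleX内部实现):
```python
import numpy as np

def sample_scale_factor(min_scale_factor=0.75, max_scale_factor=1.25,
                        scale_step_size=0.25):
    if scale_step_size == 0:
        # 步长为0时,在区间内连续均匀采样
        return np.random.uniform(min_scale_factor, max_scale_factor)
    # 以scale_step_size为步长将区间离散化后随机取一档
    num_steps = int(round((max_scale_factor - min_scale_factor) / scale_step_size)) + 1
    factors = np.linspace(min_scale_factor, max_scale_factor, num_steps)
    return float(np.random.choice(factors))
```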
## Normalize类
```python
paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```
对图像进行标准化。
1.图像像素归一化到区间 [0.0, 1.0]。
2.对图像进行减均值除以标准差操作。
### 参数
* **mean** (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
* **std** (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
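上述两步操作等价于如下示意代码(非PaddleX内部实现):
```python
import numpy as np

def normalize(im, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    im = im.astype('float32') / 255.0       # 1. 像素值归一化到[0.0, 1.0]
    im -= np.array(mean, dtype='float32')   # 2. 减均值
    im /= np.array(std, dtype='float32')    #    除以标准差
    return im
```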
## Padding类
```python
paddlex.seg.transforms.Padding(target_size, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
```
根据提供的值对图像或标注图像进行padding操作,padding方向为右和下。
### 参数
* **target_size** (int|list|tuple): padding后图像的大小。
* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
* **label_padding_value** (int): 标注图像padding的值。默认值为255(仅在训练时需要设定该参数)。
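右、下方向padding的效果可用如下示意代码表达(函数名为假设,非PaddleX内部实现):
```python
import numpy as np

def pad_right_bottom(im, target_h, target_w, pad_value=(127.5, 127.5, 127.5)):
    canvas = np.zeros((target_h, target_w, im.shape[2]), dtype='float32')
    canvas[:, :, :] = np.array(pad_value, dtype='float32')
    # 原图置于左上角,不足部分在右侧和下侧以pad_value填充
    canvas[:im.shape[0], :im.shape[1], :] = im.astype('float32')
    return canvas
```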
## RandomPaddingCrop类
```python
paddlex.seg.transforms.RandomPaddingCrop(crop_size=512, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
```
对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作,模型训练时的数据增强操作。
### 参数
* **crop_size**(int|list|tuple): 裁剪图像大小。默认为512。
* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
* **label_padding_value** (int): 标注图像padding的值。默认值为255。
## RandomBlur类
```python
paddlex.seg.transforms.RandomBlur(prob=0.1)
```
以一定的概率对图像进行高斯模糊,模型训练时的数据增强操作。
### 参数
* **prob** (float): 图像模糊概率。默认为0.1。
## RandomRotate类
```python
paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
```
对图像进行随机旋转, 模型训练时的数据增强操作。
在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,
并对旋转后的图像和标注图像进行相应的padding。
### 参数
* **rotate_range** (float): 最大旋转角度。默认为15度。
* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
* **label_padding_value** (int): 标注图像padding的值。默认为255。
## RandomScaleAspect类
```python
paddlex.seg.transforms.RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
```
裁剪图像并resize回原始尺寸,模型训练时的数据增强操作。
按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像尺寸,当存在标注图时,同步进行。
### 参数
* **min_scale** (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
* **aspect_ratio** (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
## RandomDistort类
```python
paddlex.seg.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5, contrast_range=0.5, contrast_prob=0.5, saturation_range=0.5, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
```
以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
1.对变换的操作顺序进行随机化操作。
2.按照1中的顺序以一定的概率对图像在范围[-range, range]内进行随机像素内容变换。
### 参数
* **brightness_range** (float): 明亮度因子的范围。默认为0.5。
* **brightness_prob** (float): 随机调整明亮度的概率。默认为0.5。
* **contrast_range** (float): 对比度因子的范围。默认为0.5。
* **contrast_prob** (float): 随机调整对比度的概率。默认为0.5。
* **saturation_range** (float): 饱和度因子的范围。默认为0.5。
* **saturation_prob** (float): 随机调整饱和度的概率。默认为0.5。
* **hue_range** (int): 色调因子的范围。默认为18。
* **hue_prob** (float): 随机调整色调的概率。默认为0.5。
# 可视化-visualize
PaddleX提供了一系列模型预测和结果分析的可视化函数。
## 目标检测/实例分割预测结果可视化
```
paddlex.det.visualize(image, result, threshold=0.5, save_dir=None)
```
将目标检测/实例分割模型预测得到的Box框和Mask在原图上进行可视化。
### 参数
> * **image** (str): 原图文件路径。
> * **result** (list): 模型预测结果。
> * **threshold**(float): score阈值,将Box置信度低于该阈值的框过滤,不进行可视化。默认0.5。
> * **save_dir**(str): 可视化结果保存路径。若为None,则表示不保存,该函数将可视化的结果以np.ndarray的形式返回;若设为目录路径,则将可视化结果保存至该目录下
### 使用示例
> 点击下载如下示例中的[模型](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz)和[测试图片](https://bj.bcebos.com/paddlex/datasets/garbage.bmp)
```
import paddlex as pdx
model = pdx.load_model('garbage_epoch_12')
result = model.predict('garbage.bmp')
pdx.det.visualize('garbage.bmp', result, save_dir='./')
# 预测结果保存在./visualize_garbage.bmp
```
## 语义分割预测结果可视化
```
paddlex.seg.visualize(image, result, weight=0.6, save_dir=None)
```
将语义分割模型预测得到的Mask在原图上进行可视化。
### 参数
> * **image** (str): 原图文件路径。
> * **result** (dict): 模型预测结果。
> * **weight**(float): mask可视化结果与原图权重因子,weight表示原图的权重。默认0.6。
> * **save_dir**(str): 可视化结果保存路径。若为None,则表示不保存,该函数将可视化的结果以np.ndarray的形式返回;若设为目录路径,则将可视化结果保存至该目录下
### 使用示例
> 点击下载如下示例中的[模型](https://bj.bcebos.com/paddlex/models/cityscape_deeplab.tar.gz)和[测试图片](https://bj.bcebos.com/paddlex/datasets/city.png)
```
import paddlex as pdx
model = pdx.load_model('cityscape_deeplab')
result = model.predict('city.png')
pdx.seg.visualize('city.png', result, save_dir='./')
# 预测结果保存在./visualize_city.png
```
## 模型裁剪比例可视化分析
```
paddlex.slim.visualize(model, sensitivities_file, save_dir='./')
```
利用此接口,可以分析在不同的`eval_metric_loss`参数下,模型被裁剪的比例情况。可视化结果纵轴为eval_metric_loss参数值,横轴为对应的模型被裁剪的比例
### 参数
>* **model**: 使用PaddleX加载的模型
>* **sensitivities_file**: 模型各参数在验证集上计算得到的参数敏感度信息文件
>* **save_dir**: 可视化结果保存路径,默认为当前目录
### 使用示例
> 点击下载示例中的[模型](https://bj.bcebos.com/paddlex/models/vegetables_mobilenet.tar.gz)和[sensitivities_file](https://bj.bcebos.com/paddlex/slim_prune/mobilenetv2.sensitivities)
```
import paddlex as pdx
model = pdx.load_model('vegetables_mobilenet')
pdx.slim.visualize(model, 'mobilenetv2.sensitivities', save_dir='./')
# 可视化结果保存在./sensitivities.png
```
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sphinx_rtd_theme
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# -- Project information -----------------------------------------------------
project = 'PaddleX'
copyright = '2020, paddlex@baidu.com'
author = 'paddlex@baidu.com'
# The full version, including alpha/beta/rc tags
release = '0.1.0'
from recommonmark.parser import CommonMarkParser
source_parsers = {
'.md': CommonMarkParser,
}
source_suffix = ['.rst', '.md']
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx_markdown_tables']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'zh_CN'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
#html_theme = 'alabaster'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_logo = 'images/paddlex.png'
# 模型转换
## 转ONNX模型
PaddleX基于[Paddle2ONNX工具](https://github.com/PaddlePaddle/paddle2onnx),提供了便捷的API,支持用户将PaddleX训练保存的模型导出为ONNX模型。
通过如下示例代码,用户即可将PaddleX训练好的MobileNetV2模型导出:
```
import paddlex as pdx
pdx.convertor.to_onnx(model_dir='paddle_mobilenet', save_dir='onnx_mobilenet')
```
## 转PaddleLite模型
PaddleX支持导出为[PaddleLite](https://github.com/PaddlePaddle/Paddle-Lite)支持的模型格式,便于用户将模型部署到更多硬件设备。
通过如下示例代码,用户即可将PaddleX训练好的MobileNetV2模型导出:
```
import paddlex as pdx
pdx.convertor.to_lite(model_dir='paddle_mobilenet', save_dir='lite_mobilenet', terminal='arm')
```
# 数据集格式说明
---
## 图像分类ImageNet
图像分类ImageNet数据集包含对应多个标签的图像文件夹、标签文件及图像列表文件。
参考数据文件结构如下:
```
./dataset/ # 数据集根目录
|--labelA # 标签为labelA的图像目录
| |--a1.jpg
| |--...
| └--...
|
|--...
|
|--labelZ # 标签为labelZ的图像目录
| |--z1.jpg
| |--...
| └--...
|
|--train_list.txt # 训练文件列表文件
|
|--val_list.txt # 验证文件列表文件
|
└--labels.txt # 标签列表文件
```
其中,相应的文件名可根据需要自行定义。
`train_list.txt`和`val_list.txt`文本以空格为分隔符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为图像文件对应的标签id(从0开始)。如下所示:
```
labelA/a1.jpg 0
labelZ/z1.jpg 25
...
```
`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
```
labelA
labelB
...
```
[点击这里](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz),下载蔬菜分类数据集
在PaddleX中,使用`paddlex.cv.datasets.ImageNet`([API说明](./apis/datasets.html#imagenet))加载分类数据集
## 目标检测VOC
目标检测VOC数据集包含图像文件夹、标注信息文件夹、标签文件及图像列表文件。
参考数据文件结构如下:
```
./dataset/ # 数据集根目录
|--JPEGImages # 图像目录
| |--xxx1.jpg
| |--...
| └--...
|
|--Annotations # 标注信息目录
| |--xxx1.xml
| |--...
| └--...
|
|--train_list.txt # 训练文件列表文件
|
|--val_list.txt # 验证文件列表文件
|
└--labels.txt # 标签列表文件
```
其中,相应的文件名可根据需要自行定义。
`train_list.txt`和`val_list.txt`文本以空格为分隔符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为标注文件相对于dataset的相对路径。如下所示:
```
JPEGImages/xxx1.jpg Annotations/xxx1.xml
JPEGImages/xxx2.jpg Annotations/xxx2.xml
...
```
`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
```
labelA
labelB
...
```
[点击这里](https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz),下载昆虫检测数据集
在PaddleX中,使用`paddlex.cv.datasets.VOCDetection`([API说明](./apis/datasets.html#vocdetection))加载目标检测VOC数据集
## 目标检测和实例分割COCO
目标检测和实例分割COCO数据集包含图像文件夹及图像标注信息文件。
参考数据文件结构如下:
```
./dataset/ # 数据集根目录
|--JPEGImages # 图像目录
| |--xxx1.jpg
| |--...
| └--...
|
|--train.json # 训练相关信息文件
|
└--val.json # 验证相关信息文件
```
其中,相应的文件名可根据需要自行定义。
`train.json`和`val.json`中存储了与标注信息、图像文件相关的信息。如下所示:
```
{
"annotations": [
{
"iscrowd": 0,
"category_id": 1,
"id": 1,
"area": 33672.0,
"image_id": 1,
"bbox": [232, 32, 138, 244],
"segmentation": [[32, 168, 365, 117, ...]]
},
...
],
"images": [
{
"file_name": "xxx1.jpg",
"height": 512,
"id": 267,
"width": 612
},
...
],
"categories": [
{
"name": "labelA",
"id": 1,
"supercategory": "component"
}
]
}
```
每个字段的含义如下所示:
| 域名 | 字段名 | 含义 | 数据类型 | 备注 |
|:-----|:--------|:------------|------|:-----|
| annotations | id | 标注信息id | int | 从1开始 |
| annotations | iscrowd | 标注框是否为一组对象 | int | 只有0、1两种取值 |
| annotations | category_id | 标注框类别id | int | |
| annotations | area | 标注框的面积 | float | |
| annotations | image_id | 当前标注信息所在图像的id | int | |
| annotations | bbox | 标注框坐标 | list | 长度为4,分别代表x,y,w,h |
| annotations | segmentation | 标注区域坐标 | list | list中有至少1个list,每个list由每个小区域坐标点的横纵坐标(x,y)组成 |
| images | id | 图像id | int | 从1开始 |
| images | file_name | 图像文件名 | str | |
| images | height | 图像高度 | int | |
| images | width | 图像宽度 | int | |
| categories | id | 类别id | int | 从1开始 |
| categories | name | 类别标签名 | str | |
| categories | supercategory | 类别父类的标签名 | str | |
[点击这里](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz),下载垃圾实例分割数据集
在PaddleX中,使用`paddlex.cv.datasets.COCODetection`([API说明](./apis/datasets.html#cocodetection))加载COCO格式数据集
## 语义分割数据
语义分割数据集包含原图、标注图及相应的文件列表文件。
参考数据文件结构如下:
```
./dataset/ # 数据集根目录
|--images # 原图目录
| |--xxx1.png
| |--...
| └--...
|
|--annotations # 标注图目录
| |--xxx1.png
| |--...
| └--...
|
|--train_list.txt # 训练文件列表文件
|
|--val_list.txt # 验证文件列表文件
|
└--labels.txt # 标签列表
```
其中,相应的文件名可根据需要自行定义。
`train_list.txt`和`val_list.txt`文本以空格为分隔符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为标注图像文件相对于dataset的相对路径。如下所示:
```
images/xxx1.png annotations/xxx1.png
images/xxx2.png annotations/xxx2.png
...
```
`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
```
labelA
labelB
...
```
标注图像为单通道图像,像素值即为对应的类别,像素标注类别需要从0开始递增,
例如0,1,2,3表示有4种类别,标注类别最多为256类。其中可以指定特定的像素值用于表示该值的像素不参与训练和评估(默认为255)。
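可使用如下示意代码检查标注图是否满足上述要求(函数名为假设,仅作演示):
```python
import numpy as np
from PIL import Image

def check_annotation(png_path, num_classes, ignore_index=255):
    label = np.asarray(Image.open(png_path))
    # 标注图需为单通道
    assert label.ndim == 2, "标注图应为单通道图像"
    values = np.unique(label)
    # 除忽略值外,像素值需落在[0, num_classes)内且从0开始递增
    valid = values[values != ignore_index]
    assert valid.max() < num_classes, "发现超出类别数的像素值: {}".format(valid.max())
```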
[点击这里](https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz),下载视盘语义分割数据集
在PaddleX中,使用`paddlex.cv.datasets.SegDataset`([API说明](./apis/datasets.html#segdataset))加载语义分割数据集
# 模型预测部署
本文档指引用户如何采用更高性能的方式来部署使用PaddleX训练的模型。使用本文档的模型部署方式,会在模型运算过程中对模型计算图进行优化,同时减少内存操作,相比普通的paddlepaddle模型加载和预测方式,预测速度平均可提升1倍,具体各模型性能对比见[预测性能对比](#预测性能对比)
## 服务端部署
### 导出inference模型
在服务端部署的模型需要首先将模型导出为inference格式模型,导出的模型将包括`__model__`、`__params__`和`model.yml`三个文件,分别为模型的网络结构、模型权重和模型的配置文件(包括数据预处理参数等)。在安装完PaddleX后,在命令行终端使用如下命令将模型导出到当前目录`inference_model`下。
> 可直接下载垃圾检测模型测试本文档的流程[garbage_epoch_12.tar.gz](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz)
```
paddlex --export_inference --model_dir=./garbage_epoch_12 --save_dir=./inference_model
```
### Python部署
PaddleX已经集成了基于Python的高性能预测接口,在安装PaddleX后,可参照如下代码示例,进行预测。相关的接口文档可参考[paddlex.deploy](apis/deploy.md)
> 点击下载测试图片 [garbage.bmp](https://bj.bcebos.com/paddlex/datasets/garbage.bmp)
```
import paddlex as pdx
predictor = pdx.deploy.create_predictor('./inference_model')
result = predictor.predict(image='garbage.bmp')
```
### C++部署
> C++部署方案正在整理中,即将开源...
### 预测性能对比
#### 测试环境
- CUDA 9.0
- CUDNN 7.5
- PaddlePaddle 1.7.1
- GPU: Tesla P40
- AnalysisPredictor 指采用Python的高性能预测方式
- Executor 指采用paddlepaddle普通的python预测方式
- Batch Size均为1,耗时单位为ms/image,只计算模型运行时间,不包括数据的预处理和后处理
| 模型 | AnalysisPredictor耗时 | Executor耗时 | 输入图像大小 |
| :---- | :--------------------- | :------------ | :------------ |
| resnet50 | 4.84 | 7.57 | 224*224 |
| mobilenet_v2 | 3.27 | 5.76 | 224*224 |
| unet | 22.51 | 34.60 |513*513 |
| deeplab_mobile | 63.44 | 358.31 |1025*2049 |
| yolo_mobilenetv2 | 15.20 | 19.54 | 608*608 |
| faster_rcnn_r50_fpn_1x | 50.05 | 69.58 |800*1088 |
| faster_rcnn_r50_1x | 326.11 | 347.22 | 800*1067 |
| mask_rcnn_r50_fpn_1x | 67.49 | 91.02 | 800*1088 |
| mask_rcnn_r50_1x | 326.11 | 350.94 | 800*1067 |
## 移动端部署
> Lite模型导出正在集成中,即将开源...
# 多卡GPU/CPU训练
## GPU卡数配置
PaddleX在训练过程中会优先选择**当前所有可用的GPU卡进行训练**,在评估时**分类和分割任务仍使用多张卡****检测任务只使用1张卡**进行计算,在预测时各任务**则只会使用1张卡进行计算**
用户如想配置PaddleX在运行时使用的卡的数量,可在命令行终端(Shell)或Python代码中按如下方式配置:
命令行终端:
```
# 使用1号GPU卡
export CUDA_VISIBLE_DEVICES='1'
# 使用0, 1, 3号GPU卡
export CUDA_VISIBLE_DEVICES='0,1,3'
# 不使用GPU,仅使用CPU
export CUDA_VISIBLE_DEVICES=''
```
python代码:
```
# 注意:须要在第一次import paddlex或paddle前执行如下语句
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,3'
import paddlex as pdx
```
## 使用多个GPU卡训练
目前PaddlePaddle支持在Linux下使用多卡训练,Windows只支持单卡,在命令行终端输入`nvidia-smi`可以查看自己机器的GPU卡信息,如若提示命令未找到,则用户需要自行安装CUDA驱动。
PaddleX在多卡GPU下训练时,无需额外的配置,用户按照上文的方式,通过`CUDA_VISIBLE_DEVICES`环境变量配置所需要使用的卡的数量即可。
需要注意的是,在训练代码中,可根据卡的数量调高`batch_size`和`learning_rate`等参数,GPU卡数量越多,则可以支持更高的`batch_size`(注意batch_size需能被卡的数量整除),同时更高的`batch_size`也意味着学习率`learning_rate`需要对应上调。同理,在训练过程中,如若因显存或内存不够导致训练失败,用户也需自行调低`batch_size`,并按比例调低学习率。
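例如,若单卡训练时`batch_size`为8、`learning_rate`为0.00125,换算到4卡可参考如下按线性比例调整的方式(数值仅为经验性示例,实际以训练效果为准):
```python
gpu_num = 4                 # 使用的GPU卡数
base_batch_size = 8         # 单卡时的batch_size
base_lr = 0.00125           # 单卡时的学习率
train_batch_size = base_batch_size * gpu_num  # 需能被卡数整除
learning_rate = base_lr * gpu_num             # 学习率按同等比例上调
```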
## CPU配置
PaddleX在训练过程中可以选择使用CPU进行训练、评估和预测。通过以下方式进行配置:
命令行终端:
```
export CUDA_VISIBLE_DEVICES=""
```
python代码:
```
# 注意:须要在第一次import paddlex或paddle前执行如下语句
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
import paddlex as pdx
```
此时使用的CPU个数为1。
## 使用多个CPU训练
通过设置环境变量`CPU_NUM`可以改变CPU个数,如果未设置,则CPU数目默认设为1,即`CPU_NUM`=1。在物理核心数量范围内,该参数的配置可以加速模型训练。
PaddleX在训练过程中会选择`CPU_NUM`个CPU进行训练,在评估时分类和分割任务仍使用`CPU_NUM`个CPU,而检测任务只使用1个CPU进行计算,在预测时各任务则只会使用1个CPU进行计算。
通过以下方式可设置CPU的个数:
命令行终端:
```
export CUDA_VISIBLE_DEVICES=""
export CPU_NUM=2
```
python代码:
```
# 注意:须要在第一次import paddlex或paddle前执行如下语句
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
os.environ['CPU_NUM'] = '2'
import paddlex as pdx
```
欢迎使用PaddleX!
=======================================
PaddleX是基于飞桨技术生态的深度学习全流程开发工具,具备易集成、易使用、全流程等特点。PaddleX作为深度学习开发工具,不仅提供了开源的内核代码,可供用户灵活使用或集成,同时也提供了配套的前端可视化客户端套件,让用户以可视化的方式进行模型开发,相关细节可查阅PaddleX官网。
本文档为PaddleX内核代码使用手册
.. toctree::
:maxdepth: 1
:caption: 目录:
quick_start.md
install.md
model_zoo.md
slim/index
apis/index
datasets.md
gpu_configure.md
tutorials/index.rst
metrics.md
FAQ.md
* PaddleX版本: v0.1.0
* 项目官网: http://www.paddlepaddle.org.cn/paddlex
* 项目GitHub: https://github.com/PaddlePaddle/PaddleX/tree/develop
* 官方QQ用户群: 1045148026
* GitHub Issue反馈: http://www.github.com/PaddlePaddle/PaddleX/issues
# 安装
> 以下安装过程默认用户已安装好Anaconda和CUDA 10.1(有GPU卡的情况下), Anaconda的安装可参考其官网https://www.anaconda.com/
## Linux/Mac安装
```
# 使用conda创建虚拟环境
conda create -n paddlex python=3.7
conda activate paddlex
# 安装paddlepaddle
# cpu版: pip install paddlepaddle
pip install paddlepaddle-gpu
# 安装PaddleX
pip install paddlex
```
## Windows安装
```
# 使用conda创建虚拟环境
conda create -n paddlex python=3.7
conda activate paddlex
# 安装paddlepaddle
# cpu版: pip install paddlepaddle
pip install paddlepaddle-gpu
# 安装pycocotools
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
# 安装PaddleX
pip install paddlex
```
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
# 指标及日志含义
PaddleX在模型训练、评估过程中,都会有相应的日志和指标反馈,本文档用于说明这些日志和指标的含义。
## 训练通用统计信息
PaddleX所有模型在训练过程中,输出的日志信息都包含了6个通用的统计信息,用于辅助用户进行模型训练,例如**分割模型**的训练日志,如下图所示。
![](./images/seg_train.png)
各字段含义如下:
| 字段 | 字段值示例 | 含义 |
| -------------- | -------------------- | ------------------------------------------------------------ |
| Epoch | Epoch=4/20 | [迭代轮数]所有训练数据会被训练20轮,当前处于第4轮 |
| Step | Step=62/66 | [迭代步数]所有训练数据被训练一轮所需要的迭代步数为66,当前处于第62步 |
| loss | loss=0.007226 | [损失函数值]参与当前迭代步数的训练样本的平均损失函数值loss,loss值越低,表明模型在训练集上拟合的效果越好(如上日志中第1行表示第4个epoch的第62个Batch的loss值为0.007226) |
| lr | lr=0.008215 | [学习率]当前模型迭代过程中的学习率 |
| time_each_step | time_each_step=0.41s | [每步迭代时间]训练过程计算得到的每步迭代平均用时 |
| eta | eta=0:9:44 | [剩余时间]模型训练完成所需剩余时间预估为0小时9分钟44秒 |
| | | |
不同模型的日志中除了上述通用字段外,还有其它字段,这些字段含义可见文档后面对各任务模型的描述。
## 评估通用统计信息
PaddleX所有模型在训练过程中会根据用户设定的`save_interval_epochs`参数,每间隔一定轮数进行评估和保存。例如**分类模型**的评估日志,如下图所示。
![](images/cls_eval.png)
上图中第1行表明验证数据集中样本数为240,需要迭代8步才能评估完所有验证数据;第5行用于表明第2轮的模型已经完成保存操作;第6行则表明当前保存的模型中,第2轮的模型在验证集上指标最优(分类任务看`acc1`,此时`acc1`值为0.258333),最优模型会保存在`best_model`目录中。
## 分类特有统计信息
### 训练日志字段
分类任务的训练日志除了通用统计信息外,还包括`acc1`和`acc5`两个特有字段。
> 注: acck准确率是针对一张图片进行计算的:把模型在各个类别上的预测得分按从高往低进行排序,取出前k个预测类别,若这k个预测类别包含了真值类,则认为该图片分类正确。
![](images/cls_train.png)
上图中第1行中的`acc1`表示参与当前迭代步数的训练样本的平均top1准确率,值越高代表模型越优;`acc5`表示参与当前迭代步数的训练样本的平均top5(若类别数n少于5,则为topn)准确率,值越高代表模型越优。第4行中的`loss`表示整个训练集的平均损失函数值,`acc1`表示整个训练集的平均top1准确率,`acc5`表示整个训练集的平均top5准确率。
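acck的计算过程可用如下示意代码表达(非PaddleX内部实现):
```python
import numpy as np

def acc_k(scores, gt_labels, k=5):
    # scores: [N, 类别数]的预测得分;gt_labels: [N]的真值类别
    topk = np.argsort(-scores, axis=1)[:, :k]  # 每张图片取得分最高的前k个类别
    hit = (topk == gt_labels.reshape(-1, 1)).any(axis=1)
    return hit.mean()
```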
### 评估日志字段
![](images/cls_eval.png)
上图中第3行中的`acc1`表示整个验证集的平均top1准确率,`acc5`表示整个验证集的平均top5准确率。
## 检测特有统计信息
### 训练日志字段
#### YOLOv3
YOLOv3的训练日志只包括训练通用统计信息(见[训练通用统计信息](http://yq01-qianmo-com-255-134-17.yq01.baidu.com:8199/metrics.html#id2))。
![](images/yolo_train.png)
上图中第5行`loss`表示整个训练集的平均损失函数loss值。
#### FasterRCNN
FasterRCNN的训练日志除了通用统计信息外,还包括`loss_cls`、`loss_bbox`、`loss_rpn_cls`、`loss_rpn_bbox`,这些字段的含义如下:
| 字段 | 含义 |
| -------------- | --------------------------------------------- |
| loss_cls | RCNN子网络中分类损失函数值 |
| loss_bbox | RCNN子网络中检测框回归损失函数值 |
| loss_rpn_cls | RPN子网络中分类损失函数值 |
| loss_rpn_bbox | RPN子网络中检测框回归损失函数值 |
| loss | 所有子网络损失函数值之和 |
![](images/faster_train.png)
上图中第1行的`loss`、`loss_cls`、`loss_bbox`、`loss_rpn_cls`、`loss_rpn_bbox`都是参与当前迭代步数的训练样本的损失值,而第7行是针对整个训练集的损失函数值。
#### MaskRCNN
MaskRCNN的训练日志除了通用统计信息外,还包括`loss_cls`、`loss_bbox`、`loss_mask`、`loss_rpn_cls`、`loss_rpn_bbox`,这些字段的含义如下:
| 字段 | 含义 |
| -------------- | --------------------------------------------- |
| loss_cls | RCNN子网络中分类损失函数值 |
| loss_bbox | RCNN子网络中检测框回归损失函数值 |
| loss_mask | RCNN子网络中Mask回归损失函数值 |
| loss_rpn_cls | RPN子网络中分类损失函数值 |
| loss_rpn_bbox | RPN子网络中检测框回归损失函数值 |
| loss | 所有子网络损失函数值之和 |
![](images/mask_train.png)
上图中第1行的`loss`、`loss_cls`、`loss_bbox`、`loss_mask`、`loss_rpn_cls`、`loss_rpn_bbox`都是参与当前迭代步数的训练样本的损失值,而第7行是针对整个训练集的损失函数值。
### 评估日志字段
检测可以使用两种评估标准:VOC评估标准和COCO评估标准。
#### VOC评估标准
![](images/voc_eval.png)
> 注:`map`为平均准确率的平均值,即IoU(Intersection Over Union)取0.5时各个类别的准确率-召回率曲线下面积的平均值。
上图中第3行`bbox_map`表示检测任务中整个验证集的平均准确率平均值。
#### COCO评估标准
> 注:COCO评估指标可参见[COCO官网解释](http://cocodataset.org/#detection-eval)。PaddleX主要反馈`mmAP`,即AP at IoU=.50:.05:.95这项指标,为在各个IoU阈值下平均准确率平均值(mAP)的平均值。
COCO格式的数据集不仅可以用于训练目标检测模型,也可以用于训练实例分割模型。在目标检测中,PaddleX主要反馈针对检测框的`bbox_mmAP`指标;在实例分割中,还包括针对Mask的`seg_mmAP`指标。如下所示,第一张日志截图为目标检测的评估结果,第二张日志截图为实例分割的评估结果。
![](images/faster_eval.png)
上图中红框标注的`bbox_mmap`表示整个验证集的检测框平均准确率平均值。
![](images/mask_eval.png)
上图中红框标注的`bbox_mmap`和`seg_mmap`分别表示整个验证集的检测框平均准确率平均值、Mask平均准确率平均值。
## 分割特有统计信息
### 训练日志字段
语义分割的训练日志只包括训练通用统计信息(见[训练通用统计信息](http://yq01-qianmo-com-255-134-17.yq01.baidu.com:8199/metrics.html#id2))。
![](images/seg_train.png)
### 评估日志字段
语义分割的评估日志包括了`miou`、`category_iou`、`macc`、`category_acc`、`kappa`,这些字段的含义如下:
| 字段 | 含义 |
| -------------- | --------------------------------------------- |
| miou | 各类IoU(Intersection Over Union)的平均值 |
| category_iou | 各类别的IoU |
| macc | 平均准确率,即预测正确的像素数/总像素数 |
| category_acc | 各类别的准确率,即各类别预测正确的像素数/预测为该类别的总像素数 |
| kappa | kappa系数,用于一致性检验 |
![](images/seg_eval.png)
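上述指标均可由验证集上统计的混淆矩阵计算得到,示意如下(非PaddleX内部实现):
```python
import numpy as np

def seg_metrics(confusion):
    # confusion[i][j]: 真值为类别i、预测为类别j的像素数
    inter = np.diag(confusion).astype('float64')
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - inter
    category_iou = inter / np.maximum(union, 1)
    miou = category_iou.mean()
    macc = inter.sum() / confusion.sum()                         # 预测正确像素数/总像素数
    category_acc = inter / np.maximum(confusion.sum(axis=0), 1)  # 按预测为该类别的像素数归一
    # kappa系数: (观测一致率 - 期望一致率) / (1 - 期望一致率)
    total = confusion.sum()
    po = inter.sum() / total
    pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / (total ** 2)
    kappa = (po - pe) / (1 - pe)
    return miou, category_iou, macc, category_acc, kappa
```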
# 模型库
本文档梳理了PaddleX v0.1.0支持的模型,同时也提供了在各个数据集上的预训练模型和对应验证集上的指标。用户也可自行下载对应的代码,在安装PaddleX后,即可使用相应代码训练模型。
表中相关模型也可下载好作为相应模型的预训练模型,通过`pretrain_weights`指定目录加载使用。
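例如,下载解压表中的ResNet50预训练模型后,可按如下方式通过`pretrain_weights`加载(目录名仅为示例,且假设`train_dataset`、`eval_dataset`已按数据集文档定义):
```python
import paddlex as pdx

model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    pretrain_weights='./resnet50_pretrained',  # 指向已下载解压的预训练模型目录(示例路径)
    save_dir='output/resnet50')
```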
## 图像分类模型
> 表中模型相关指标均为在ImageNet数据集上使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P4),预测速度为每张图片预测用时(不包括预处理和后处理),表中符号`-`表示相关指标暂未测试。
| 模型 | 模型大小 | 预测速度(毫秒) | Top1准确率 | Top5准确率 |
| :----| :------- | :----------- | :--------- | :--------- |
| ResNet18| 46.9MB | 3.456 | 70.98% | 89.92% |
| ResNet34| 87.5MB | 5.668 | 74.57% | 92.14% |
| ResNet50| 102.7MB | 8.787 | 76.50% | 93.00% |
| ResNet101 |179.1MB | 15.447 | 77.56% | 93.64% |
| ResNet50_vd |102.8MB | 9.058 | 79.12% | 94.44% |
| ResNet101_vd| 179.2MB | 15.685 | 80.17% | 94.97% |
| DarkNet53|166.9MB | 11.969 | 78.04% | 94.05% |
| MobileNetV1 | 16.4MB | 2.609 | 70.99% | 89.68% |
| MobileNetV2 | 14.4MB | 4.546 | 72.15% | 90.65% |
| MobileNetV3_large| 22.8MB | - | 75.3% | 75.3% |
| MobileNetV3_small | 12.5MB | 6.809 | 67.46% | 87.12% |
| Xception41 |92.4MB | 13.757 | 79.30% | 94.53% |
| Xception65 | 144.6MB | 19.216 | 81.00% | 95.49% |
| Xception71| 151.9MB | 23.291 | 81.11% | 95.45% |
| DenseNet121 | 32.8MB | 12.437 | 75.66% | 92.58% |
| DenseNet161|116.3MB | 27.717 | 78.57% | 94.14% |
| DenseNet201| 84.6MB | 26.583 | 77.63% | 93.66% |
| ShuffleNetV2 | 10.2MB | 6.101 | 68.8% | 88.5% |
## 目标检测模型
> 表中模型相关指标均为在MSCOCO数据集上使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla V100),表中符号`-`表示相关指标暂未测试。
| 模型 | 模型大小 | 预测时间(毫秒) | BoxAP |
|:-------|:-----------|:-------------|:----------|
|FasterRCNN-ResNet50|135.6MB| 78.450 | 35.2 |
|FasterRCNN-ResNet50_vd| 135.7MB | 79.523 | 36.4 |
|FasterRCNN-ResNet101| 211.7MB | 107.342 | 38.3 |
|FasterRCNN-ResNet50-FPN| 167.2MB | 44.897 | 37.2 |
|FasterRCNN-ResNet50_vd-FPN|168.7MB | 45.773 | 38.9 |
|FasterRCNN-ResNet101-FPN| 251.7MB | 55.782 | 38.7 |
|FasterRCNN-ResNet101_vd-FPN |252MB | 58.785 | 40.5 |
|YOLOv3-DarkNet53|252.4MB | 21.944 | 38.9 |
|YOLOv3-MobileNetv1 |101.2MB | 12.771 | 29.3 |
|YOLOv3-MobileNetv3|94.6MB | - | 31.6 |
| YOLOv3-ResNet34|169.7MB | 15.784 | 36.2 |
## 实例分割模型
> 表中模型相关指标均为在MSCOCO数据集上测试得到。
| 模型 |模型大小 | 预测时间(毫秒) | BoxAP | SegAP |
|:---------|:---------|:----------|:---------|:--------|
|MaskRCNN-ResNet50|51.2MB| 86.096 | 36.5 |32.2|
|MaskRCNN-ResNet50-FPN|184.6MB | 65.859 | 37.9 |34.2|
|MaskRCNN-ResNet50_vd-FPN |185.5MB | 63.191 | 39.8 |35.4|
|MaskRCNN-ResNet101-FPN|268.6MB | 77.024 | 39.5 |35.2|
|MaskRCNN-ResNet101vd-FPN |268.6MB | 76.307 | 41.4 |36.8|
## 语义分割模型
> 表中符号`-`表示相关指标暂未测试。
| 模型 | 数据集 | 模型大小 | 预测速度 | mIOU |
|:--------|:----------|:----------|:----------|:----------|
| UNet | COCO | 53.7M | - | - |
| DeepLabv3+/Xception65 | Cityscapes | 165.1M | - | 0.7930 |
| DeepLabv3+/MobileNetV2 | Cityscapes | 7.4M | - | 0.6981 |
# 10分钟快速上手使用
本文档在一个小数据集上展示了如何通过PaddleX进行训练,您可以阅读文档[使用教程-模型训练](/tutorials/train)来了解更多模型任务的训练使用方式。
## 1. 准备蔬菜分类数据集
```
wget https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz
tar xzvf vegetables_cls.tar.gz
```
## 2. 训练代码开发
通过如下`train.py`代码进行训练
> 设置使用0号GPU卡
```
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
```
> 定义训练和验证时的数据处理流程, 在`train_transforms`中加入了`RandomCrop`和`RandomHorizontalFlip`两种数据增强方式
```
from paddlex.cls import transforms
train_transforms = transforms.Compose([
transforms.RandomCrop(crop_size=224),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
transforms.CenterCrop(crop_size=224),
transforms.Normalize()
])
```
> 定义数据集,`pdx.datasets.ImageNet`表示读取ImageNet格式的分类数据集
```
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
```
> 模型训练
```
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV2(num_classes=num_classes)
model.train(num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
lr_decay_epochs=[4, 6, 8],
learning_rate=0.025,
save_dir='output/mobilenetv2',
use_vdl=True)
```
## 3. 模型训练
> 将`train.py`与解压后的数据集目录`vegetables_cls`放在同一目录下,在此目录下运行`train.py`即可开始训练。如果您的电脑上有GPU,这将会在10分钟内训练完成,如果为CPU也大概会在30分钟内训练完毕。
```
python train.py
```
## 4. 训练过程中查看训练指标
> 模型在训练过程中,所有的迭代信息将以标准输出流的形式,输出到命令执行的终端上,用户也可通过VisualDL以可视化的方式查看训练指标的变化,通过如下方式启动VisualDL后,在浏览器打开http://0.0.0.0:8001即可。
```
visualdl --logdir output/mobilenetv2/vdl_log --port 8001
```
![](./images/vdl1.jpg)
## 5. 训练完成使用模型进行测试
> 如使用训练过程中第8轮保存的模型进行测试
```
import paddlex as pdx
model = pdx.load_model('output/mobilenetv2/epoch_8')
result = model.predict('vegetables_cls/bocai/100.jpg', topk=3)
print("Predict Result:", result)
```
> 预测结果输出如下,预测按score进行排序,得到前三分类结果
```
Predict Result: [{'score': 0.9999393, 'category': 'bocai', 'category_id': 0}, {'score': 6.010089e-05, 'category': 'hongxiancai', 'category_id': 2}, {'score': 5.593914e-07, 'category': 'xilanhua', 'category_id': 5}]
```
## 其它推荐
- 1.[目标检测模型训练](tutorials/train/detection.md)
- 2.[语义分割模型训练](tutorials/train/segmentation.md)
- 3.[模型太大,想要更小的模型,试试模型裁剪吧!](tutorials/compress/classification.md)
sphinx
recommonmark
sphinx_markdown_tables
sphinx_rtd_theme
模型压缩
============================
.. toctree::
:maxdepth: 2
prune.md
quant.md
# 模型裁剪
## 原理介绍
模型裁剪用于减小模型的计算量和体积,可以加快模型部署后的预测速度,是一种减小模型大小和降低模型计算复杂度的常用方式,通过裁剪卷积层中Kernel输出通道的大小及其关联层参数大小来实现,其关联裁剪的原理可参见[PaddleSlim相关文档](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html#id16)**一般而言,在同等模型精度前提下,数据复杂度越低,模型可以被裁剪的比例就越高**
## 裁剪方法
PaddleX提供了两种方式:
**1.用户自行计算裁剪配置(推荐),整体流程包含三个步骤,**
> **第一步**: 使用数据集训练原始模型
> **第二步**:利用第一步训练好的模型,在验证数据集上计算模型中各个参数的敏感度,并将敏感度信息存储至本地文件
> **第三步**:使用数据集训练裁剪模型(与第一步差异在于需要在`train`接口中,将第二步计算得到的敏感信息文件传给接口的`sensitivities_file`参数)
> 在如上三个步骤中,**相当于模型共需要训练两遍**,分别对应第一步和第三步,但其中第三步训练的是裁剪后的模型,因此训练速度较第一步会更快。
> 第二步会遍历模型中的部分裁剪参数,分别计算各个参数裁剪后对于模型在验证集上效果的影响,**因此会反复在验证集上评估多次**。
**2.使用PaddleX内置的裁剪方案**
> PaddleX内置的模型裁剪方案是**基于标准数据集**上计算得到的参数敏感度信息,由于不同数据集特征分布会有较大差异,所以该方案相较于第1种方案训练得到的模型**精度一般而言会更低**(**且用户自定义数据集与标准数据集特征分布差异越大,导致训练的模型精度会越低**),仅在用户想节省时间的前提下可以参考使用,使用方式只需一步,
> **一步**: 使用数据集训练裁剪模型,在训练调用`train`接口时,将接口中的`sensitivities_file`参数设置为'DEFAULT'字符串
> 注:各模型内置的裁剪方案分别依据的数据集为: 图像分类——ImageNet数据集、目标检测——PascalVOC数据集、语义分割——CityScape数据集
## 裁剪实验
基于上述两种方案,我们在PaddleX上使用样例数据进行了实验,在Tesla P40上实验指标如下所示,
### 图像分类
实验背景:使用MobileNetV2模型,数据集为蔬菜分类示例数据,见[使用教程-模型压缩-图像分类](../tutorials/compress/classification.md)
| 模型 | 裁剪情况 | 模型大小 | Top1准确率(%) |GPU预测速度 | CPU预测速度 |
| :-----| :--------| :-------- | :---------- |:---------- |:----------|
|MobileNetV2 | 无裁剪(原模型)| 13.0M | 97.50|6.47ms |47.44ms |
|MobileNetV2 | 方案一(eval_metric_loss=0.10) | 2.1M | 99.58 |5.03ms |20.22ms |
|MobileNetV2 | 方案二(eval_metric_loss=0.10) | 6.0M | 99.58 |5.42ms |29.06ms |
### 目标检测
实验背景:使用YOLOv3-MobileNetV1模型,数据集为昆虫检测示例数据,见[使用教程-模型压缩-目标检测](../tutorials/compress/detection.md)
| 模型 | 裁剪情况 | 模型大小 | MAP(%) |GPU预测速度 | CPU预测速度 |
| :-----| :--------| :-------- | :---------- |:---------- | :---------|
|YOLOv3-MobileNetV1 | 无裁剪(原模型)| 139M | 67.57| 14.88ms |976.42ms |
|YOLOv3-MobileNetV1 | 方案一(eval_metric_loss=0.10) | 34M | 75.49 |10.60ms |558.49ms |
|YOLOv3-MobileNetV1 | 方案二(eval_metric_loss=0.05) | 29M | 50.27| 9.43ms |360.46ms |
### 语义分割
实验背景:使用UNet模型,数据集为视盘分割示例数据, 见[使用教程-模型压缩-语义分割](../tutorials/compress/segmentation.md)
| 模型 | 裁剪情况 | 模型大小 | mIOU(%) |GPU预测速度 | CPU预测速度 |
| :-----| :--------| :-------- | :---------- |:---------- | :---------|
|UNet | 无裁剪(原模型)| 77M | 91.22 |33.28ms |9523.55ms |
|UNet | 方案一(eval_metric_loss=0.10) |26M | 90.37 |21.04ms |3936.20ms |
|UNet | 方案二(eval_metric_loss=0.10) |23M | 91.21 |18.61ms |3447.75ms |
# 模型量化
## 原理介绍
为了满足低内存带宽、低功耗、低计算资源占用以及低模型存储等需求,定点量化被提出。为此我们提供了训练后量化,该量化使用KL散度确定量化比例因子,将FP32模型转成INT8模型,且不需要重新训练,可以快速得到量化模型。
## 使用PaddleX量化模型
PaddleX提供了`export_quant_model`接口,让用户以接口的形式完成模型以post_quantization方式量化并导出。点击查看[量化接口使用文档](../apis/slim.md)
## 量化性能对比
模型量化后的性能对比指标请查阅[PaddleSlim模型库](https://paddlepaddle.github.io/PaddleSlim/model_zoo.html)
# 分类模型裁剪
---
本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/classification](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification)
本文档按如下方式对模型进行了裁剪
> 第一步:在训练数据集上训练MobileNetV2
> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
## 步骤一 训练MobileNetV2
> 模型训练使用文档可以直接参考[分类模型训练](../train/classification.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/classification/mobilenet.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification/mobilenet.py)
> 使用如下命令开始模型训练
```
python mobilenet.py
```
## 步骤二 计算参数敏感度
> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
```
import os
# 选择使用0号卡
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
# 补充定义代码中用到的模型目录、数据集目录与敏感度信息保存路径
model_dir = './output/mobilenet/best_model'
dataset = 'vegetables_cls'
save_file = 'sensitivities.data'
model = pdx.load_model(model_dir)
# 定义验证所用的数据集
eval_dataset = pdx.datasets.ImageNet(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(model,
save_file,
eval_dataset,
batch_size=8)
```
> 本步骤代码已整理至[tutorials/compress/classification/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification/cal_sensitivities_file.py),用户可直接下载使用
> 使用如下命令开始计算敏感度
```
python cal_sensitivities_file.py --model_dir output/mobilenet/best_model --dataset vegetables_cls --save_file sensitivities.data
```
## 步骤三 开始裁剪训练
> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
```
python mobilenet.py --model_dir output/mobilenet/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
```
## 实验效果
本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
# 检测模型裁剪
---
本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/detection](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/detection)
本文档按如下方式对模型进行了裁剪
> 第一步:在训练数据集上训练YOLOv3
> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
## 步骤一 训练YOLOv3
> 模型训练使用文档可以直接参考[检测模型训练](../train/detection.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/detection/yolov3_mobilenet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/detection/yolov3_mobilenet.py)
> 使用如下命令开始模型训练
```
python yolov3_mobilenet.py
```
## 步骤二 计算参数敏感度
> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
```
import os
# 选择使用0号卡
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
# 补充定义代码中用到的模型目录、数据集目录与敏感度信息保存路径
model_dir = './output/yolov3_mobile/best_model'
dataset = 'insect_det'
save_file = 'sensitivities.data'
model = pdx.load_model(model_dir)
# 定义验证所用的数据集(检测任务使用VOC格式的数据集读取器)
eval_dataset = pdx.datasets.VOCDetection(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(model,
save_file,
eval_dataset,
batch_size=8)
```
> 本步骤代码已整理至[tutorials/compress/detection/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/detection/cal_sensitivities_file.py),用户可直接下载使用
> 使用如下命令开始计算敏感度
```
python cal_sensitivities_file.py --model_dir output/yolov3_mobile/best_model --dataset insect_det --save_file sensitivities.data
```
## 步骤三 开始裁剪训练
> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
```
python yolov3_mobilenet.py --model_dir output/yolov3_mobile/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
```
## 实验效果
本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
模型压缩
=========================
.. toctree::
:maxdepth: 1
classification.md
detection.md
segmentation.md
# 分割模型裁剪
---
本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/segmentation](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/segmentation)
本文档按如下方式对模型进行了裁剪
> 第一步:在训练数据集上训练UNet
> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
## 步骤一 训练UNet
> 模型训练使用文档可以直接参考[分割模型训练](../train/segmentation.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/segmentation/unet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/segmentation/unet.py)
> 使用如下命令开始模型训练
```
python unet.py
```
## 步骤二 计算参数敏感度
> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
```
import os
# 选择使用0号卡
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
# 补充定义代码中用到的模型目录、数据集目录与敏感度信息保存路径
model_dir = './output/unet/best_model'
dataset = 'optic_disc_seg'
save_file = 'sensitivities.data'
model = pdx.load_model(model_dir)
# 定义验证所用的数据集(分割任务使用SegDataset)
eval_dataset = pdx.datasets.SegDataset(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(model,
save_file,
eval_dataset,
batch_size=8)
```
> 本步骤代码已整理至[tutorials/compress/segmentation/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/segmentation/cal_sensitivities_file.py),用户可直接下载使用
> 使用如下命令开始计算敏感度
```
python cal_sensitivities_file.py --model_dir output/unet/best_model --dataset optic_disc_seg --save_file sensitivities.data
```
## 步骤三 开始裁剪训练
> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
```
python unet.py --model_dir output/unet/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
```
## 实验效果
本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
模型部署
=========================
.. toctree::
:maxdepth: 1
server.md
terminal.md
使用教程
=========================
.. toctree::
:maxdepth: 1
train/index.rst
compress/index.rst
# 训练图像分类模型
---
本文档训练代码可参考PaddleX的[代码tutorial/train/classification/mobilenetv2.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/train/classification/mobilenetv2.py)
**1.下载并解压训练所需的数据集**
> 使用1张显卡训练并指定使用0号卡。
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
```
> 这里使用蔬菜数据集,训练集、验证集和测试集共包含6189个样本,18个类别。
```python
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
```
**2.定义训练和验证过程中的数据处理和增强操作**
> transforms用于指定训练和验证过程中的数据处理和增强操作流程,如下代码在训练过程中使用了`RandomCrop`和`RandomHorizontalFlip`进行数据增强,transforms的使用见[paddlex.cls.transforms](../../apis/transforms/cls_transforms.html#paddlex-cls-transforms)
```python
from paddlex.cls import transforms
train_transforms = transforms.Compose([
transforms.RandomCrop(crop_size=224),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
transforms.CenterCrop(crop_size=224),
transforms.Normalize()
])
```
**3.创建数据集读取器,并绑定相应的数据预处理流程**
> 通过不同的数据集读取器可以加载不同格式的数据集,数据集API的介绍见文档[paddlex.datasets](../../apis/datasets.md)
```python
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
```
**4.创建模型进行训练**
> 模型训练会默认自动下载和使用imagenet图像数据集上的预训练模型,用户也可自行指定`pretrain_weights`参数来设置预训练权重。模型训练过程每间隔`save_interval_epochs`轮会保存一次模型在`save_dir`目录下,同时在保存的过程中也会在验证数据集上计算相关指标。
> 分类模型的接口可见文档[paddlex.cls.models](../../apis/models.md)
```python
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
lr_decay_epochs=[4, 6, 8],
learning_rate=0.025,
save_dir='output/mobilenetv2',
use_vdl=True)
```
> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问,如为远程服务,改成相应机器IP。
```shell
visualdl --logdir output/mobilenetv2/vdl_log --port 8001
```
**5.验证或测试**
> 利用训练完的模型可继续在验证集上进行验证。
```python
eval_metrics = model.evaluate(eval_dataset, batch_size=8)
print("eval_metrics:", eval_metrics)
```
> 结果输出:
```
eval_metrics: OrderedDict([('acc1', 0.9895916733386709), ('acc5', 0.9983987189751802)])
```
> 训练完用模型对图片进行测试。
```python
predict_result = model.predict('./vegetables_cls/bocai/IMG_00000839.jpg', topk=5)
print("predict_result:", predict_result)
```
> 结果输出:
```
predict_result: [{'category_id': 13, 'category': 'bocai', 'score': 0.8607276},
{'category_id': 11, 'category': 'kongxincai', 'score': 0.06386806},
{'category_id': 2, 'category': 'suanmiao', 'score': 0.03736042},
{'category_id': 12, 'category': 'heiqiezi', 'score': 0.007879922},
{'category_id': 17, 'category': 'huluobo', 'score': 0.006327283}]
```
# 训练目标检测模型
------
更多检测模型在VOC数据集或COCO数据集上的训练代码可参考[代码tutorials/train/detection/faster_rcnn_r50_fpn.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/faster_rcnn_r50_fpn.py)[代码tutorials/train/detection/yolov3_mobilenetv1.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py)
**1.下载并解压训练所需的数据集**
> 使用1张显卡训练并指定使用0号卡。
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
```
> 这里使用昆虫数据集,训练集、验证集和测试集共包含1938个样本,6个类别。
```python
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
```
**2.定义训练和验证过程中的数据处理和增强操作**
> 在训练过程中使用`RandomHorizontalFlip`进行数据增强,由于接下来选择的模型是带FPN结构的Faster RCNN,所以使用`Padding`将输入图像的尺寸补齐到32的倍数,以保证FPN中两个需做相加操作的特征层的尺寸完全相同。transforms的使用见[paddlex.det.transforms](../../apis/transforms/det_transforms.md)
```python
from paddlex.det import transforms
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32),
])
```
**3.创建数据集读取器,并绑定相应的数据预处理流程**
> 数据集读取器的介绍见文档[paddlex.datasets](../../apis/datasets.md)
```python
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
```
**4.创建Faster RCNN模型,并进行训练**
> 创建带FPN结构的Faster RCNN模型,`num_classes` 需要设置为包含背景类的类别数,即: 目标类别数量(6) + 1
```python
num_classes = len(train_dataset.labels) + 1
model = pdx.det.FasterRCNN(num_classes=num_classes)
```
> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。训练过程每间隔`save_interval_epochs`会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。检测模型的接口可见文档[paddlex.cv.models](../../apis/models.md#fasterrcnn)
```python
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=2,
eval_dataset=eval_dataset,
learning_rate=0.0025,
lr_decay_epochs=[8, 11],
save_dir='output/faster_rcnn_r50_fpn',
use_vdl=True)
```
> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问,如为远程服务,改成相应机器IP。
```shell
visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
```
**5.验证或测试**
> 训练完利用模型可继续在验证集上进行验证。
```python
eval_metrics = model.evaluate(eval_dataset, batch_size=2)
print("eval_metrics:", eval_metrics)
```
> 结果输出:
```python
eval_metrics: {'bbox_map': 76.085371}
```
> 训练完用模型对图片进行测试。
```python
predict_result = model.predict('./insect_det/JPEGImages/1968.jpg')
```
> 可视化测试结果:
```python
pdx.det.visualize('./insect_det/JPEGImages/1968.jpg', predict_result, threshold=0.5, save_dir='./output/faster_rcnn_r50_fpn')
```
![](../images/visualized_fasterrcnn.jpg)
模型训练
=========================
.. toctree::
:maxdepth: 1
classification.md
detection.md
instance_segmentation.md
segmentation.md
visualdl.md
# 训练实例分割模型
------
本文档训练代码可直接下载[代码tutorials/train/detection/mask_rcnn_r50_fpn.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/mask_rcnn_r50_fpn.py)
**1.下载并解压训练所需的数据集**
> 使用1张显卡训练并指定使用0号卡。
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
```
> 这里使用垃圾分拣数据集,训练集、验证集和测试集共包含283个样本,6个类别。
```python
garbage_dataset = 'https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz'
pdx.utils.download_and_decompress(garbage_dataset, path='./')
```
**2.定义训练和验证过程中的数据处理和增强操作**
> 在训练过程中使用`RandomHorizontalFlip`进行数据增强,由于接下来选择的模型是带FPN结构的Mask RCNN,所以使用`Padding`将输入图像的尺寸补齐到32的倍数,以保证FPN中两个需做相加操作的特征层的尺寸完全相同。transforms的使用见[paddlex.det.transforms](../../apis/transforms/det_transforms.md)
```python
from paddlex.det import transforms
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
```
**3.创建数据集读取器,并绑定相应的数据预处理流程**
> 数据集读取器的介绍见文档[paddlex.datasets](../../apis/datasets.md)
```python
train_dataset = pdx.datasets.CocoDetection(
data_dir='garbage_ins_det/JPEGImages',
ann_file='garbage_ins_det/train.json',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.CocoDetection(
data_dir='garbage_ins_det/JPEGImages',
ann_file='garbage_ins_det/val.json',
transforms=eval_transforms)
```
**4.创建Mask RCNN模型,并进行训练**
> 创建带FPN结构的Mask RCNN模型,`num_classes` 需要设置为包含背景类的类别数,即: 目标类别数量(6) + 1。
```python
num_classes = len(train_dataset.labels) + 1
model = pdx.det.MaskRCNN(num_classes=num_classes)
```
> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。训练过程每间隔`save_interval_epochs`会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。检测模型的接口可见文档[paddlex.det.models](../../apis/models.md)。
```python
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=1,
eval_dataset=eval_dataset,
learning_rate=0.00125,
lr_decay_epochs=[8, 11],
save_dir='output/mask_rcnn_r50_fpn',
use_vdl=True)
```
> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问,如为远程服务,改成相应机器IP。
```shell
visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
```
**5.验证或测试**
> 训练完利用模型可继续在验证集上进行验证。
```python
eval_metrics = model.evaluate(eval_dataset, batch_size=1)
print("eval_metrics:", eval_metrics)
```
> 结果输出:
```python
eval_metrics: {'bbox_mmap': 0.858306, 'segm_mmap': 0.864278}
```
> 训练完用模型对图片进行测试。
```python
predict_result = model.predict('./garbage_ins_det/JPEGImages/000114.bmp')
```
> 可视化测试结果:
```python
pdx.det.visualize('./garbage_ins_det/JPEGImages/000114.bmp', predict_result, threshold=0.7, save_dir='./output/mask_rcnn_r50_fpn')
```
![](../images/visualized_maskrcnn.bmp)
# 训练语义分割模型
---
更多语义分割模型在Cityscapes数据集上的训练代码可参考[代码tutorials/train/segmentation/unet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py)[代码tutorials/train/segmentation/deeplabv3p.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/deeplabv3p.py)
**1.下载并解压训练所需的数据集**
> 使用1张显卡训练并指定使用0号卡。
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
```
> 这里使用视盘分割数据集,训练集、验证集和测试集共包含343个样本,2个类别。
```python
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
```
**2.定义训练和验证过程中的数据处理和增强操作**
> 在训练过程中使用`RandomHorizontalFlip`和`RandomPaddingCrop`进行数据增强,transforms的使用见[paddlex.seg.transforms](../../apis/transforms/seg_transforms.md)
```python
from paddlex.seg import transforms
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Resize(target_size=512),
transforms.RandomPaddingCrop(crop_size=500),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.Resize(512),
transforms.Normalize()
])
```
**3.创建数据集读取器,并绑定相应的数据预处理流程**
> 数据集读取器的介绍见文档[paddlex.cv.datasets](../../apis/datasets.md)
```python
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
```
**4.创建DeepLabv3+模型,并进行训练**
> 创建DeepLabv3+模型,`num_classes` 需要设置为数据集的类别数(可由`train_dataset.labels`的长度得到),详细代码可参见[demo](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/deeplabv3p.py#L44)。
```python
num_classes = len(train_dataset.labels)
model = pdx.seg.DeepLabv3p(num_classes=num_classes)
```
> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。
训练过程每间隔`save_interval_epochs`会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。
分割模型的接口可见文档[paddlex.seg.models](../../apis/models.md)
```python
model.train(
num_epochs=40,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/deeplab',
use_vdl=True)
```
> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问,如为远程服务,改成相应机器IP。
```shell
visualdl --logdir output/deeplab/vdl_log --port 8001
```
**5.验证或测试**
> 训练完利用模型可继续在验证集上进行验证。
```python
eval_metrics = model.evaluate(eval_dataset, batch_size=2)
print("eval_metrics:", eval_metrics)
```
> 结果输出:
```python
eval_metrics: {'miou': 0.8915175875548873, 'category_iou': [0.9956445981924432, 0.7873905769173314], 'macc': 0.9957137358816046, 'category_acc': [0.9975360650317765, 0.8948120441157331], 'kappa': 0.8788684558629085}
```
> 训练完用模型对图片进行测试。
```python
image_name = 'optic_disc_seg/JPEGImages/H0005.jpg'
predict_result = model.predict(image_name)
```
> 可视化测试结果:
```python
import paddlex as pdx
pdx.seg.visualize(image_name, predict_result, weight=0.4)
```
![](../images/visualized_deeplab.jpg)
# VisualDL可视化训练指标
在使用PaddleX训练模型过程中,各个训练指标和评估指标会直接输出到标准输出流,同时也可通过VisualDL对训练过程中的指标进行可视化,只需在调用`train`函数时,将`use_vdl`参数设为`True`即可,如下代码所示,
```
model = paddlex.cls.ResNet50(num_classes=1000)
model.train(num_epochs=120, train_dataset=train_dataset,
train_batch_size=32, eval_dataset=eval_dataset,
log_interval_steps=10, save_interval_epochs=10,
save_dir='./output', use_vdl=True)
```
模型在训练过程中,会在`save_dir`下生成`vdl_log`目录,通过在命令行终端执行以下命令,启动VisualDL。
```
visualdl --logdir=output/vdl_log --port=8008
```
在浏览器打开`http://0.0.0.0:8008`便可直接查看随训练迭代动态变化的各个指标(0.0.0.0表示启动VisualDL所在服务器的IP,本机使用0.0.0.0即可)。
在训练分类模型过程中,使用VisualDL进行可视化的示例图如下所示。
> 训练过程中每个Step的`Loss`和相应`Top1准确率`变化趋势:
![](../../images/vdl1.jpg)
> 训练过程中每个Step的`学习率lr`和相应`Top5准确率`变化趋势:
![](../../images/vdl2.jpg)
> 训练过程中,每次保存模型时,模型在验证数据集上的`Top1准确率`和`Top5准确率`:
![](../../images/vdl3.jpg)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from .utils.utils import get_environ_info
from . import cv
from . import det
from . import seg
from . import cls
from . import slim
env_info = get_environ_info()
load_model = cv.models.load_model
datasets = cv.datasets
log_level = 2
__version__ = '0.1.0.github'
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import cv
ResNet18 = cv.models.ResNet18
ResNet34 = cv.models.ResNet34
ResNet50 = cv.models.ResNet50
ResNet101 = cv.models.ResNet101
ResNet50_vd = cv.models.ResNet50_vd
ResNet101_vd = cv.models.ResNet101_vd
DarkNet53 = cv.models.DarkNet53
MobileNetV1 = cv.models.MobileNetV1
MobileNetV2 = cv.models.MobileNetV2
MobileNetV3_small = cv.models.MobileNetV3_small
MobileNetV3_large = cv.models.MobileNetV3_large
Xception41 = cv.models.Xception41
Xception65 = cv.models.Xception65
DenseNet121 = cv.models.DenseNet121
DenseNet161 = cv.models.DenseNet161
DenseNet201 = cv.models.DenseNet201
ShuffleNetV2 = cv.models.ShuffleNetV2
transforms = cv.transforms.cls_transforms
from six import text_type as _text_type
import argparse
import sys
def arg_parser():
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_dir",
"-m",
type=_text_type,
default=None,
help="define model directory path")
parser.add_argument(
"--save_dir",
"-s",
type=_text_type,
default=None,
help="path to save inference model")
parser.add_argument(
"--version",
"-v",
action="store_true",
default=False,
help="get version of PaddleX")
parser.add_argument(
"--export_inference",
"-e",
action="store_true",
default=False,
help="export inference model for C++/Python deployment")
return parser
def main():
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ""
import paddlex as pdx
if len(sys.argv) < 2:
print("Use command 'paddlex -h` to print the help information\n")
return
parser = arg_parser()
args = parser.parse_args()
if args.version:
print("PaddleX-{}".format(pdx.__version__))
print("Repo: https://github.com/PaddlePaddle/PaddleX.git")
print("Email: paddlex@baidu.com")
return
if args.export_inference:
assert args.model_dir is not None, "--model_dir should be defined while exporting inference model"
assert args.save_dir is not None, "--save_dir should be defined to save inference model"
model = pdx.load_model(args.model_dir)
model.export_inference_model(args.save_dir)
if __name__ == "__main__":
main()
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import models
from . import nets
from . import transforms
from . import datasets
cls_transforms = transforms.cls_transforms
det_transforms = transforms.det_transforms
seg_transforms = transforms.seg_transforms
# classification
ResNet50 = models.ResNet50
DarkNet53 = models.DarkNet53
# detection
YOLOv3 = models.YOLOv3
#EAST = models.EAST
FasterRCNN = models.FasterRCNN
MaskRCNN = models.MaskRCNN
UNet = models.UNet
DeepLabv3p = models.DeepLabv3p
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .imagenet import ImageNet
from .voc import VOCDetection
from .coco import CocoDetection
from .seg_dataset import SegDataset
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import copy
import os.path as osp
import random
import numpy as np
import paddlex.utils.logging as logging
from pycocotools.coco import COCO
from .voc import VOCDetection
from .dataset import is_pic
class CocoDetection(VOCDetection):
"""读取MSCOCO格式的检测数据集,并对样本进行相应的处理,该格式的数据集同样可以应用到实例分割模型的训练中。
Args:
data_dir (str): 数据集所在的目录路径。
ann_file (str): 数据集的标注文件,为一个独立的json格式文件。
transforms (paddlex.det.transforms): 数据集中每个样本的预处理/增强算子。
num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核数的一半。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
"""
def __init__(self,
data_dir,
ann_file,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='process',
shuffle=False):
super(VOCDetection, self).__init__(
transforms=transforms,
num_workers=num_workers,
buffer_size=buffer_size,
parallel_method=parallel_method,
shuffle=shuffle)
self.file_list = list()
self.labels = list()
self._epoch = 0
coco = COCO(ann_file)
self.coco_gt = coco
img_ids = coco.getImgIds()
cat_ids = coco.getCatIds()
catid2clsid = dict({catid: i + 1 for i, catid in enumerate(cat_ids)})
cname2cid = dict({
coco.loadCats(catid)[0]['name']: clsid
for catid, clsid in catid2clsid.items()
})
for label, cid in sorted(cname2cid.items(), key=lambda d: d[1]):
self.labels.append(label)
logging.info("Starting to read file list from dataset...")
for img_id in img_ids:
img_anno = coco.loadImgs(img_id)[0]
im_fname = osp.join(data_dir, img_anno['file_name'])
if not is_pic(im_fname):
continue
im_w = float(img_anno['width'])
im_h = float(img_anno['height'])
ins_anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
instances = coco.loadAnns(ins_anno_ids)
bboxes = []
for inst in instances:
x, y, box_w, box_h = inst['bbox']
x1 = max(0, x)
y1 = max(0, y)
x2 = min(im_w - 1, x1 + max(0, box_w - 1))
y2 = min(im_h - 1, y1 + max(0, box_h - 1))
if inst['area'] > 0 and x2 >= x1 and y2 >= y1:
inst['clean_bbox'] = [x1, y1, x2, y2]
bboxes.append(inst)
else:
logging.warning(
"Found an invalid bbox in annotations: im_id: {}, area: {} x1: {}, y1: {}, x2: {}, y2: {}."
.format(img_id, float(inst['area']), x1, y1, x2, y2))
num_bbox = len(bboxes)
gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
gt_score = np.ones((num_bbox, 1), dtype=np.float32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
difficult = np.zeros((num_bbox, 1), dtype=np.int32)
gt_poly = [None] * num_bbox
for i, box in enumerate(bboxes):
catid = box['category_id']
gt_class[i][0] = catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
is_crowd[i][0] = box['iscrowd']
if 'segmentation' in box:
gt_poly[i] = box['segmentation']
im_info = {
'im_id': np.array([img_id]).astype('int32'),
'origin_shape': np.array([im_h, im_w]).astype('int32'),
}
label_info = {
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_score': gt_score,
'gt_poly': gt_poly,
'difficult': difficult
}
coco_rec = (im_info, label_info)
self.file_list.append([im_fname, coco_rec])
        if len(self.file_list) == 0:
            raise Exception('no coco record found in %s' % (ann_file))
logging.info("{} samples in file {}".format(
len(self.file_list), ann_file))
self.num_samples = len(self.file_list)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from threading import Thread
import multiprocessing
import collections
import numpy as np
import six
import sys
import copy
import random
import platform
import chardet
import paddlex.utils.logging as logging
class EndSignal():
pass
def is_pic(img_name):
valid_suffix = ['JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png']
suffix = img_name.split('.')[-1]
if suffix not in valid_suffix:
return False
return True
def is_valid(sample):
if sample is None:
return False
if isinstance(sample, tuple):
for s in sample:
if s is None:
return False
elif isinstance(s, np.ndarray) and s.size == 0:
return False
elif isinstance(s, collections.Sequence) and len(s) == 0:
return False
return True
def get_encoding(path):
    # Read the raw bytes and let chardet guess the file encoding; the context
    # manager ensures the file handle is closed.
    with open(path, 'rb') as f:
        data = f.read()
    file_encoding = chardet.detect(data).get('encoding')
    return file_encoding
def multithread_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from queue import Queue
end = EndSignal()
# define a worker to read samples from reader to in_queue
def read_worker(reader, in_queue):
for i in reader():
in_queue.put(i)
in_queue.put(end)
# define a worker to handle samples from in_queue by mapper
# and put mapped samples into out_queue
def handle_worker(in_queue, out_queue, mapper):
sample = in_queue.get()
while not isinstance(sample, EndSignal):
if len(sample) == 2:
r = mapper(sample[0], sample[1])
elif len(sample) == 3:
r = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(r):
out_queue.put(r)
sample = in_queue.get()
in_queue.put(end)
out_queue.put(end)
def xreader():
in_queue = Queue(buffer_size)
out_queue = Queue(buffer_size)
# start a read worker in a thread
target = read_worker
t = Thread(target=target, args=(reader, in_queue))
t.daemon = True
t.start()
# start several handle_workers
target = handle_worker
args = (in_queue, out_queue, mapper)
workers = []
for i in range(num_workers):
worker = Thread(target=target, args=args)
worker.daemon = True
workers.append(worker)
for w in workers:
w.start()
batch_data = []
sample = out_queue.get()
while not isinstance(sample, EndSignal):
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
sample = out_queue.get()
finish = 1
while finish < num_workers:
sample = out_queue.get()
if isinstance(sample, EndSignal):
finish += 1
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
if not drop_last and len(batch_data) != 0:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
return xreader
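# Sketch of how these readers are wired up by Dataset.generator below (names
# are illustrative): the mapper is the transforms pipeline and the reader is a
# dataset's iterator; the returned callable yields padded mini-batches.
#   batch_reader = multithread_reader(
#       mapper=transforms, reader=dataset.iterator,
#       num_workers=4, batch_size=8)
#   for batch in batch_reader():
#       ...  # each batch is a list of mapped samples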
def multiprocess_reader(mapper,
reader,
num_workers=4,
buffer_size=1024,
batch_size=8,
drop_last=True):
from .shared_queue import SharedQueue as Queue
def _read_into_queue(samples, mapper, queue):
end = EndSignal()
try:
for sample in samples:
if sample is None:
raise ValueError("sample has None")
if len(sample) == 2:
result = mapper(sample[0], sample[1])
elif len(sample) == 3:
result = mapper(sample[0], sample[1], sample[2])
else:
raise Exception('The sample\'s length must be 2 or 3.')
if is_valid(result):
queue.put(result)
queue.put(end)
except:
queue.put("")
six.reraise(*sys.exc_info())
def queue_reader():
queue = Queue(buffer_size, memsize=3 * 1024**3)
total_samples = [[] for i in range(num_workers)]
for i, sample in enumerate(reader()):
index = i % num_workers
total_samples[index].append(sample)
for i in range(num_workers):
p = multiprocessing.Process(
target=_read_into_queue,
args=(total_samples[i], mapper, queue))
p.start()
finish_num = 0
batch_data = list()
while finish_num < num_workers:
sample = queue.get()
if isinstance(sample, EndSignal):
finish_num += 1
elif sample == "":
raise ValueError("multiprocess reader raises an exception")
else:
batch_data.append(sample)
if len(batch_data) == batch_size:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
if len(batch_data) != 0 and not drop_last:
batch_data = GenerateMiniBatch(batch_data)
yield batch_data
batch_data = []
return queue_reader
def GenerateMiniBatch(batch_data):
if len(batch_data) == 1:
return batch_data
width = [data[0].shape[2] for data in batch_data]
height = [data[0].shape[1] for data in batch_data]
if len(set(width)) == 1 and len(set(height)) == 1:
return batch_data
max_shape = np.array([data[0].shape for data in batch_data]).max(axis=0)
padding_batch = []
for data in batch_data:
im_c, im_h, im_w = data[0].shape[:]
padding_im = np.zeros((im_c, max_shape[1], max_shape[2]),
dtype=np.float32)
padding_im[:, :im_h, :im_w] = data[0]
padding_batch.append((padding_im, ) + data[1:])
return padding_batch
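# Padding example for GenerateMiniBatch (shapes are illustrative): images of
# shape (3, 600, 800) and (3, 640, 768) in the same batch are both zero-padded
# to (3, 640, 800), the element-wise maximum, so the batch can be stacked into
# a single dense tensor.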
class Dataset:
def __init__(self,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='process',
shuffle=False):
if num_workers == 'auto':
import multiprocessing as mp
num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
if platform.platform().startswith(
"Darwin") or platform.platform().startswith("Windows"):
parallel_method = 'thread'
if transforms is None:
raise Exception("transform should be defined.")
self.transforms = transforms
self.num_workers = num_workers
self.buffer_size = buffer_size
self.parallel_method = parallel_method
self.shuffle = shuffle
def generator(self, batch_size=1, drop_last=True):
self.batch_size = batch_size
parallel_reader = multithread_reader
if self.parallel_method == "process":
if platform.platform().startswith("Windows"):
                logging.debug(
                    "multiprocess_reader is not supported on the Windows platform, falling back to multithread_reader."
                )
else:
parallel_reader = multiprocess_reader
return parallel_reader(
self.transforms,
self.iterator,
num_workers=self.num_workers,
buffer_size=self.buffer_size,
batch_size=batch_size,
drop_last=drop_last)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import os.path as osp
import random
import copy
import paddlex.utils.logging as logging
from .dataset import Dataset
from .dataset import is_pic
from .dataset import get_encoding
class ImageNet(Dataset):
"""读取ImageNet格式的分类数据集,并对样本进行相应的处理。
Args:
data_dir (str): 数据集所在的目录路径。
file_list (str): 描述数据集图片文件和类别id的文件路径(文本内每行路径为相对data_dir的相对路)。
label_list (str): 描述数据集包含的类别信息文件路径。
transforms (paddlex.cls.transforms): 数据集中每个样本的预处理/增强算子。
num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核
数的一半。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
"""
def __init__(self,
data_dir,
file_list,
label_list,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='process',
shuffle=False):
super(ImageNet, self).__init__(
transforms=transforms,
num_workers=num_workers,
buffer_size=buffer_size,
parallel_method=parallel_method,
shuffle=shuffle)
self.file_list = list()
self.labels = list()
self._epoch = 0
with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f:
item = line.strip()
self.labels.append(item)
logging.info("Starting to read file list from dataset...")
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
if not is_pic(items[0]):
continue
full_path = osp.join(data_dir, items[0])
if not osp.exists(full_path):
                    raise IOError(
                        'The image file {} does not exist!'.format(full_path))
self.file_list.append([full_path, int(items[1])])
self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format(
len(self.file_list), file_list))
def iterator(self):
self._epoch += 1
self._pos = 0
files = copy.deepcopy(self.file_list)
if self.shuffle:
random.shuffle(files)
files = files[:self.num_samples]
self.num_samples = len(files)
for f in files:
records = f[1]
sample = [f[0], records]
yield sample
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import os.path as osp
import random
import copy
import paddlex.utils.logging as logging
from .dataset import Dataset
from .dataset import get_encoding
from .dataset import is_pic
class SegDataset(Dataset):
"""读取语义分割任务数据集,并对样本进行相应的处理。
Args:
data_dir (str): 数据集所在的目录路径。
file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
label_list (str): 描述数据集包含的类别信息文件路径。
transforms (list): 数据集中每个样本的预处理/增强算子。
num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
"""
def __init__(self,
data_dir,
file_list,
label_list,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='process',
shuffle=False):
super(SegDataset, self).__init__(
transforms=transforms,
num_workers=num_workers,
buffer_size=buffer_size,
parallel_method=parallel_method,
shuffle=shuffle)
self.file_list = list()
self.labels = list()
self._epoch = 0
with open(label_list, encoding=get_encoding(label_list)) as f:
for line in f:
item = line.strip()
self.labels.append(item)
with open(file_list, encoding=get_encoding(file_list)) as f:
for line in f:
items = line.strip().split()
if not is_pic(items[0]):
continue
full_path_im = osp.join(data_dir, items[0])
full_path_label = osp.join(data_dir, items[1])
if not osp.exists(full_path_im):
                    raise IOError(
                        'The image file {} does not exist!'.format(full_path_im))
                if not osp.exists(full_path_label):
                    raise IOError('The label file {} does not exist!'.format(
                        full_path_label))
self.file_list.append([full_path_im, full_path_label])
self.num_samples = len(self.file_list)
logging.info("{} samples in file {}".format(
len(self.file_list), file_list))
def iterator(self):
self._epoch += 1
self._pos = 0
files = copy.deepcopy(self.file_list)
if self.shuffle:
random.shuffle(files)
files = files[:self.num_samples]
self.num_samples = len(files)
for f in files:
label_path = f[1]
sample = [f[0], None, label_path]
yield sample
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
from .sharedmemory import SharedBuffer
from .sharedmemory import SharedMemoryMgr
from .sharedmemory import SharedMemoryError
from .queue import SharedQueue
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import sys
import six
if six.PY3:
import pickle
from io import BytesIO as StringIO
else:
import cPickle as pickle
from cStringIO import StringIO
import logging
import traceback
import multiprocessing as mp
from multiprocessing.queues import Queue
from .sharedmemory import SharedMemoryMgr
logger = logging.getLogger(__name__)
class SharedQueueError(ValueError):
""" SharedQueueError
"""
pass
class SharedQueue(Queue):
""" a Queue based on shared memory to communicate data between Process,
and it's interface is compatible with 'multiprocessing.queues.Queue'
"""
def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
""" init
"""
if six.PY3:
super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
else:
super(SharedQueue, self).__init__(maxsize)
if mem_mgr is not None:
self._shared_mem = mem_mgr
else:
self._shared_mem = SharedMemoryMgr(
capacity=memsize, pagesize=pagesize)
def put(self, obj, **kwargs):
""" put an object to this queue
"""
obj = pickle.dumps(obj, -1)
buff = None
try:
buff = self._shared_mem.malloc(len(obj))
buff.put(obj)
super(SharedQueue, self).put(buff, **kwargs)
except Exception as e:
stack_info = traceback.format_exc()
            err_msg = 'failed to put an element to SharedQueue '\
                'with stack info[%s]' % (stack_info)
            logger.warning(err_msg)
if buff is not None:
buff.free()
raise e
def get(self, **kwargs):
""" get an object from this queue
"""
buff = None
try:
buff = super(SharedQueue, self).get(**kwargs)
data = buff.get()
return pickle.load(StringIO(data))
except Exception as e:
stack_info = traceback.format_exc()
err_msg = 'failed to get element from SharedQueue '\
'with stack info[%s]' % (stack_info)
            logger.warning(err_msg)
raise e
finally:
if buff is not None:
buff.free()
def release(self):
self._shared_mem.release()
self._shared_mem = None
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# utils for memory management which is allocated on sharedmemory,
# note that these structures may not be thread-safe
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os
import time
import math
import struct
import sys
import six
if six.PY3:
import pickle
else:
import cPickle as pickle
import json
import uuid
import random
import numpy as np
import weakref
import logging
from multiprocessing import Lock
from multiprocessing import RawArray
logger = logging.getLogger(__name__)
class SharedMemoryError(ValueError):
""" SharedMemoryError
"""
pass
class SharedBufferError(SharedMemoryError):
""" SharedBufferError
"""
pass
class MemoryFullError(SharedMemoryError):
""" MemoryFullError
"""
def __init__(self, errmsg=''):
super(MemoryFullError, self).__init__()
self.errmsg = errmsg
def memcopy(dst, src, offset=0, length=None):
""" copy data from 'src' to 'dst' in bytes
"""
length = length if length is not None else len(src)
assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
if type(src) is not np.ndarray:
if type(src) is str and six.PY3:
src = src.encode()
src = np.frombuffer(src, dtype='uint8', count=len(src))
dst[:] = src[offset:offset + length]
class SharedBuffer(object):
""" Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
note that:
        every instance of this should be freed explicitly by calling 'self.free'
"""
def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
""" Init
Args:
owner (str): manager to own this buffer
capacity (int): capacity in bytes for this buffer
pos (int): page position in shared memory
size (int): bytes already used
alloc_status (str): debug info about allocator when allocate this
"""
self._owner = owner
self._cap = capacity
self._pos = pos
self._size = size
self._alloc_status = alloc_status
assert self._pos >= 0 and self._cap > 0, \
"invalid params[%d:%d] to construct SharedBuffer" \
% (self._pos, self._cap)
def owner(self):
""" get owner
"""
return SharedMemoryMgr.get_mgr(self._owner)
def put(self, data, override=False):
""" put data to this buffer
Args:
data (str): data to be stored in this buffer
Returns:
None
Raises:
SharedMemoryError when not enough space in this buffer
"""
assert type(data) in [str, bytes], \
'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
if self._size > 0 and not override:
            raise SharedBufferError('buffer has already been set')
if self.capacity() < len(data):
raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
% (len(data), str(self)))
self.owner().put_data(self, data)
self._size = len(data)
def get(self, offset=0, size=None, no_copy=True):
""" get the data stored this buffer
Args:
offset (int): position for the start point to 'get'
size (int): size to get
Returns:
data (np.ndarray('uint8')): user's data in numpy
which is passed in by 'put'
None: if no data stored in
"""
offset = offset if offset >= 0 else self._size + offset
if self._size <= 0:
return None
size = self._size if size is None else size
assert offset + size <= self._cap, 'invalid offset[%d] '\
'or size[%d] for capacity[%d]' % (offset, size, self._cap)
return self.owner().get_data(self, offset, size, no_copy=no_copy)
def size(self):
""" bytes of used memory
"""
return self._size
def resize(self, size):
""" resize the used memory to 'size', should not be greater than capacity
"""
assert size >= 0 and size <= self._cap, \
"invalid size[%d] for resize" % (size)
self._size = size
def capacity(self):
""" size of allocated memory
"""
return self._cap
def __str__(self):
""" human readable format
"""
return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
"capacity:%d, alloc_status:[%s], pid:%d)" \
% (str(self._owner), self._pos, self._size, \
self._cap, self._alloc_status, os.getpid())
def free(self):
""" free this buffer to it's owner
"""
if self._owner is not None:
self.owner().free(self)
self._owner = None
self._cap = 0
self._pos = -1
self._size = 0
return True
else:
return False
class PageAllocator(object):
""" allocator used to malloc and free shared memory which
is split into pages
"""
s_allocator_header = 12
def __init__(self, base, total_pages, page_size):
""" init
"""
self._magic_num = 1234321000 + random.randint(100, 999)
self._base = base
self._total_pages = total_pages
self._page_size = page_size
header_pages = int(
math.ceil((total_pages + self.s_allocator_header) / page_size))
self._header_pages = header_pages
self._free_pages = total_pages - header_pages
self._header_size = self._header_pages * page_size
self._reset()
def _dump_alloc_info(self, fname):
hpages, tpages, pos, used = self.header()
start = self.s_allocator_header
end = start + self._page_size * hpages
alloc_flags = self._base[start:end].tostring()
info = {
'magic_num': self._magic_num,
'header_pages': hpages,
'total_pages': tpages,
'pos': pos,
'used': used
}
info['alloc_flags'] = alloc_flags
fname = fname + '.' + str(uuid.uuid4())[:6]
with open(fname, 'wb') as f:
f.write(pickle.dumps(info, -1))
        logger.warning('dump alloc info to file[%s]' % (fname))
def _reset(self):
alloc_page_pos = self._header_pages
used_pages = self._header_pages
header_info = struct.pack(
str('III'), self._magic_num, alloc_page_pos, used_pages)
assert len(header_info) == self.s_allocator_header, \
'invalid size of header_info'
memcopy(self._base[0:self.s_allocator_header], header_info)
self.set_page_status(0, self._header_pages, '1')
self.set_page_status(self._header_pages, self._free_pages, '0')
def header(self):
""" get header info of this allocator
"""
header_str = self._base[0:self.s_allocator_header].tostring()
magic, pos, used = struct.unpack(str('III'), header_str)
assert magic == self._magic_num, \
'invalid header magic[%d] in shared memory' % (magic)
return self._header_pages, self._total_pages, pos, used
def empty(self):
""" are all allocatable pages available
"""
header_pages, pages, pos, used = self.header()
return header_pages == used
def full(self):
""" are all allocatable pages used
"""
header_pages, pages, pos, used = self.header()
return header_pages + used == pages
def __str__(self):
header_pages, pages, pos, used = self.header()
desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
% (self._magic_num, pages, used, header_pages, pos, self._page_size)
return 'PageAllocator:%s' % (desc)
def set_alloc_info(self, alloc_pos, used_pages):
""" set allocating position to new value
"""
memcopy(self._base[4:12], struct.pack(
str('II'), alloc_pos, used_pages))
def set_page_status(self, start, page_num, status):
""" set pages from 'start' to 'end' with new same status 'status'
"""
assert status in ['0', '1'], 'invalid status[%s] for page status '\
'in allocator[%s]' % (status, str(self))
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
memcopy(self._base[start:end], str(status * page_num))
def get_page_status(self, start, page_num, ret_flag=False):
start += self.s_allocator_header
end = start + page_num
assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
'in allocator[%s]' % (end, str(self))
status = self._base[start:end].tostring().decode()
if ret_flag:
return status
zero_num = status.count('0')
if zero_num == 0:
return (page_num, 1)
else:
return (zero_num, 0)
def malloc_page(self, page_num):
header_pages, pages, pos, used = self.header()
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
start_pos = pos
flags = ''
while True:
# maybe flags already has some '0' pages,
# so just check 'page_num - len(flags)' pages
flags = self.get_page_status(pos, page_num, ret_flag=True)
if flags.count('0') == page_num:
break
# not found enough pages, so shift to next few pages
free_pos = flags.rfind('1') + 1
pos += free_pos
end = pos + page_num
if end > pages:
pos = self._header_pages
end = pos + page_num
flags = ''
# not found available pages after scan all pages
if pos <= start_pos and end >= start_pos:
logger.debug('not found available pages after scan all pages')
break
page_status = (flags.count('0'), 0)
if page_status != (page_num, 0):
free_pages = self._total_pages - used
if free_pages == 0:
err_msg = 'all pages have been used:%s' % (str(self))
else:
err_msg = 'not found available pages with page_status[%s] '\
'and %d free pages' % (str(page_status), free_pages)
err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
% (page_num, pos, err_msg, str(self))
raise MemoryFullError(err_msg)
self.set_page_status(pos, page_num, '1')
used += page_num
self.set_alloc_info(end, used)
return pos
def free_page(self, start, page_num):
""" free 'page_num' pages start from 'start'
"""
page_status = self.get_page_status(start, page_num)
assert page_status == (page_num, 1), \
'invalid status[%s] when free [%d, %d]' \
% (str(page_status), start, page_num)
self.set_page_status(start, page_num, '0')
_, _, pos, used = self.header()
used -= page_num
self.set_alloc_info(pos, used)
DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
class SharedMemoryMgr(object):
""" manage a continouse block of memory, provide
'malloc' to allocate new buffer, and 'free' to free buffer
"""
s_memory_mgrs = weakref.WeakValueDictionary()
s_mgr_num = 0
s_log_statis = False
@classmethod
def get_mgr(cls, id):
""" get a SharedMemoryMgr with size of 'capacity'
"""
assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
id)
return cls.s_memory_mgrs[id]
def __init__(self, capacity=None, pagesize=None):
""" init
"""
logger.debug('create SharedMemoryMgr')
pagesize = 64 * 1024 if pagesize is None else pagesize
assert type(pagesize) is int, "invalid type of pagesize[%s]" \
% (str(pagesize))
capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
assert type(capacity) is int, "invalid type of capacity[%s]" \
% (str(capacity))
assert capacity > 0, '"size of shared memory should be greater than 0'
self._released = False
self._cap = capacity
self._page_size = pagesize
assert self._cap % self._page_size == 0, \
"capacity[%d] and pagesize[%d] are not consistent" \
% (self._cap, self._page_size)
self._total_pages = self._cap // self._page_size
self._pid = os.getpid()
SharedMemoryMgr.s_mgr_num += 1
self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
SharedMemoryMgr.s_memory_mgrs[self._id] = self
self._locker = Lock()
self._setup()
def _setup(self):
self._shared_mem = RawArray('c', self._cap)
self._base = np.frombuffer(
self._shared_mem, dtype='uint8', count=self._cap)
self._locker.acquire()
try:
self._allocator = PageAllocator(self._base, self._total_pages,
self._page_size)
finally:
self._locker.release()
def malloc(self, size, wait=True):
""" malloc a new SharedBuffer
Args:
size (int): buffer size to be malloc
wait (bool): whether to wait when no enough memory
Returns:
SharedBuffer
Raises:
SharedMemoryError when not found available memory
"""
page_num = int(math.ceil(size / self._page_size))
size = page_num * self._page_size
start = None
ct = 0
errmsg = ''
while True:
self._locker.acquire()
try:
start = self._allocator.malloc_page(page_num)
alloc_status = str(self._allocator)
except MemoryFullError as e:
start = None
errmsg = e.errmsg
if not wait:
raise e
finally:
self._locker.release()
if start is None:
time.sleep(0.1)
if ct % 100 == 0:
                    logger.warning('not enough space for reason[%s]' % (errmsg))
ct += 1
else:
break
return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
def free(self, shared_buf):
""" free a SharedBuffer
Args:
shared_buf (SharedBuffer): buffer to be freed
Returns:
None
Raises:
SharedMemoryError when failed to release this buffer
"""
        assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
            "for it was not allocated from me[%s]" % (str(shared_buf), str(self))
cap = shared_buf.capacity()
start_page = shared_buf._pos
page_num = cap // self._page_size
#maybe we don't need this lock here
self._locker.acquire()
try:
self._allocator.free_page(start_page, page_num)
finally:
self._locker.release()
def put_data(self, shared_buf, data):
""" fill 'data' into 'shared_buf'
"""
assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
'for this buffer[%s]' % (len(data), str(shared_buf))
start = shared_buf._pos * self._page_size
end = start + len(data)
assert start >= 0 and end <= self._cap, "invalid start "\
"position[%d] when put data to buff:%s" % (start, str(shared_buf))
self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
def get_data(self, shared_buf, offset, size, no_copy=True):
""" extract 'data' from 'shared_buf' in range [offset, offset + size)
"""
start = shared_buf._pos * self._page_size
start += offset
if no_copy:
return self._base[start:start + size]
else:
return self._base[start:start + size].tostring()
def __str__(self):
return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
def __del__(self):
if SharedMemoryMgr.s_log_statis:
logger.info('destroy [%s]' % (self))
if not self._released and not self._allocator.empty():
logger.debug(
'not empty when delete this SharedMemoryMgr[%s]' % (self))
else:
self._released = True
if self._id in SharedMemoryMgr.s_memory_mgrs:
del SharedMemoryMgr.s_memory_mgrs[self._id]
SharedMemoryMgr.s_mgr_num -= 1
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
import copy
import os.path as osp
import random
import numpy as np
import xml.etree.ElementTree as ET
from pycocotools.coco import COCO
import paddlex.utils.logging as logging
from .dataset import Dataset
from .dataset import is_pic
from .dataset import get_encoding
class VOCDetection(Dataset):
"""读取PascalVOC格式的检测数据集,并对样本进行相应的处理。
Args:
data_dir (str): 数据集所在的目录路径。
file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
label_list (str): 描述数据集包含的类别信息文件路径。
transforms (paddlex.det.transforms): 数据集中每个样本的预处理/增强算子。
num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核数的
一半。
buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
"""
def __init__(self,
data_dir,
file_list,
label_list,
transforms=None,
num_workers='auto',
buffer_size=100,
parallel_method='process',
shuffle=False):
super(VOCDetection, self).__init__(
transforms=transforms,
num_workers=num_workers,
buffer_size=buffer_size,
parallel_method=parallel_method,
shuffle=shuffle)
self.file_list = list()
self.labels = list()
self._epoch = 0
annotations = {}
annotations['images'] = []
annotations['categories'] = []
annotations['annotations'] = []
cname2cid = {}
label_id = 1
with open(label_list, 'r', encoding=get_encoding(label_list)) as fr:
for line in fr.readlines():
cname2cid[line.strip()] = label_id
label_id += 1
self.labels.append(line.strip())
logging.info("Starting to read file list from dataset...")
for k, v in cname2cid.items():
annotations['categories'].append({
'supercategory': 'component',
'id': v,
'name': k
})
ct = 0
ann_ct = 0
with open(file_list, 'r', encoding=get_encoding(file_list)) as fr:
while True:
line = fr.readline()
if not line:
break
img_file, xml_file = [osp.join(data_dir, x) \
for x in line.strip().split()[:2]]
if not is_pic(img_file):
continue
if not osp.isfile(xml_file):
continue
if not osp.exists(img_file):
raise IOError(
                        'The image file {} does not exist!'.format(img_file))
tree = ET.parse(xml_file)
if tree.find('id') is None:
im_id = np.array([ct])
else:
ct = int(tree.find('id').text)
im_id = np.array([int(tree.find('id').text)])
objs = tree.findall('object')
im_w = float(tree.find('size').find('width').text)
im_h = float(tree.find('size').find('height').text)
gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
gt_class = np.zeros((len(objs), 1), dtype=np.int32)
gt_score = np.ones((len(objs), 1), dtype=np.float32)
is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
difficult = np.zeros((len(objs), 1), dtype=np.int32)
for i, obj in enumerate(objs):
cname = obj.find('name').text
gt_class[i][0] = cname2cid[cname]
_difficult = int(obj.find('difficult').text)
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
y2 = float(obj.find('bndbox').find('ymax').text)
x1 = max(0, x1)
y1 = max(0, y1)
x2 = min(im_w - 1, x2)
y2 = min(im_h - 1, y2)
gt_bbox[i] = [x1, y1, x2, y2]
is_crowd[i][0] = 0
difficult[i][0] = _difficult
annotations['annotations'].append({
'iscrowd':
0,
'image_id':
int(im_id[0]),
'bbox': [x1, y1, x2 - x1 + 1, y2 - y1 + 1],
'area':
float((x2 - x1 + 1) * (y2 - y1 + 1)),
'category_id':
cname2cid[cname],
'id':
ann_ct,
'difficult':
_difficult
})
ann_ct += 1
im_info = {
'im_id': im_id,
'origin_shape': np.array([im_h, im_w]).astype('int32'),
}
label_info = {
'is_crowd': is_crowd,
'gt_class': gt_class,
'gt_bbox': gt_bbox,
'gt_score': gt_score,
'gt_poly': [],
'difficult': difficult
}
voc_rec = (im_info, label_info)
if len(objs) != 0:
self.file_list.append([img_file, voc_rec])
ct += 1
annotations['images'].append({
'height':
im_h,
'width':
im_w,
'id':
int(im_id[0]),
'file_name':
osp.split(img_file)[1]
})
        if len(self.file_list) == 0:
            raise Exception('no voc record found in %s' % (file_list))
logging.info("{} samples in file {}".format(
len(self.file_list), file_list))
self.num_samples = len(self.file_list)
self.coco_gt = COCO()
self.coco_gt.dataset = annotations
self.coco_gt.createIndex()
def iterator(self):
self._epoch += 1
self._pos = 0
files = copy.deepcopy(self.file_list)
if self.shuffle:
random.shuffle(files)
files = files[:self.num_samples]
self.num_samples = len(files)
for f in files:
records = f[1]
im_info = copy.deepcopy(records[0])
label_info = copy.deepcopy(records[1])
im_info['epoch'] = self._epoch
if self.num_samples > 1:
mix_idx = random.randint(1, self.num_samples - 1)
mix_pos = (mix_idx + self._pos) % self.num_samples
else:
mix_pos = 0
im_info['mixup'] = [
files[mix_pos][0],
copy.deepcopy(files[mix_pos][1][0]),
copy.deepcopy(files[mix_pos][1][1])
]
self._pos += 1
sample = [f[0], im_info, label_info]
yield sample
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .classifier import BaseClassifier
from .classifier import ResNet18
from .classifier import ResNet34
from .classifier import ResNet50
from .classifier import ResNet101
from .classifier import ResNet50_vd
from .classifier import ResNet101_vd
from .classifier import DarkNet53
from .classifier import MobileNetV1
from .classifier import MobileNetV2
from .classifier import MobileNetV3_small
from .classifier import MobileNetV3_large
from .classifier import Xception41
from .classifier import Xception65
from .classifier import DenseNet121
from .classifier import DenseNet161
from .classifier import DenseNet201
from .classifier import ShuffleNetV2
from .base import BaseAPI
from .yolo_v3 import YOLOv3
from .faster_rcnn import FasterRCNN
from .mask_rcnn import MaskRCNN
from .unet import UNet
from .deeplabv3p import DeepLabv3p
from .load_model import load_model
from .slim import prune
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import paddle.fluid as fluid
import os
import numpy as np
import time
import math
import yaml
import copy
import json
import functools
import paddlex.utils.logging as logging
from paddlex.utils import seconds_to_hms
import paddlex
from collections import OrderedDict
from os import path as osp
from paddle.fluid.framework import Program
from .utils.pretrain_weights import get_pretrain_weights
def dict2str(dict_input):
out = ''
for k, v in dict_input.items():
try:
v = round(float(v), 6)
except:
pass
out = out + '{}={}, '.format(k, v)
return out.strip(', ')
class BaseAPI:
def __init__(self, model_type):
self.model_type = model_type
        # Every existing CV model has this attribute, which is also needed at eval time
self.num_classes = None
self.labels = None
self.version = paddlex.__version__
if paddlex.env_info['place'] == 'cpu':
self.places = fluid.cpu_places()
else:
self.places = fluid.cuda_places()
self.exe = fluid.Executor(self.places[0])
self.train_prog = None
self.test_prog = None
self.parallel_train_prog = None
self.train_inputs = None
self.test_inputs = None
self.train_outputs = None
self.test_outputs = None
self.train_data_loader = None
self.eval_metrics = None
        # A model loaded from an inference model cannot be trained via the training API
self.trainable = True
        # Whether to synchronize BatchNorm mean and variance across cards
self.sync_bn = False
        # Current model status
self.status = 'Normal'
def _get_single_card_bs(self, batch_size):
if batch_size % len(self.places) == 0:
return int(batch_size // len(self.places))
else:
raise Exception("Please support correct batch_size, \
which can be divided by available cards({}) in {}".
format(paddlex.env_info['num'],
paddlex.env_info['place']))
def build_program(self):
        # Build the training network
self.train_inputs, self.train_outputs = self.build_net(mode='train')
self.train_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program()
        # Build the inference network
self.test_prog = fluid.Program()
with fluid.program_guard(self.test_prog, startup_prog):
with fluid.unique_name.guard():
self.test_inputs, self.test_outputs = self.build_net(
mode='test')
self.test_prog = self.test_prog.clone(for_test=True)
def arrange_transforms(self, transforms, mode='train'):
        # Append the arrange operator to transforms
if self.model_type == 'classifier':
arrange_transform = paddlex.cls.transforms.ArrangeClassifier
elif self.model_type == 'segmenter':
arrange_transform = paddlex.seg.transforms.ArrangeSegmenter
elif self.model_type == 'detector':
arrange_name = 'Arrange{}'.format(self.__class__.__name__)
arrange_transform = getattr(paddlex.det.transforms, arrange_name)
else:
raise Exception("Unrecognized model type: {}".format(
self.model_type))
if type(transforms.transforms[-1]).__name__.startswith('Arrange'):
transforms.transforms[-1] = arrange_transform(mode=mode)
else:
transforms.transforms.append(arrange_transform(mode=mode))
def build_train_data_loader(self, dataset, batch_size):
        # Initialize the data loader
if self.train_data_loader is None:
self.train_data_loader = fluid.io.DataLoader.from_generator(
feed_list=list(self.train_inputs.values()),
capacity=64,
use_double_buffer=True,
iterable=True)
batch_size_each_gpu = self._get_single_card_bs(batch_size)
generator = dataset.generator(
batch_size=batch_size_each_gpu, drop_last=True)
self.train_data_loader.set_sample_list_generator(
dataset.generator(batch_size=batch_size_each_gpu),
places=self.places)
def export_quant_model(self,
dataset,
save_dir,
batch_size=1,
batch_num=10,
cache_dir="./temp"):
self.arrange_transforms(transforms=dataset.transforms, mode='quant')
dataset.num_samples = batch_size * batch_num
try:
from .slim.post_quantization import PaddleXPostTrainingQuantization
except:
raise Exception(
"Model Quantization is not available, try to upgrade your paddlepaddle>=1.7.0"
)
is_use_cache_file = True
if cache_dir is None:
is_use_cache_file = False
post_training_quantization = PaddleXPostTrainingQuantization(
executor=self.exe,
dataset=dataset,
program=self.test_prog,
inputs=self.test_inputs,
outputs=self.test_outputs,
batch_size=batch_size,
batch_nums=batch_num,
scope=None,
algo='KL',
quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
is_full_quantize=False,
is_use_cache_file=is_use_cache_file,
cache_dir=cache_dir)
post_training_quantization.quantize()
post_training_quantization.save_quantized_model(save_dir)
model_info = self.get_model_info()
model_info['status'] = 'Quant'
        # Save descriptions of the model's input/output variables
model_info['_ModelInputsOutputs'] = dict()
model_info['_ModelInputsOutputs']['test_inputs'] = [
[k, v.name] for k, v in self.test_inputs.items()
]
model_info['_ModelInputsOutputs']['test_outputs'] = [
[k, v.name] for k, v in self.test_outputs.items()
]
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
def net_initialize(self,
startup_prog=None,
pretrain_weights=None,
fuse_bn=False,
save_dir='.',
sensitivities_file=None,
eval_metric_loss=0.05):
pretrain_dir = osp.join(save_dir, 'pretrain')
if not os.path.isdir(pretrain_dir):
if os.path.exists(pretrain_dir):
os.remove(pretrain_dir)
os.makedirs(pretrain_dir)
if hasattr(self, 'backbone'):
backbone = self.backbone
else:
backbone = self.__class__.__name__
pretrain_weights = get_pretrain_weights(
pretrain_weights, self.model_type, backbone, pretrain_dir)
if startup_prog is None:
startup_prog = fluid.default_startup_program()
self.exe.run(startup_prog)
if pretrain_weights is not None:
logging.info(
"Load pretrain weights from {}.".format(pretrain_weights))
paddlex.utils.utils.load_pretrain_weights(
self.exe, self.train_prog, pretrain_weights, fuse_bn)
        # Apply pruning
if sensitivities_file is not None:
from .slim.prune_config import get_sensitivities
sensitivities_file = get_sensitivities(sensitivities_file, self,
save_dir)
from .slim.prune import get_params_ratios, prune_program
prune_params_ratios = get_params_ratios(
sensitivities_file, eval_metric_loss=eval_metric_loss)
prune_program(self, prune_params_ratios)
self.status = 'Prune'
def get_model_info(self):
info = dict()
info['version'] = paddlex.__version__
info['Model'] = self.__class__.__name__
info['_Attributes'] = {'model_type': self.model_type}
if 'self' in self.init_params:
del self.init_params['self']
if '__class__' in self.init_params:
del self.init_params['__class__']
info['_init_params'] = self.init_params
info['_Attributes']['num_classes'] = self.num_classes
info['_Attributes']['labels'] = self.labels
try:
primary_metric_key = list(self.eval_metrics.keys())[0]
primary_metric_value = float(self.eval_metrics[primary_metric_key])
info['_Attributes']['eval_metrics'] = {
primary_metric_key: primary_metric_value
}
except:
pass
        if hasattr(self, 'test_transforms') and hasattr(self.test_transforms,
                                                        'to_rgb'):
if self.test_transforms.to_rgb:
info['TransformsMode'] = 'RGB'
else:
info['TransformsMode'] = 'BGR'
if hasattr(self, 'test_transforms'):
if self.test_transforms is not None:
info['Transforms'] = list()
for op in self.test_transforms.transforms:
name = op.__class__.__name__
attr = op.__dict__
info['Transforms'].append({name: attr})
return info
def save_model(self, save_dir):
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
fluid.save(self.train_prog, osp.join(save_dir, 'model'))
model_info = self.get_model_info()
model_info['status'] = self.status
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
        # Save evaluation results
if hasattr(self, 'eval_details'):
with open(osp.join(save_dir, 'eval_details.json'), 'w') as f:
json.dump(self.eval_details, f)
if self.status == 'Prune':
            # Save the shapes of the pruned parameters
shapes = {}
for block in self.train_prog.blocks:
for param in block.all_parameters():
pd_var = fluid.global_scope().find_var(param.name)
pd_param = pd_var.get_tensor()
shapes[param.name] = np.array(pd_param).shape
with open(
osp.join(save_dir, 'prune.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(shapes, f)
        # Marker file indicating the model was saved successfully
open(osp.join(save_dir, '.success'), 'w').close()
logging.info("Model saved in {}.".format(save_dir))
def export_inference_model(self, save_dir):
test_input_names = [
var.name for var in list(self.test_inputs.values())
]
test_outputs = list(self.test_outputs.values())
if self.__class__.__name__ == 'MaskRCNN':
from paddlex.utils.save import save_mask_inference_model
save_mask_inference_model(
dirname=save_dir,
executor=self.exe,
params_filename='__params__',
feeded_var_names=test_input_names,
target_vars=test_outputs,
main_program=self.test_prog)
else:
fluid.io.save_inference_model(
dirname=save_dir,
executor=self.exe,
params_filename='__params__',
feeded_var_names=test_input_names,
target_vars=test_outputs,
main_program=self.test_prog)
model_info = self.get_model_info()
model_info['status'] = 'Infer'
        # Save descriptions of the model's input/output variables
model_info['_ModelInputsOutputs'] = dict()
model_info['_ModelInputsOutputs']['test_inputs'] = [
[k, v.name] for k, v in self.test_inputs.items()
]
model_info['_ModelInputsOutputs']['test_outputs'] = [
[k, v.name] for k, v in self.test_outputs.items()
]
with open(
osp.join(save_dir, 'model.yml'), encoding='utf-8',
mode='w') as f:
yaml.dump(model_info, f)
        # Marker file indicating the model was saved successfully
open(osp.join(save_dir, '.success'), 'w').close()
logging.info(
"Model for inference deploy saved in {}.".format(save_dir))
def train_loop(self,
num_epochs,
train_dataset,
train_batch_size,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=10,
save_dir='output',
use_vdl=False):
if not osp.isdir(save_dir):
if osp.exists(save_dir):
os.remove(save_dir)
os.makedirs(save_dir)
if use_vdl:
from visualdl import LogWriter
vdl_logdir = osp.join(save_dir, 'vdl_log')
        # Append the arrange operator to transforms
self.arrange_transforms(
transforms=train_dataset.transforms, mode='train')
        # Build train_data_loader
self.build_train_data_loader(
dataset=train_dataset, batch_size=train_batch_size)
if eval_dataset is not None:
self.eval_transforms = eval_dataset.transforms
self.test_transforms = copy.deepcopy(eval_dataset.transforms)
        # Fetch the learning rate, which changes during training
lr = self.optimizer._learning_rate
if isinstance(lr, fluid.framework.Variable):
self.train_outputs['lr'] = lr
        # Run training on multiple cards
if self.parallel_train_prog is None:
build_strategy = fluid.compiler.BuildStrategy()
build_strategy.fuse_all_optimizer_ops = False
if paddlex.env_info['place'] != 'cpu' and len(self.places) > 1:
build_strategy.sync_batch_norm = self.sync_bn
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 1
self.parallel_train_prog = fluid.CompiledProgram(
self.train_prog).with_data_parallel(
loss_name=self.train_outputs['loss'].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
total_num_steps = math.floor(
train_dataset.num_samples / train_batch_size)
num_steps = 0
time_stat = list()
time_train_one_epoch = None
time_eval_one_epoch = None
total_num_steps_eval = 0
        # Total number of model evaluations
total_eval_times = math.ceil(num_epochs / save_interval_epochs)
        # Detection currently supports single-card evaluation only; the evaluation
        # batch size is the training batch size divided by the number of cards.
eval_batch_size = train_batch_size
if self.model_type == 'detector':
eval_batch_size = self._get_single_card_bs(train_batch_size)
if eval_dataset is not None:
total_num_steps_eval = math.ceil(
eval_dataset.num_samples / eval_batch_size)
if use_vdl:
# VisualDL component
log_writer = LogWriter(vdl_logdir, sync_cycle=20)
train_step_component = OrderedDict()
eval_component = OrderedDict()
best_accuracy_key = ""
best_accuracy = -1.0
best_model_epoch = 1
for i in range(num_epochs):
records = list()
step_start_time = time.time()
epoch_start_time = time.time()
for step, data in enumerate(self.train_data_loader()):
outputs = self.exe.run(
self.parallel_train_prog,
feed=data,
fetch_list=list(self.train_outputs.values()))
outputs_avg = np.mean(np.array(outputs), axis=1)
records.append(outputs_avg)
                # Estimate the time remaining until training finishes
current_time = time.time()
step_cost_time = current_time - step_start_time
step_start_time = current_time
if len(time_stat) < 20:
time_stat.append(step_cost_time)
else:
time_stat[num_steps % 20] = step_cost_time
                # Log loss information every log_interval_steps steps
num_steps += 1
if num_steps % log_interval_steps == 0:
step_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), outputs_avg))
if use_vdl:
for k, v in step_metrics.items():
if k not in train_step_component.keys():
with log_writer.mode('Each_Step_while_Training'
) as step_logger:
train_step_component[
k] = step_logger.scalar(
'Training: {}'.format(k))
train_step_component[k].add_record(num_steps, v)
                    # Estimate the remaining time
avg_step_time = np.mean(time_stat)
if time_train_one_epoch is not None:
eta = (num_epochs - i - 1) * time_train_one_epoch + (
total_num_steps - step - 1) * avg_step_time
else:
eta = ((num_epochs - i) * total_num_steps - step -
1) * avg_step_time
if time_eval_one_epoch is not None:
eval_eta = (total_eval_times - i //
save_interval_epochs) * time_eval_one_epoch
else:
eval_eta = (
total_eval_times - i // save_interval_epochs
) * total_num_steps_eval * avg_step_time
eta_str = seconds_to_hms(eta + eval_eta)
logging.info(
"[TRAIN] Epoch={}/{}, Step={}/{}, {}, time_each_step={}s, eta={}"
.format(i + 1, num_epochs, step + 1, total_num_steps,
dict2str(step_metrics), round(
avg_step_time, 2), eta_str))
train_metrics = OrderedDict(
zip(list(self.train_outputs.keys()), np.mean(records, axis=0)))
logging.info('[TRAIN] Epoch {} finished, {} .'.format(
i + 1, dict2str(train_metrics)))
time_train_one_epoch = time.time() - epoch_start_time
epoch_start_time = time.time()
            # Every save_interval_epochs epochs, evaluate on the validation set and save the model
eval_epoch_start_time = time.time()
if (i + 1) % save_interval_epochs == 0 or i == num_epochs - 1:
current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
if not osp.isdir(current_save_dir):
os.makedirs(current_save_dir)
if eval_dataset is not None:
self.eval_metrics, self.eval_details = self.evaluate(
eval_dataset=eval_dataset,
batch_size=eval_batch_size,
epoch_id=i + 1,
return_details=True)
logging.info('[EVAL] Finished, Epoch={}, {} .'.format(
i + 1, dict2str(self.eval_metrics)))
                # Save the best model
best_accuracy_key = list(self.eval_metrics.keys())[0]
current_accuracy = self.eval_metrics[best_accuracy_key]
if current_accuracy > best_accuracy:
best_accuracy = current_accuracy
best_model_epoch = i + 1
best_model_dir = osp.join(save_dir, "best_model")
self.save_model(save_dir=best_model_dir)
if use_vdl:
for k, v in self.eval_metrics.items():
if isinstance(v, list):
continue
if isinstance(v, np.ndarray):
if v.size > 1:
continue
if k not in eval_component:
with log_writer.mode('Each_Epoch_on_Eval_Data'
) as eval_logger:
eval_component[k] = eval_logger.scalar(
'Evaluation: {}'.format(k))
eval_component[k].add_record(i + 1, v)
self.save_model(save_dir=current_save_dir)
time_eval_one_epoch = time.time() - eval_epoch_start_time
eval_epoch_start_time = time.time()
logging.info(
'Current evaluated best model in eval_dataset is epoch_{}, {}={}'
.format(best_model_epoch, best_accuracy_key,
best_accuracy))
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import numpy as np
import time
import math
import tqdm
import paddle.fluid as fluid
import paddlex.utils.logging as logging
from paddlex.utils import seconds_to_hms
import paddlex
from collections import OrderedDict
from .base import BaseAPI
class BaseClassifier(BaseAPI):
"""构建分类器,并实现其训练、评估、预测和模型导出。
Args:
model_name (str): 分类器的模型名字,取值范围为['ResNet18',
'ResNet34', 'ResNet50', 'ResNet101',
'ResNet50_vd', 'ResNet101_vd', 'DarkNet53',
'MobileNetV1', 'MobileNetV2', 'Xception41',
'Xception65', 'Xception71']。默认为'ResNet50'。
num_classes (int): 类别数。默认为1000。
"""
def __init__(self, model_name='ResNet50', num_classes=1000):
self.init_params = locals()
super(BaseClassifier, self).__init__('classifier')
if not hasattr(paddlex.cv.nets, str.lower(model_name)):
raise Exception(
"ERROR: There's no model named {}.".format(model_name))
self.model_name = model_name
self.labels = None
self.num_classes = num_classes
def build_net(self, mode='train'):
image = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if mode != 'test':
label = fluid.data(dtype='int64', shape=[None, 1], name='label')
model = getattr(paddlex.cv.nets, str.lower(self.model_name))
net_out = model(image, num_classes=self.num_classes)
softmax_out = fluid.layers.softmax(net_out, use_cudnn=False)
inputs = OrderedDict([('image', image)])
outputs = OrderedDict([('predict', softmax_out)])
if mode != 'test':
cost = fluid.layers.cross_entropy(input=softmax_out, label=label)
avg_cost = fluid.layers.mean(cost)
acc1 = fluid.layers.accuracy(input=softmax_out, label=label, k=1)
k = min(5, self.num_classes)
acck = fluid.layers.accuracy(input=softmax_out, label=label, k=k)
if mode == 'train':
self.optimizer.minimize(avg_cost)
inputs = OrderedDict([('image', image), ('label', label)])
outputs = OrderedDict([('loss', avg_cost), ('acc1', acc1),
('acc{}'.format(k), acck)])
if mode == 'eval':
del outputs['loss']
return inputs, outputs
def default_optimizer(self, learning_rate, lr_decay_epochs, lr_decay_gamma,
num_steps_each_epoch):
boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
values = [
learning_rate * (lr_decay_gamma**i)
for i in range(len(lr_decay_epochs) + 1)
]
lr_decay = fluid.layers.piecewise_decay(
boundaries=boundaries, values=values)
optimizer = fluid.optimizer.Momentum(
lr_decay,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(1e-04))
return optimizer
def train(self,
num_epochs,
train_dataset,
train_batch_size=64,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='IMAGENET',
optimizer=None,
learning_rate=0.025,
lr_decay_epochs=[30, 60, 90],
lr_decay_gamma=0.1,
use_vdl=False,
sensitivities_file=None,
eval_metric_loss=0.05):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认值为64。
eval_dataset (paddlex.datasets: 验证数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代步数)。默认为2。
save_dir (str): 模型保存路径。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为'IMAGENET'。
optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
learning_rate (float): 默认优化器的初始学习率。默认为0.025。
lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[30, 60, 90]。
lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Raises:
ValueError: 模型从inference model进行加载。
"""
if not self.trainable:
raise ValueError(
"Model is not trainable since it was loaded from a inference model."
)
self.labels = train_dataset.labels
if optimizer is None:
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
lr_decay_epochs=lr_decay_epochs,
lr_decay_gamma=lr_decay_gamma,
num_steps_each_epoch=num_steps_each_epoch)
self.optimizer = optimizer
        # Build the training, evaluation, and inference networks
self.build_program()
        # Initialize network weights
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
save_dir=save_dir,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
        # Run the training loop
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为1。
epoch_id (int): 当前评估模型所在的训练轮数。
return_details (bool): 是否返回详细信息。
Returns:
dict: 当return_details为False时,返回dict, 包含关键字:'acc1'、'acc5',
分别表示最大值的accuracy、前5个最大值的accuracy。
tuple (metrics, eval_details): 当return_details为True时,增加返回dict,
包含关键字:'true_labels'、'pred_scores',分别代表真实类别id、每个类别的预测得分。
"""
self.arrange_transforms(
transforms=eval_dataset.transforms, mode='eval')
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
k = min(5, self.num_classes)
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
true_labels = list()
pred_scores = list()
if not hasattr(self, 'parallel_test_prog'):
self.parallel_test_prog = fluid.CompiledProgram(
self.test_prog).with_data_parallel(
share_vars_from=self.parallel_train_prog)
batch_size_each_gpu = self._get_single_card_bs(batch_size)
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data]).astype('float32')
labels = [d[1] for d in data]
num_samples = images.shape[0]
if num_samples < batch_size:
num_pad_samples = batch_size - num_samples
pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
images = np.concatenate([images, pad_images])
outputs = self.exe.run(
self.parallel_test_prog,
feed={'image': images},
fetch_list=list(self.test_outputs.values()))
outputs = [outputs[0][:num_samples]]
true_labels.extend(labels)
pred_scores.extend(outputs[0].tolist())
logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
epoch_id, step + 1, total_steps))
pred_top1_label = np.argsort(pred_scores)[:, -1]
pred_topk_label = np.argsort(pred_scores)[:, -k:]
acc1 = sum(pred_top1_label == true_labels) / len(true_labels)
acck = sum(
[np.isin(x, y)
for x, y in zip(true_labels, pred_topk_label)]) / len(true_labels)
metrics = OrderedDict([('acc1', acc1), ('acc{}'.format(k), acck)])
if return_details:
eval_details = {
'true_labels': true_labels,
'pred_scores': pred_scores
}
return metrics, eval_details
return metrics
def predict(self, img_file, transforms=None, topk=1):
"""预测。
Args:
img_file (str): 预测图像路径。
transforms (paddlex.cls.transforms): 数据预处理操作。
topk (int): 预测时前k个最大值。
Returns:
list: 其中元素均为字典。字典的关键字为'category_id'、'category'、'score',
分别对应预测类别id、预测类别标签、预测得分。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
true_topk = min(self.num_classes, topk)
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im = transforms(img_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im = self.test_transforms(img_file)
result = self.exe.run(
self.test_prog,
feed={'image': im},
fetch_list=list(self.test_outputs.values()))
pred_label = np.argsort(result[0][0])[::-1][:true_topk]
res = [{
'category_id': l,
'category': self.labels[l],
'score': result[0][0][l]
} for l in pred_label]
return res
class ResNet18(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet18, self).__init__(
model_name='ResNet18', num_classes=num_classes)
class ResNet34(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet34, self).__init__(
model_name='ResNet34', num_classes=num_classes)
class ResNet50(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet50, self).__init__(
model_name='ResNet50', num_classes=num_classes)
class ResNet101(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet101, self).__init__(
model_name='ResNet101', num_classes=num_classes)
class ResNet50_vd(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet50_vd, self).__init__(
model_name='ResNet50_vd', num_classes=num_classes)
class ResNet101_vd(BaseClassifier):
def __init__(self, num_classes=1000):
super(ResNet101_vd, self).__init__(
model_name='ResNet101_vd', num_classes=num_classes)
class DarkNet53(BaseClassifier):
def __init__(self, num_classes=1000):
super(DarkNet53, self).__init__(
model_name='DarkNet53', num_classes=num_classes)
class MobileNetV1(BaseClassifier):
def __init__(self, num_classes=1000):
super(MobileNetV1, self).__init__(
model_name='MobileNetV1', num_classes=num_classes)
class MobileNetV2(BaseClassifier):
def __init__(self, num_classes=1000):
super(MobileNetV2, self).__init__(
model_name='MobileNetV2', num_classes=num_classes)
class MobileNetV3_small(BaseClassifier):
def __init__(self, num_classes=1000):
super(MobileNetV3_small, self).__init__(
model_name='MobileNetV3_small', num_classes=num_classes)
class MobileNetV3_large(BaseClassifier):
def __init__(self, num_classes=1000):
super(MobileNetV3_large, self).__init__(
model_name='MobileNetV3_large', num_classes=num_classes)
class Xception65(BaseClassifier):
def __init__(self, num_classes=1000):
super(Xception65, self).__init__(
model_name='Xception65', num_classes=num_classes)
class Xception41(BaseClassifier):
def __init__(self, num_classes=1000):
super(Xception41, self).__init__(
model_name='Xception41', num_classes=num_classes)
class DenseNet121(BaseClassifier):
def __init__(self, num_classes=1000):
super(DenseNet121, self).__init__(
model_name='DenseNet121', num_classes=num_classes)
class DenseNet161(BaseClassifier):
def __init__(self, num_classes=1000):
super(DenseNet161, self).__init__(
model_name='DenseNet161', num_classes=num_classes)
class DenseNet201(BaseClassifier):
def __init__(self, num_classes=1000):
super(DenseNet201, self).__init__(
model_name='DenseNet201', num_classes=num_classes)
class ShuffleNetV2(BaseClassifier):
def __init__(self, num_classes=1000):
super(ShuffleNetV2, self).__init__(
model_name='ShuffleNetV2', num_classes=num_classes)
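# --- Example usage (a minimal sketch, not part of the library) ---
# Trains one of the classifiers above and runs prediction. The dataset paths
# are hypothetical, and the reader/transform names assume the paddlex.cls /
# paddlex.datasets APIs shipped with this repo; adjust to your environment.
#
# import paddlex
# from paddlex.cls import transforms
#
# train_transforms = transforms.Compose([
#     transforms.RandomCrop(crop_size=224),
#     transforms.Normalize()])
# train_dataset = paddlex.datasets.ImageNet(
#     data_dir='my_dataset',                  # hypothetical path
#     file_list='my_dataset/train_list.txt',  # hypothetical path
#     label_list='my_dataset/labels.txt',     # hypothetical path
#     transforms=train_transforms)
# model = ResNet50(num_classes=len(train_dataset.labels))
# model.train(num_epochs=10, train_dataset=train_dataset,
#             train_batch_size=32, save_dir='output/resnet50')
# print(model.predict('my_dataset/demo.jpg', topk=5))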
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import os.path as osp
import numpy as np
import tqdm
import math
import cv2
import paddle.fluid as fluid
import paddlex.utils.logging as logging
import paddlex
from collections import OrderedDict
from .base import BaseAPI
from .utils.seg_eval import ConfusionMatrix
from .utils.visualize import visualize_segmentation
class DeepLabv3p(BaseAPI):
"""实现DeepLabv3+网络的构建并进行训练、评估、预测和模型导出。
Args:
num_classes (int): 类别数。
backbone (str): DeepLabv3+的backbone网络,实现特征图的计算,取值范围为['Xception65', 'Xception41',
'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
'MobileNetV2_x2.0']。默认'MobileNetV2_x1.0'。
output_stride (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。默认16。
aspp_with_sep_conv (bool): 在asspp模块是否采用separable convolutions。默认True。
decoder_use_sep_conv (bool): decoder模块是否采用separable convolutions。默认True。
encoder_with_aspp (bool): 是否在encoder阶段采用aspp模块。默认True。
enable_decoder (bool): 是否使用decoder模块。默认True。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用,
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。默认255。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: backbone取值不在['Xception65', 'Xception41', 'MobileNetV2_x0.25',
'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0']之内。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes=2,
backbone='MobileNetV2_x1.0',
output_stride=16,
aspp_with_sep_conv=True,
decoder_use_sep_conv=True,
encoder_with_aspp=True,
enable_decoder=True,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss only apply to binary segmentation
if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
self.output_stride = output_stride
if backbone not in [
'Xception65', 'Xception41', 'MobileNetV2_x0.25',
'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
'MobileNetV2_x2.0'
]:
raise ValueError(
"backbone: {} is set wrong. it should be one of "
"('Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',"
" 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0')".
format(backbone))
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.backbone = backbone
self.num_classes = num_classes
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.aspp_with_sep_conv = aspp_with_sep_conv
self.decoder_use_sep_conv = decoder_use_sep_conv
self.encoder_with_aspp = encoder_with_aspp
self.enable_decoder = enable_decoder
self.labels = None
self.sync_bn = True
def _get_backbone(self, backbone):
def mobilenetv2(backbone):
            # backbone: MobileNetV2 configuration
            # output_stride: downsampling factor
            # end_points: number of blocks in MobileNetV2
            # decode_points: block index whose output branches off as the decoder input
if '0.25' in backbone:
scale = 0.25
elif '0.5' in backbone:
scale = 0.5
elif '1.0' in backbone:
scale = 1.0
elif '1.5' in backbone:
scale = 1.5
elif '2.0' in backbone:
scale = 2.0
end_points = 18
decode_points = 4
return paddlex.cv.nets.MobileNetV2(
scale=scale,
output_stride=self.output_stride,
end_points=end_points,
decode_points=decode_points)
def xception(backbone):
            # decode_points: block index whose output branches off as the decoder input
            # end_points: number of blocks in Xception
if '65' in backbone:
decode_points = 2
end_points = 21
layers = 65
if '41' in backbone:
decode_points = 2
end_points = 13
layers = 41
if '71' in backbone:
decode_points = 3
end_points = 23
layers = 71
return paddlex.cv.nets.Xception(
layers=layers,
output_stride=self.output_stride,
end_points=end_points,
decode_points=decode_points)
if 'Xception' in backbone:
return xception(backbone)
elif 'MobileNetV2' in backbone:
return mobilenetv2(backbone)
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.DeepLabv3p(
self.num_classes,
mode=mode,
backbone=self._get_backbone(self.backbone),
output_stride=self.output_stride,
aspp_with_sep_conv=self.aspp_with_sep_conv,
decoder_use_sep_conv=self.decoder_use_sep_conv,
encoder_with_aspp=self.encoder_with_aspp,
enable_decoder=self.enable_decoder,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
elif mode == 'eval':
outputs['loss'] = model_out[0]
outputs['pred'] = model_out[1]
outputs['label'] = model_out[2]
outputs['mask'] = model_out[3]
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
def default_optimizer(self,
learning_rate,
num_epochs,
num_steps_each_epoch,
lr_decay_power=0.9):
decay_step = num_epochs * num_steps_each_epoch
lr_decay = fluid.layers.polynomial_decay(
learning_rate,
decay_step,
end_learning_rate=0,
power=lr_decay_power)
optimizer = fluid.optimizer.Momentum(
lr_decay,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=4e-05))
return optimizer
def train(self,
num_epochs,
train_dataset,
train_batch_size=2,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='IMAGENET',
optimizer=None,
learning_rate=0.01,
lr_decay_power=0.9,
use_vdl=False,
sensitivities_file=None,
eval_metric_loss=0.05):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认为2。
eval_dataset (paddlex.datasets): 评估数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
save_dir (str): 模型保存路径。默认'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认'IMAGENET。
optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认的优化器:使用
fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
learning_rate (float): 默认优化器的初始学习率。默认0.01。
lr_decay_power (float): 默认优化器学习率衰减指数。默认0.9。
use_vdl (bool): 是否使用VisualDL进行可视化。默认False。
sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Raises:
ValueError: 模型从inference model进行加载。
"""
if not self.trainable:
raise ValueError(
"Model is not trainable since it was loaded from a inference model."
)
self.labels = train_dataset.labels
if optimizer is None:
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
num_epochs=num_epochs,
num_steps_each_epoch=num_steps_each_epoch,
lr_decay_power=lr_decay_power)
self.optimizer = optimizer
        # Build the training, evaluation, and inference networks
self.build_program()
        # Initialize network weights
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
save_dir=save_dir,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
        # Run the training loop
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 评估数据读取器。
batch_size (int): 评估时的batch大小。默认1。
epoch_id (int): 当前评估模型所在的训练轮数。
return_details (bool): 是否返回详细信息。默认False。
Returns:
dict: 当return_details为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
tuple (metrics, eval_details):当return_details为True时,增加返回dict (eval_details),
包含关键字:'confusion_matrix',表示评估的混淆矩阵。
"""
self.arrange_transforms(
transforms=eval_dataset.transforms, mode='eval')
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
conf_mat = ConfusionMatrix(self.num_classes, streaming=True)
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
if not hasattr(self, 'parallel_test_prog'):
self.parallel_test_prog = fluid.CompiledProgram(
self.test_prog).with_data_parallel(
share_vars_from=self.parallel_train_prog)
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data])
labels = np.array([d[1] for d in data])
num_samples = images.shape[0]
if num_samples < batch_size:
num_pad_samples = batch_size - num_samples
pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
images = np.concatenate([images, pad_images])
feed_data = {'image': images}
outputs = self.exe.run(
self.parallel_test_prog,
feed=feed_data,
fetch_list=list(self.test_outputs.values()),
return_numpy=True)
pred = outputs[0]
if num_samples < batch_size:
pred = pred[0:num_samples]
mask = labels != self.ignore_index
conf_mat.calculate(pred=pred, label=labels, ignore=mask)
_, iou = conf_mat.mean_iou()
logging.debug("[EVAL] Epoch={}, Step={}/{}, iou={}".format(
epoch_id, step + 1, total_steps, iou))
category_iou, miou = conf_mat.mean_iou()
category_acc, macc = conf_mat.accuracy()
metrics = OrderedDict(
zip(['miou', 'category_iou', 'macc', 'category_acc', 'kappa'],
[miou, category_iou, macc, category_acc,
conf_mat.kappa()]))
if return_details:
eval_details = {
'confusion_matrix': conf_mat.confusion_matrix.tolist()
}
return metrics, eval_details
return metrics
def predict(self, im_file, transforms=None):
"""预测。
Args:
img_file(str): 预测图像路径。
transforms(paddlex.cv.transforms): 数据预处理操作。
Returns:
dict: 包含关键字'label_map'和'score_map', 'label_map'存储预测结果灰度图,
像素值表示对应的类别,'score_map'存储各类别的概率,shape=(h, w, num_classes)
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im, im_info = transforms(im_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im, im_info = self.test_transforms(im_file)
im = np.expand_dims(im, axis=0)
result = self.exe.run(
self.test_prog,
feed={'image': im},
fetch_list=list(self.test_outputs.values()))
pred = result[0]
pred = np.squeeze(pred).astype('uint8')
keys = list(im_info.keys())
for k in keys[::-1]:
if k == 'shape_before_resize':
h, w = im_info[k][0], im_info[k][1]
                pred = cv2.resize(pred, (w, h), interpolation=cv2.INTER_NEAREST)
elif k == 'shape_before_padding':
h, w = im_info[k][0], im_info[k][1]
pred = pred[0:h, 0:w]
return {'label_map': pred, 'score_map': result[1]}
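# --- Example usage (a minimal sketch, not part of the library) ---
# Trains a DeepLabv3p model and predicts a single image. Paths are
# hypothetical and the SegDataset reader / seg transforms are assumed to
# match the paddlex APIs used elsewhere in this repo.
#
# import paddlex
# from paddlex.seg import transforms
#
# train_transforms = transforms.Compose([
#     transforms.Resize(target_size=512),
#     transforms.Normalize()])
# train_dataset = paddlex.datasets.SegDataset(
#     data_dir='my_seg_data',                  # hypothetical path
#     file_list='my_seg_data/train_list.txt',  # hypothetical path
#     label_list='my_seg_data/labels.txt',     # hypothetical path
#     transforms=train_transforms)
# model = DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0')
# model.train(num_epochs=40, train_dataset=train_dataset,
#             train_batch_size=4, save_dir='output/deeplab')
# result = model.predict('my_seg_data/demo.jpg')
# # result['label_map'] holds per-pixel classes; result['score_map'] the scores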
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import math
import tqdm
import numpy as np
import paddle.fluid as fluid
import paddlex.utils.logging as logging
import paddlex
import os.path as osp
import copy
from .base import BaseAPI
from collections import OrderedDict
from .utils.detection_eval import eval_results, bbox2out
class FasterRCNN(BaseAPI):
"""构建FasterRCNN,并实现其训练、评估、预测和模型导出。
Args:
num_classes (int): 包含了背景类的类别数。默认为81。
backbone (str): FasterRCNN的backbone网络,取值范围为['ResNet18', 'ResNet50',
'ResNet50vd', 'ResNet101', 'ResNet101vd']。默认为'ResNet50'。
with_fpn (bool): 是否使用FPN结构。默认为True。
aspect_ratios (list): 生成anchor高宽比的可选值。默认为[0.5, 1.0, 2.0]。
anchor_sizes (list): 生成anchor大小的可选值。默认为[32, 64, 128, 256, 512]。
"""
def __init__(self,
num_classes=81,
backbone='ResNet50',
with_fpn=True,
aspect_ratios=[0.5, 1.0, 2.0],
anchor_sizes=[32, 64, 128, 256, 512]):
self.init_params = locals()
super(FasterRCNN, self).__init__('detector')
backbones = [
'ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd'
]
assert backbone in backbones, "backbone should be one of {}".format(
backbones)
self.backbone = backbone
self.num_classes = num_classes
self.with_fpn = with_fpn
self.aspect_ratios = aspect_ratios
self.anchor_sizes = anchor_sizes
self.labels = None
def _get_backbone(self, backbone_name):
norm_type = None
if backbone_name == 'ResNet18':
layers = 18
variant = 'b'
elif backbone_name == 'ResNet50':
layers = 50
variant = 'b'
elif backbone_name == 'ResNet50vd':
layers = 50
variant = 'd'
norm_type = 'affine_channel'
elif backbone_name == 'ResNet101':
layers = 101
variant = 'b'
norm_type = 'affine_channel'
elif backbone_name == 'ResNet101vd':
layers = 101
variant = 'd'
norm_type = 'affine_channel'
if self.with_fpn:
backbone = paddlex.cv.nets.resnet.ResNet(
norm_type='bn' if norm_type is None else norm_type,
layers=layers,
variant=variant,
freeze_norm=True,
norm_decay=0.,
feature_maps=[2, 3, 4, 5],
freeze_at=2)
else:
backbone = paddlex.cv.nets.resnet.ResNet(
norm_type='affine_channel' if norm_type is None else norm_type,
layers=layers,
variant=variant,
freeze_norm=True,
norm_decay=0.,
feature_maps=4,
freeze_at=2)
return backbone
def build_net(self, mode='train'):
train_pre_nms_top_n = 2000 if self.with_fpn else 12000
test_pre_nms_top_n = 1000 if self.with_fpn else 6000
model = paddlex.cv.nets.detection.FasterRCNN(
backbone=self._get_backbone(self.backbone),
mode=mode,
num_classes=self.num_classes,
with_fpn=self.with_fpn,
aspect_ratios=self.aspect_ratios,
anchor_sizes=self.anchor_sizes,
train_pre_nms_top_n=train_pre_nms_top_n,
test_pre_nms_top_n=test_pre_nms_top_n)
inputs = model.generate_inputs()
if mode == 'train':
model_out = model.build_net(inputs)
loss = model_out['loss']
self.optimizer.minimize(loss)
outputs = OrderedDict([('loss', model_out['loss']),
('loss_cls', model_out['loss_cls']),
('loss_bbox', model_out['loss_bbox']),
('loss_rpn_cls', model_out['loss_rpn_cls']),
('loss_rpn_bbox',
model_out['loss_rpn_bbox'])])
else:
outputs = model.build_net(inputs)
return inputs, outputs
def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
lr_decay_epochs, lr_decay_gamma,
num_steps_each_epoch):
if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
raise Exception("warmup_steps should less than {}".format(
lr_decay_epochs[0] * num_steps_each_epoch))
boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
values = [(lr_decay_gamma**i) * learning_rate
for i in range(len(lr_decay_epochs) + 1)]
lr_decay = fluid.layers.piecewise_decay(
boundaries=boundaries, values=values)
lr_warmup = fluid.layers.linear_lr_warmup(
learning_rate=lr_decay,
warmup_steps=warmup_steps,
start_lr=warmup_start_lr,
end_lr=learning_rate)
optimizer = fluid.optimizer.Momentum(
learning_rate=lr_warmup,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(1e-04))
return optimizer
def train(self,
num_epochs,
train_dataset,
train_batch_size=2,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='IMAGENET',
optimizer=None,
learning_rate=0.0025,
warmup_steps=500,
warmup_start_lr=1.0 / 1200,
lr_decay_epochs=[8, 11],
lr_decay_gamma=0.1,
metric=None,
use_vdl=False):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与
显卡数量之商为验证数据batch大小。默认为2。
eval_dataset (paddlex.datasets): 验证数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为20。
save_dir (str): 模型保存路径。默认值为'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
learning_rate (float): 默认优化器的初始学习率。默认为0.0025。
warmup_steps (int): 默认优化器进行warmup过程的步数。默认为500。
warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为1.0/1200。
lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[8, 11]。
lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
Raises:
ValueError: 评估类型不在指定列表中。
ValueError: 模型从inference model进行加载。
"""
if metric is None:
if isinstance(train_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(train_dataset, paddlex.datasets.VOCDetection):
metric = 'VOC'
else:
raise ValueError(
"train_dataset should be datasets.VOCDetection or datasets.COCODetection."
)
assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
self.metric = metric
if not self.trainable:
raise ValueError(
"Model is not trainable since it was loaded from a inference model."
)
self.labels = copy.deepcopy(train_dataset.labels)
self.labels.insert(0, 'background')
        # Build the training network
        if optimizer is None:
            # Build the default optimization strategy
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs,
lr_decay_gamma, num_steps_each_epoch)
self.optimizer = optimizer
        # Build the training, evaluation, and test networks
self.build_program()
fuse_bn = True
if self.with_fpn and self.backbone in ['ResNet18', 'ResNet50']:
fuse_bn = False
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
fuse_bn=fuse_bn,
save_dir=save_dir)
        # Run the training loop
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
metric=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为1。
epoch_id (int): 当前评估模型所在的训练轮数。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
如为COCODetection,则metric为'COCO'。
return_details (bool): 是否返回详细信息。默认值为False。
Returns:
tuple (metrics, eval_details) /dict (metrics): 当return_details为True时,返回(metrics, eval_details),
当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
分别表示平均准确率平均值在各个阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
"""
self.arrange_transforms(
transforms=eval_dataset.transforms, mode='eval')
if metric is None:
if hasattr(self, 'metric') and self.metric is not None:
metric = self.metric
else:
if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
metric = 'VOC'
else:
raise Exception(
"eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
)
assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
dataset = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
results = list()
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(enumerate(dataset()), total=total_steps):
images = np.array([d[0] for d in data]).astype('float32')
im_infos = np.array([d[1] for d in data]).astype('float32')
im_shapes = np.array([d[3] for d in data]).astype('float32')
feed_data = {
'image': images,
'im_info': im_infos,
'im_shape': im_shapes,
}
outputs = self.exe.run(
self.test_prog,
feed=[feed_data],
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
'bbox': (np.array(outputs[0]),
outputs[0].recursive_sequence_lengths())
}
res_im_id = [d[2] for d in data]
res['im_info'] = (im_infos, [])
res['im_shape'] = (im_shapes, [])
res['im_id'] = (np.array(res_im_id), [])
if metric == 'VOC':
res_gt_box = []
res_gt_label = []
res_is_difficult = []
for d in data:
res_gt_box.extend(d[4])
res_gt_label.extend(d[5])
res_is_difficult.extend(d[6])
res_gt_box_lod = [d[4].shape[0] for d in data]
res_gt_label_lod = [d[5].shape[0] for d in data]
res_is_difficult_lod = [d[6].shape[0] for d in data]
res['gt_box'] = (np.array(res_gt_box), [res_gt_box_lod])
res['gt_label'] = (np.array(res_gt_label), [res_gt_label_lod])
res['is_difficult'] = (np.array(res_is_difficult),
[res_is_difficult_lod])
results.append(res)
logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
epoch_id, step + 1, total_steps))
box_ap_stats, eval_details = eval_results(
results, metric, eval_dataset.coco_gt, with_background=True)
metrics = OrderedDict(
zip(['bbox_mmap' if metric == 'COCO' else 'bbox_map'],
box_ap_stats))
if return_details:
return metrics, eval_details
return metrics
def predict(self, img_file, transforms=None):
"""预测。
Args:
img_file (str): 预测图像路径。
transforms (paddlex.det.transforms): 数据预处理操作。
Returns:
list: 预测结果列表,每个预测结果由预测框类别标签、
预测框类别名称、预测框坐标、预测框得分组成。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im, im_resize_info, im_shape = transforms(img_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im, im_resize_info, im_shape = self.test_transforms(img_file)
im = np.expand_dims(im, axis=0)
im_resize_info = np.expand_dims(im_resize_info, axis=0)
im_shape = np.expand_dims(im_shape, axis=0)
outputs = self.exe.run(
self.test_prog,
feed={
'image': im,
'im_info': im_resize_info,
'im_shape': im_shape
},
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(list(self.test_outputs.keys()), outputs)
}
res['im_id'] = (np.array([[0]]).astype('int32'), [])
clsid2catid = dict({i: i for i in range(self.num_classes)})
xywh_results = bbox2out([res], clsid2catid)
results = list()
for xywh_res in xywh_results:
del xywh_res['image_id']
xywh_res['category'] = self.labels[xywh_res['category_id']]
results.append(xywh_res)
return results
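# --- Example usage (a minimal sketch, not part of the library) ---
# Trains a FasterRCNN on a VOC-format dataset. Paths are hypothetical and the
# VOCDetection reader / det transforms are assumed to match the paddlex APIs
# used elsewhere in this repo. num_classes counts the background class.
#
# import paddlex
# from paddlex.det import transforms
#
# train_transforms = transforms.Compose([
#     transforms.Normalize(),
#     transforms.ResizeByShort(short_size=800, max_size=1333),
#     transforms.Padding(coarsest_stride=32)])
# train_dataset = paddlex.datasets.VOCDetection(
#     data_dir='my_voc_data',                  # hypothetical path
#     file_list='my_voc_data/train_list.txt',  # hypothetical path
#     label_list='my_voc_data/labels.txt',     # hypothetical path
#     transforms=train_transforms)
# model = FasterRCNN(num_classes=len(train_dataset.labels) + 1)
# model.train(num_epochs=12, train_dataset=train_dataset,
#             train_batch_size=2, save_dir='output/faster_rcnn')
# print(model.predict('my_voc_data/demo.jpg'))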
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os.path as osp
import six
import copy
from collections import OrderedDict
import paddle.fluid as fluid
from paddle.fluid.framework import Parameter
import paddlex
import paddlex.utils.logging as logging
def load_model(model_dir):
if not osp.exists(osp.join(model_dir, "model.yml")):
raise Exception("There's not model.yml in {}".format(model_dir))
with open(osp.join(model_dir, "model.yml")) as f:
info = yaml.load(f.read(), Loader=yaml.Loader)
status = info['status']
if not hasattr(paddlex.cv.models, info['Model']):
raise Exception("There's no attribute {} in paddlex.cv.models".format(
info['Model']))
if info['_Attributes']['model_type'] == 'classifier':
model = paddlex.cv.models.BaseClassifier(**info['_init_params'])
else:
model = getattr(paddlex.cv.models,
info['Model'])(**info['_init_params'])
if status == "Normal" or \
status == "Prune":
startup_prog = fluid.Program()
model.test_prog = fluid.Program()
with fluid.program_guard(model.test_prog, startup_prog):
with fluid.unique_name.guard():
model.test_inputs, model.test_outputs = model.build_net(
mode='test')
model.test_prog = model.test_prog.clone(for_test=True)
model.exe.run(startup_prog)
if status == "Prune":
from .slim.prune import update_program
model.test_prog = update_program(model.test_prog, model_dir,
model.places[0])
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
load_dict = pickle.load(f)
fluid.io.set_program_state(model.test_prog, load_dict)
elif status == "Infer" or \
status == "Quant":
[prog, input_names, outputs] = fluid.io.load_inference_model(
model_dir, model.exe, params_filename='__params__')
model.test_prog = prog
test_outputs_info = info['_ModelInputsOutputs']['test_outputs']
model.test_inputs = OrderedDict()
model.test_outputs = OrderedDict()
for name in input_names:
model.test_inputs[name] = model.test_prog.global_block().var(name)
for i, out in enumerate(outputs):
var_desc = test_outputs_info[i]
model.test_outputs[var_desc[0]] = out
if 'Transforms' in info:
transforms_mode = info.get('TransformsMode', 'RGB')
if transforms_mode == 'RGB':
to_rgb = True
else:
to_rgb = False
model.test_transforms = build_transforms(model.model_type,
info['Transforms'], to_rgb)
model.eval_transforms = copy.deepcopy(model.test_transforms)
if '_Attributes' in info:
for k, v in info['_Attributes'].items():
if k in model.__dict__:
model.__dict__[k] = v
logging.info("Model[{}] loaded.".format(info['Model']))
return model
def build_transforms(model_type, transforms_info, to_rgb=True):
if model_type == "classifier":
import paddlex.cv.transforms.cls_transforms as T
elif model_type == "detector":
import paddlex.cv.transforms.det_transforms as T
elif model_type == "segmenter":
import paddlex.cv.transforms.seg_transforms as T
transforms = list()
for op_info in transforms_info:
op_name = list(op_info.keys())[0]
op_attr = op_info[op_name]
if not hasattr(T, op_name):
raise Exception(
"There's no operator named '{}' in transforms of {}".format(
op_name, model_type))
transforms.append(getattr(T, op_name)(**op_attr))
eval_transforms = T.Compose(transforms)
eval_transforms.to_rgb = to_rgb
return eval_transforms
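# --- Example usage (a minimal sketch, not part of the library) ---
# Reloads a model saved by the training APIs above and runs inference with
# the transforms recorded in its model.yml. The path is hypothetical.
#
# model = load_model('output/resnet50/best_model')  # hypothetical path
# result = model.predict('demo.jpg')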
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import math
import tqdm
import numpy as np
import paddle.fluid as fluid
import paddlex.utils.logging as logging
import paddlex
import copy
import os.path as osp
from collections import OrderedDict
from .faster_rcnn import FasterRCNN
from .utils.detection_eval import eval_results, bbox2out, mask2out
class MaskRCNN(FasterRCNN):
"""构建MaskRCNN,并实现其训练、评估、预测和模型导出。
Args:
num_classes (int): 包含了背景类的类别数。默认为81。
backbone (str): MaskRCNN的backbone网络,取值范围为['ResNet18', 'ResNet50',
'ResNet50vd', 'ResNet101', 'ResNet101vd']。默认为'ResNet50'。
with_fpn (bool): 是否使用FPN结构。默认为True。
aspect_ratios (list): 生成anchor高宽比的可选值。默认为[0.5, 1.0, 2.0]。
anchor_sizes (list): 生成anchor大小的可选值。默认为[32, 64, 128, 256, 512]。
"""
def __init__(self,
num_classes=81,
backbone='ResNet50',
with_fpn=True,
aspect_ratios=[0.5, 1.0, 2.0],
anchor_sizes=[32, 64, 128, 256, 512]):
self.init_params = locals()
backbones = [
'ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd'
]
assert backbone in backbones, "backbone should be one of {}".format(
backbones)
super(FasterRCNN, self).__init__('detector')
self.backbone = backbone
self.num_classes = num_classes
self.with_fpn = with_fpn
        self.aspect_ratios = aspect_ratios
        self.anchor_sizes = anchor_sizes
self.labels = None
if with_fpn:
self.mask_head_resolution = 28
else:
self.mask_head_resolution = 14
def build_net(self, mode='train'):
train_pre_nms_top_n = 2000 if self.with_fpn else 12000
test_pre_nms_top_n = 1000 if self.with_fpn else 6000
num_convs = 4 if self.with_fpn else 0
model = paddlex.cv.nets.detection.MaskRCNN(
backbone=self._get_backbone(self.backbone),
num_classes=self.num_classes,
mode=mode,
with_fpn=self.with_fpn,
train_pre_nms_top_n=train_pre_nms_top_n,
test_pre_nms_top_n=test_pre_nms_top_n,
num_convs=num_convs,
mask_head_resolution=self.mask_head_resolution)
inputs = model.generate_inputs()
if mode == 'train':
model_out = model.build_net(inputs)
loss = model_out['loss']
self.optimizer.minimize(loss)
outputs = OrderedDict([('loss', model_out['loss']),
('loss_cls', model_out['loss_cls']),
('loss_bbox', model_out['loss_bbox']),
('loss_mask', model_out['loss_mask']),
('loss_rpn_cls', model_out['loss_rpn_cls']),
('loss_rpn_bbox',
model_out['loss_rpn_bbox'])])
else:
outputs = model.build_net(inputs)
return inputs, outputs
def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
lr_decay_epochs, lr_decay_gamma,
num_steps_each_epoch):
if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
raise Exception("warmup_step should less than {}".format(
lr_decay_epochs[0] * num_steps_each_epoch))
boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
values = [(lr_decay_gamma**i) * learning_rate
for i in range(len(lr_decay_epochs) + 1)]
lr_decay = fluid.layers.piecewise_decay(
boundaries=boundaries, values=values)
lr_warmup = fluid.layers.linear_lr_warmup(
learning_rate=lr_decay,
warmup_steps=warmup_steps,
start_lr=warmup_start_lr,
end_lr=learning_rate)
optimizer = fluid.optimizer.Momentum(
learning_rate=lr_warmup,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(1e-04))
return optimizer
def train(self,
num_epochs,
train_dataset,
train_batch_size=1,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='IMAGENET',
optimizer=None,
learning_rate=1.0 / 800,
warmup_steps=500,
warmup_start_lr=1.0 / 2400,
lr_decay_epochs=[8, 11],
lr_decay_gamma=0.1,
metric=None,
use_vdl=False):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练或验证数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与
显卡数量之商为验证数据batch大小。默认值为1。
eval_dataset (paddlex.datasets): 验证数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为20。
save_dir (str): 模型保存路径。默认值为'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
learning_rate (float): 默认优化器的学习率。默认为1.0/800。
warmup_steps (int): 默认优化器进行warmup过程的步数。默认为500。
warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为1.0/2400。
lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[8, 11]。
lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。
use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
Raises:
ValueError: 评估类型不在指定列表中。
ValueError: 模型从inference model进行加载。
"""
if metric is None:
if isinstance(train_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
else:
raise Exception(
"train_dataset should be datasets.COCODetection.")
assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
self.metric = metric
if not self.trainable:
raise Exception(
"Model is not trainable since it was loaded from a inference model."
)
self.labels = copy.deepcopy(train_dataset.labels)
self.labels.insert(0, 'background')
        # Build the training network
        if optimizer is None:
            # Build the default optimization strategy
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
warmup_steps=warmup_steps,
warmup_start_lr=warmup_start_lr,
lr_decay_epochs=lr_decay_epochs,
lr_decay_gamma=lr_decay_gamma,
num_steps_each_epoch=num_steps_each_epoch)
self.optimizer = optimizer
        # Build the training, evaluation, and test networks
self.build_program()
fuse_bn = True
if self.with_fpn and self.backbone in ['ResNet18', 'ResNet50']:
fuse_bn = False
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
fuse_bn=fuse_bn,
save_dir=save_dir)
        # Run the training loop
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
metric=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为1。
epoch_id (int): 当前评估模型所在的训练轮数。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
如为COCODetection,则metric为'COCO'。
return_details (bool): 是否返回详细信息。默认值为False。
Returns:
tuple (metrics, eval_details) /dict (metrics): 当return_details为True时,返回(metrics, eval_details),
当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'和'segm_mmap'
或者’bbox_map‘和'segm_map',分别表示预测框和分割区域平均准确率平均值在
各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。eval_details为dict,
包含关键字:'bbox',对应元素预测框结果列表,每个预测结果由图像id、预测框类别id、
预测框坐标、预测框得分;'mask',对应元素预测区域结果列表,每个预测结果由图像id、
预测区域类别id、预测区域坐标、预测区域得分;’gt‘:真实标注框和标注区域相关信息。
"""
self.arrange_transforms(
transforms=eval_dataset.transforms, mode='eval')
if metric is None:
if hasattr(self, 'metric') and self.metric is not None:
metric = self.metric
else:
if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
else:
raise Exception(
"eval_dataset should be datasets.COCODetection.")
assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
results = list()
        logging.info(
            "Start evaluating (total_samples={}, total_steps={})...".format(
                eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data]).astype('float32')
im_infos = np.array([d[1] for d in data]).astype('float32')
im_shapes = np.array([d[3] for d in data]).astype('float32')
feed_data = {
'image': images,
'im_info': im_infos,
'im_shape': im_shapes,
}
outputs = self.exe.run(
self.test_prog,
feed=[feed_data],
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
'bbox': (np.array(outputs[0]),
outputs[0].recursive_sequence_lengths()),
'mask': (np.array(outputs[1]),
outputs[1].recursive_sequence_lengths())
}
res_im_id = [d[2] for d in data]
res['im_info'] = (im_infos, [])
res['im_shape'] = (im_shapes, [])
res['im_id'] = (np.array(res_im_id), [])
results.append(res)
logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
epoch_id, step + 1, total_steps))
ap_stats, eval_details = eval_results(
results,
'COCO',
eval_dataset.coco_gt,
with_background=True,
resolution=self.mask_head_resolution)
if metric == 'VOC':
if isinstance(ap_stats[0], np.ndarray) and isinstance(
ap_stats[1], np.ndarray):
metrics = OrderedDict(
zip(['bbox_map', 'segm_map'],
[ap_stats[0][1], ap_stats[1][1]]))
else:
metrics = OrderedDict(
zip(['bbox_map', 'segm_map'], [0.0, 0.0]))
elif metric == 'COCO':
if isinstance(ap_stats[0], np.ndarray) and isinstance(
ap_stats[1], np.ndarray):
metrics = OrderedDict(
zip(['bbox_mmap', 'segm_mmap'],
[ap_stats[0][0], ap_stats[1][0]]))
else:
metrics = OrderedDict(
zip(['bbox_mmap', 'segm_mmap'], [0.0, 0.0]))
if return_details:
return metrics, eval_details
return metrics
def predict(self, img_file, transforms=None):
"""预测。
Args:
img_file (str): 预测图像路径。
transforms (paddlex.det.transforms): 数据预处理操作。
Returns:
dict: 预测结果列表,每个预测结果由预测框类别标签、预测框类别名称、预测框坐标、预测框内的二值图、
预测框得分组成。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im, im_resize_info, im_shape = transforms(img_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im, im_resize_info, im_shape = self.test_transforms(img_file)
im = np.expand_dims(im, axis=0)
im_resize_info = np.expand_dims(im_resize_info, axis=0)
im_shape = np.expand_dims(im_shape, axis=0)
outputs = self.exe.run(
self.test_prog,
feed={
'image': im,
'im_info': im_resize_info,
'im_shape': im_shape
},
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(list(self.test_outputs.keys()), outputs)
}
res['im_id'] = (np.array([[0]]).astype('int32'), [])
res['im_shape'] = (np.array(im_shape), [])
clsid2catid = dict({i: i for i in range(self.num_classes)})
xywh_results = bbox2out([res], clsid2catid)
segm_results = mask2out([res], clsid2catid, self.mask_head_resolution)
results = list()
import pycocotools.mask as mask_util
for index, xywh_res in enumerate(xywh_results):
del xywh_res['image_id']
xywh_res['mask'] = mask_util.decode(
segm_results[index]['segmentation'])
xywh_res['category'] = self.labels[xywh_res['category_id']]
results.append(xywh_res)
return results
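# --- Example usage (a minimal sketch, not part of the library) ---
# Trains a MaskRCNN on a COCO-format dataset (MaskRCNN needs instance masks,
# so only CocoDetection is accepted). Paths are hypothetical and the reader /
# transform names assume the paddlex APIs used elsewhere in this repo.
#
# import paddlex
# from paddlex.det import transforms
#
# train_transforms = transforms.Compose([
#     transforms.Normalize(),
#     transforms.ResizeByShort(short_size=800, max_size=1333),
#     transforms.Padding(coarsest_stride=32)])
# train_dataset = paddlex.datasets.CocoDetection(
#     data_dir='my_coco_data/JPEGImages',        # hypothetical path
#     ann_file='my_coco_data/annotations.json',  # hypothetical path
#     transforms=train_transforms)
# model = MaskRCNN(num_classes=len(train_dataset.labels) + 1)
# model.train(num_epochs=12, train_dataset=train_dataset,
#             train_batch_size=1, save_dir='output/mask_rcnn')
# results = model.predict('demo.jpg')  # each item also carries a 'mask'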
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle.fluid.contrib.slim.quantization.quantization_pass import QuantizationTransformPass
from paddle.fluid.contrib.slim.quantization.quantization_pass import AddQuantDequantPass
from paddle.fluid.contrib.slim.quantization.quantization_pass import _op_real_in_out_name
from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
import paddlex.utils.logging as logging
import paddle.fluid as fluid
import os
class PaddleXPostTrainingQuantization(PostTrainingQuantization):
def __init__(self,
executor,
dataset,
program,
inputs,
outputs,
batch_size=10,
batch_nums=None,
scope=None,
algo="KL",
quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
is_full_quantize=False,
is_use_cache_file=False,
cache_dir="./temp_post_training"):
        '''
        This class uses the post-training quantization method to quantize an
        fp32 model. It uses calibration data to calculate the scale factors of
        the variables to be quantized, and inserts fake quant/dequant ops to
        obtain the quantized model.
        Args:
            executor(fluid.Executor): The executor to load, run and save the
                quantized model.
            dataset(Python Iterator): The data reader.
            program(fluid.Program): The paddle program that holds the model parameters.
            inputs(dict): The inputs of the program.
            outputs(dict): The outputs of the program.
            batch_size(int, optional): The batch size of the DataLoader. Default is 10.
            batch_nums(int, optional): If batch_nums is not None, the number of
                calibration samples is batch_size*batch_nums. If batch_nums is None,
                all data provided by the dataset is used for calibration.
            scope(fluid.Scope, optional): The scope of the program, used to load
                and save variables. If scope=None, the global scope is used.
            algo(str, optional): If algo='KL', use the KL-divergence method to
                get a more precise scale factor. If algo='direct', use the
                abs_max method to get the scale factor. Default is 'KL'.
            quantizable_op_type(list[str], optional): The op types that will be
                quantized. Default is ["conv2d", "depthwise_conv2d", "mul"].
            is_full_quantize(bool, optional): If True, apply quantization to all
                supported quantizable op types. If False, only apply quantization
                to the op types given in quantizable_op_type.
            is_use_cache_file(bool, optional): If False, all temp data is kept in
                memory. If True, temp data is saved to disk. When the fp32 model is
                complex or the amount of calibration data is large, is_use_cache_file
                should be set to True. Default is False.
            cache_dir(str, optional): When is_use_cache_file is True, the directory
                for saving temp data. Default is ./temp_post_training.
        Returns:
            None
        '''
self._executor = executor
self._dataset = dataset
self._batch_size = batch_size
self._batch_nums = batch_nums
        self._scope = fluid.global_scope() if scope is None else scope
self._algo = algo
self._is_use_cache_file = is_use_cache_file
self._cache_dir = cache_dir
if self._is_use_cache_file and not os.path.exists(self._cache_dir):
os.mkdir(self._cache_dir)
supported_quantizable_op_type = \
QuantizationTransformPass._supported_quantizable_op_type + \
AddQuantDequantPass._supported_quantizable_op_type
if is_full_quantize:
self._quantizable_op_type = supported_quantizable_op_type
else:
self._quantizable_op_type = quantizable_op_type
for op_type in self._quantizable_op_type:
assert op_type in supported_quantizable_op_type + \
AddQuantDequantPass._activation_type, \
op_type + " is not supported for quantization."
self._place = self._executor.place
self._program = program
self._feed_list = list(inputs.values())
self._fetch_list = list(outputs.values())
self._data_loader = None
self._op_real_in_out_name = _op_real_in_out_name
self._bit_length = 8
self._quantized_weight_var_name = set()
self._quantized_act_var_name = set()
self._sampling_data = {}
self._quantized_var_scale_factor = {}
def quantize(self):
        '''
        Quantize the fp32 model. Uses calibration data to calculate the scale
        factors of the variables to be quantized, and inserts fake
        quant/dequant ops to obtain the quantized model.
        Args:
            None
        Returns:
            the program of the quantized model.
        '''
self._preprocess()
batch_id = 0
for data in self._data_loader():
self._executor.run(
program=self._program,
feed=data,
fetch_list=self._fetch_list,
return_numpy=False)
self._sample_data(batch_id)
if batch_id % 5 == 0:
logging.info("run batch: {}".format(batch_id))
batch_id += 1
if self._batch_nums and batch_id >= self._batch_nums:
break
logging.info("all run batch: ".format(batch_id))
logging.info("calculate scale factor ...")
self._calculate_scale_factor()
logging.info("update the program ...")
self._update_program()
self._save_output_scale()
return self._program
def save_quantized_model(self, save_model_path):
'''
Save the quantized model to the disk.
Args:
save_model_path(str): The path to save the quantized model
Returns:
None
'''
feed_vars_names = [var.name for var in self._feed_list]
fluid.io.save_inference_model(
dirname=save_model_path,
feeded_var_names=feed_vars_names,
target_vars=self._fetch_list,
executor=self._executor,
params_filename='__params__',
main_program=self._program)
def _preprocess(self):
'''
Load model and set data loader, collect the variable names for sampling,
and set activation variables to be persistable.
'''
feed_vars = [fluid.framework._get_var(var.name, self._program) \
for var in self._feed_list]
self._data_loader = fluid.io.DataLoader.from_generator(
feed_list=feed_vars, capacity=3 * self._batch_size, iterable=True)
self._data_loader.set_sample_list_generator(
self._dataset.generator(self._batch_size, drop_last=True),
places=self._place)
# collect the variable names for sampling
persistable_var_names = []
for var in self._program.list_vars():
if var.persistable:
persistable_var_names.append(var.name)
for op in self._program.global_block().ops:
op_type = op.type
if op_type in self._quantizable_op_type:
if op_type in ("conv2d", "depthwise_conv2d"):
self._quantized_act_var_name.add(op.input("Input")[0])
self._quantized_weight_var_name.add(op.input("Filter")[0])
self._quantized_act_var_name.add(op.output("Output")[0])
elif op_type == "mul":
if self._is_input_all_not_persistable(
op, persistable_var_names):
op._set_attr("skip_quant", True)
                        logging.warning(
                            "Skip quantizing a mul op because its input variables are not persistable"
                        )
else:
self._quantized_act_var_name.add(op.input("X")[0])
self._quantized_weight_var_name.add(op.input("Y")[0])
self._quantized_act_var_name.add(op.output("Out")[0])
else:
                # process other quantizable op types; their inputs must all be non-persistable
if self._is_input_all_not_persistable(
op, persistable_var_names):
input_output_name_list = self._op_real_in_out_name[
op_type]
for input_name in input_output_name_list[0]:
for var_name in op.input(input_name):
self._quantized_act_var_name.add(var_name)
for output_name in input_output_name_list[1]:
for var_name in op.output(output_name):
self._quantized_act_var_name.add(var_name)
        # set activation variables to be persistable so that their tensor
        # data can be fetched in _sample_data
for var in self._program.list_vars():
if var.name in self._quantized_act_var_name:
var.persistable = True
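# --- Example usage (a minimal sketch, not part of the library) ---
# Applies post-training quantization to a trained paddlex model, using its
# test program and an evaluation dataset as calibration data. The model and
# dataset objects are assumed to come from the training APIs above.
#
# quantizer = PaddleXPostTrainingQuantization(
#     executor=model.exe,
#     dataset=eval_dataset,
#     program=model.test_prog,
#     inputs=model.test_inputs,
#     outputs=model.test_outputs,
#     batch_size=10,
#     algo='KL')
# quantizer.quantize()
# quantizer.save_quantized_model('output/quant_model')  # hypothetical path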
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import yaml
import time
import pickle
import os
import os.path as osp
from functools import reduce
import paddle.fluid as fluid
from multiprocessing import Process, Queue
import paddleslim
from paddleslim.prune import Pruner, load_sensitivities
from paddleslim.core import GraphWrapper
from .prune_config import get_prune_params
import paddlex.utils.logging as logging
from paddlex.utils import seconds_to_hms
def sensitivity(program,
place,
param_names,
eval_func,
sensitivities_file=None,
pruned_ratios=None):
scope = fluid.global_scope()
graph = GraphWrapper(program)
sensitivities = load_sensitivities(sensitivities_file)
if pruned_ratios is None:
pruned_ratios = np.arange(0.1, 1, step=0.1)
total_evaluate_iters = 1
for name in param_names:
if name not in sensitivities:
sensitivities[name] = {}
total_evaluate_iters += len(list(pruned_ratios))
else:
total_evaluate_iters += (
len(list(pruned_ratios)) - len(sensitivities[name]))
eta = '-'
start_time = time.time()
progress = 1.0 / total_evaluate_iters
progress = "%.2f%%" % (progress * 100)
logging.info(
"Total evaluate iters={}, current={}, progress={}, eta={}".format(
total_evaluate_iters, 1, progress, eta),
use_color=True)
baseline = eval_func(graph.program)
cost = time.time() - start_time
eta = cost * (total_evaluate_iters - 1)
current_iter = 1
for name in sensitivities:
for ratio in pruned_ratios:
if ratio in sensitivities[name]:
logging.debug('{}, {} has computed.'.format(name, ratio))
continue
progress = float(current_iter) / total_evaluate_iters
progress = "%.2f%%" % (progress * 100)
logging.info(
"Total evaluate iters={}, current={}, progress={}, eta={}".
format(
total_evaluate_iters, current_iter, progress,
seconds_to_hms(
int(cost * (total_evaluate_iters - current_iter)))),
use_color=True)
current_iter += 1
pruner = Pruner()
logging.info("sensitive - param: {}; ratios: {}".format(
name, ratio))
pruned_program, param_backup, _ = pruner.prune(
program=graph.program,
scope=scope,
params=[name],
ratios=[ratio],
place=place,
lazy=True,
only_graph=False,
param_backup=True)
pruned_metric = eval_func(pruned_program)
loss = (baseline - pruned_metric) / baseline
logging.info("pruned param: {}; {}; loss={}".format(
name, ratio, loss))
sensitivities[name][ratio] = loss
with open(sensitivities_file, 'wb') as f:
pickle.dump(sensitivities, f)
for param_name in param_backup.keys():
param_t = scope.find_var(param_name).get_tensor()
param_t.set(param_backup[param_name], place)
return sensitivities
def channel_prune(program, prune_names, prune_ratios, place, only_graph=False):
"""通道裁剪。
Args:
program (paddle.fluid.Program): 需要裁剪的Program,Program的具体介绍可参见
https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
prune_names (list): 由裁剪参数名组成的参数列表。
prune_ratios (list): 由裁剪率组成的参数列表,与prune_names中的参数列表意义对应。
place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
only_graph (bool): 是否只修改网络图,当为False时代表同时修改网络图和
scope(全局作用域)中的参数。默认为False。
Returns:
paddle.fluid.Program: 裁剪后的Program。
"""
scope = fluid.global_scope()
pruner = Pruner()
program, _, _ = pruner.prune(
program,
scope,
params=prune_names,
ratios=prune_ratios,
place=place,
lazy=False,
only_graph=only_graph,
param_backup=False,
param_shape_backup=False)
return program
def prune_program(model, prune_params_ratios=None):
"""根据裁剪参数和裁剪率裁剪Program。
1. 裁剪训练Program和测试Program。
2. 使用裁剪后的Program更新模型中的train_prog和test_prog。
【注意】Program的具体介绍可参见
https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
Args:
model (paddlex.cv.models): paddlex中的模型。
prune_params_ratios (dict): 由裁剪参数名和裁剪率组成的字典,当为None时
使用默认裁剪参数名和裁剪率。默认为None。
"""
place = model.places[0]
train_prog = model.train_prog
eval_prog = model.test_prog
valid_prune_names = get_prune_params(model)
    assert set(list(prune_params_ratios.keys())) & set(valid_prune_names), \
        "None of the params in 'prune_params_ratios' can be pruned!"
prune_names = list(
set(list(prune_params_ratios.keys())) & set(valid_prune_names))
prune_ratios = [
prune_params_ratios[prune_name] for prune_name in prune_names
]
model.train_prog = channel_prune(train_prog, prune_names, prune_ratios,
place)
model.test_prog = channel_prune(
eval_prog, prune_names, prune_ratios, place, only_graph=True)
def update_program(program, model_dir, place):
"""根据裁剪信息更新Program和参数。
Args:
program (paddle.fluid.Program): 需要更新的Program,Program的具体介绍可参见
https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
model_dir (str): 模型存储路径。
place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
Returns:
paddle.fluid.Program: 更新后的Program。
"""
graph = GraphWrapper(program)
with open(osp.join(model_dir, "prune.yml")) as f:
shapes = yaml.load(f.read(), Loader=yaml.Loader)
for param, shape in shapes.items():
graph.var(param).set_shape(shape)
for block in program.blocks:
for param in block.all_parameters():
if param.name in shapes:
param_tensor = fluid.global_scope().find_var(
param.name).get_tensor()
param_tensor.set(
np.zeros(list(shapes[param.name])).astype('float32'),
place)
graph.update_groups_of_conv()
graph.infer_shape()
return program
def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8):
"""计算模型中可裁剪卷积Kernel的敏感度。
1. 获取模型中可裁剪卷积Kernel的名称。
2. 计算每个可裁剪卷积Kernel不同裁剪率下的敏感度。
【注意】卷积的敏感度是指在不同裁剪率下评估数据集预测精度的损失,
通过得到的敏感度,可以决定最终模型需要裁剪的参数列表和各裁剪参数对应的裁剪率。
Args:
model (paddlex.cv.models): paddlex中的模型。
save_file (str): 计算的得到的sensetives文件存储路径。
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为8。
Returns:
dict: 由参数名和不同裁剪率下敏感度组成的字典。存储的信息如下:
.. code-block:: python
{"weight_0":
{0.1: 0.22,
0.2: 0.33
},
"weight_1":
{0.1: 0.21,
0.2: 0.4
}
}
其中``weight_0``是卷积Kernel名;``sensitivities['weight_0']``是一个字典,key是裁剪率,value是敏感度。
"""
prune_names = get_prune_params(model)
def eval_for_prune(program):
eval_metrics = model.evaluate(
eval_dataset=eval_dataset,
batch_size=batch_size,
return_details=False)
primary_key = list(eval_metrics.keys())[0]
return eval_metrics[primary_key]
    sensitivities = sensitivity(
        model.test_prog,
        model.places[0],
        prune_names,
        eval_for_prune,
        sensitivities_file=save_file,
        pruned_ratios=list(np.arange(0.1, 1, 0.1)))
    return sensitivities
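# Usage sketch (illustrative; `model` and `eval_dataset` stand for any
# trained PaddleX model and a matching validation reader):
#
#   sens = cal_params_sensitivities(model,
#                                   save_file='./model.sensitivities',
#                                   eval_dataset=eval_dataset,
#                                   batch_size=8)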
def get_params_ratios(sensitivities_file, eval_metric_loss=0.05):
"""根据设定的精度损失容忍度metric_loss_thresh和计算保存的模型参数敏感度信息文件sensetive_file,
获取裁剪的参数配置。
【注意】metric_loss_thresh并不确保最终裁剪后的模型在fine-tune后的模型效果,仅为预估值。
Args:
sensitivities_file (str): 敏感度文件存储路径。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Returns:
dict: 由参数名和裁剪率组成的字典。存储的信息如下:
.. code-block:: python
{"weight_0": 0.1,
"weight_1": 0.2
}
其中key是卷积Kernel名;value是裁剪率。
"""
    if not osp.exists(sensitivities_file):
        raise Exception('The sensitivities file does not exist!')
    sensitivities = paddleslim.prune.load_sensitivities(sensitivities_file)
    params_ratios = paddleslim.prune.get_ratios_by_loss(
        sensitivities, eval_metric_loss)
return params_ratios
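# Usage sketch (illustrative; the file path is hypothetical). Combined with
# `prune_program`, this turns a sensitivities file into an actual pruning step:
#
#   ratios = get_params_ratios('./model.sensitivities', eval_metric_loss=0.05)
#   prune_program(model, prune_params_ratios=ratios)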
def cal_model_size(program, place, sensitivities_file, eval_metric_loss=0.05):
"""在可容忍的精度损失下,计算裁剪后模型大小相对于当前模型大小的比例。
Args:
program (paddle.fluid.Program): 需要裁剪的Program,Program的具体介绍可参见
https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
sensitivities_file (str): 敏感度文件存储路径。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Returns:
float: 裁剪后模型大小相对于当前模型大小的比例。
"""
prune_params_ratios = get_params_ratios(sensitivities_file,
eval_metric_loss)
prune_program = channel_prune(
program,
list(prune_params_ratios.keys()),
list(prune_params_ratios.values()),
place,
only_graph=True)
origin_size = 0
new_size = 0
for var in program.list_vars():
name = var.name
shape = var.shape
for prune_block in prune_program.blocks:
if prune_block.has_var(name):
prune_var = prune_block.var(name)
prune_shape = prune_var.shape
break
origin_size += reduce(lambda x, y: x * y, shape)
new_size += reduce(lambda x, y: x * y, prune_shape)
return (new_size * 1.0) / origin_size
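# Usage sketch (illustrative): estimate the size of the pruned model before
# committing to a pruning configuration:
#
#   ratio = cal_model_size(model.test_prog, model.places[0],
#                          './model.sensitivities', eval_metric_loss=0.05)
#   print('pruned model would be {:.1%} of the original size'.format(ratio))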
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import os.path as osp
import paddle.fluid as fluid
import paddlehub as hub
import paddlex
sensitivities_data = {
'ResNet18':
'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities',
'ResNet34':
'https://bj.bcebos.com/paddlex/slim_prune/resnet34.sensitivities',
'ResNet50':
'https://bj.bcebos.com/paddlex/slim_prune/resnet50.sensitivities',
'ResNet101':
'https://bj.bcebos.com/paddlex/slim_prune/resnet101.sensitivities',
'ResNet50_vd':
'https://bj.bcebos.com/paddlex/slim_prune/resnet50vd.sensitivities',
'ResNet101_vd':
'https://bj.bcebos.com/paddlex/slim_prune/resnet101vd.sensitivities',
'DarkNet53':
'https://bj.bcebos.com/paddlex/slim_prune/darknet53.sensitivities',
'MobileNetV1':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv1.sensitivities',
'MobileNetV2':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv2.sensitivities',
'MobileNetV3_large':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities',
'MobileNetV3_small':
'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities',
'DenseNet121':
'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities',
'DenseNet161':
'https://bj.bcebos.com/paddlex/slim_prune/densenet161.sensitivities',
'DenseNet201':
'https://bj.bcebos.com/paddlex/slim_prune/densenet201.sensitivities',
'Xception41':
'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities',
'Xception65':
'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities',
'YOLOv3_MobileNetV1':
'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities',
'YOLOv3_MobileNetV3_large':
'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv3.sensitivities',
'YOLOv3_DarkNet53':
'https://bj.bcebos.com/paddlex/slim_prune/yolov3_darknet53.sensitivities',
'YOLOv3_ResNet34':
'https://bj.bcebos.com/paddlex/slim_prune/yolov3_resnet34.sensitivities',
'UNet':
'https://bj.bcebos.com/paddlex/slim_prune/unet.sensitivities',
'DeepLabv3p_MobileNetV2_x0.25':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.25_no_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x0.5':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.5_no_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x1.0':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.0_no_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x1.5':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.5_no_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x2.0':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x2.0_no_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x0.25_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.25_with_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x0.5_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.5_with_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x1.0_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.0_with_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x1.5_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.5_with_aspp_decoder.sensitivities',
'DeepLabv3p_MobileNetV2_x2.0_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x2.0_with_aspp_decoder.sensitivities',
'DeepLabv3p_Xception65_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_xception65_with_aspp_decoder.sensitivities',
'DeepLabv3p_Xception41_aspp_decoder':
'https://bj.bcebos.com/paddlex/slim_prune/deeplab_xception41_with_aspp_decoder.sensitivities'
}
def get_sensitivities(flag, model, save_dir):
model_name = model.__class__.__name__
model_type = model_name
if hasattr(model, 'backbone'):
model_type = model_name + '_' + model.backbone
if model_type.startswith('DeepLabv3p_Xception'):
model_type = model_type + '_' + 'aspp' + '_' + 'decoder'
elif hasattr(model, 'encoder_with_aspp') or hasattr(
model, 'enable_decoder'):
model_type = model_type + '_' + 'aspp' + '_' + 'decoder'
if osp.isfile(flag):
return flag
elif flag == 'DEFAULT':
        assert model_type in sensitivities_data, "There is no sensitivities file for {}; you may need to compute it yourself.".format(
            model_type)
url = sensitivities_data[model_type]
fname = osp.split(url)[-1]
try:
hub.download(fname, save_path=save_dir)
except Exception as e:
if isinstance(e, hub.ResourceNotFoundError):
raise Exception(
"Resource for model {}(key='{}') not found".format(
model_type, fname))
            elif isinstance(e, hub.ServerConnectionError):
                raise Exception(
                    "Cannot get resource for model {} (key='{}'), please check your internet connection"
                    .format(model_type, fname))
            else:
                raise Exception(
                    "Unexpected error, please make sure paddlehub >= 1.6.2: {}".
                    format(str(e)))
return osp.join(save_dir, fname)
    else:
        raise Exception(
            "sensitivities needs to be defined as a file path or 'DEFAULT' (to download the sensitivities automatically)."
        )
def get_prune_params(model):
prune_names = []
model_type = model.__class__.__name__
if model_type == 'BaseClassifier':
model_type = model.model_name
if hasattr(model, 'backbone'):
backbone = model.backbone
model_type += ('_' + backbone)
program = model.test_prog
if model_type.startswith('ResNet') or \
model_type.startswith('DenseNet') or \
model_type.startswith('DarkNet'):
for block in program.blocks:
for param in block.all_parameters():
pd_var = fluid.global_scope().find_var(param.name)
pd_param = pd_var.get_tensor()
if len(np.array(pd_param).shape) == 4:
prune_names.append(param.name)
elif model_type == "MobileNetV1":
prune_names.append("conv1_weights")
for param in program.global_block().all_parameters():
if "_sep_weights" in param.name:
prune_names.append(param.name)
elif model_type == "MobileNetV2":
for param in program.global_block().all_parameters():
if 'weight' not in param.name \
or 'dwise' in param.name \
or 'fc' in param.name :
continue
prune_names.append(param.name)
elif model_type.startswith("MobileNetV3"):
if model_type == 'MobileNetV3_small':
expand_prune_id = [3, 4]
else:
expand_prune_id = [2, 3, 4, 8, 9, 11]
for param in program.global_block().all_parameters():
if ('expand_weights' in param.name and \
int(param.name.split('_')[0][4:]) in expand_prune_id)\
or 'linear_weights' in param.name \
or 'se_1_weights' in param.name:
prune_names.append(param.name)
elif model_type.startswith('Xception') or \
model_type.startswith('DeepLabv3p_Xception'):
params_not_prune = [
'weights',
'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'.
format(model_type[-2:]), 'encoder/concat/weights',
'decoder/concat/weights'
]
for param in program.global_block().all_parameters():
if 'weight' not in param.name \
or 'dwise' in param.name \
or 'depthwise' in param.name \
or 'logit' in param.name:
continue
if param.name in params_not_prune:
continue
prune_names.append(param.name)
elif model_type.startswith('YOLOv3'):
for block in program.blocks:
for param in block.all_parameters():
if 'weights' in param.name and 'yolo_block' in param.name:
prune_names.append(param.name)
elif model_type.startswith('UNet'):
for param in program.global_block().all_parameters():
if 'weight' not in param.name:
continue
if 'logit' in param.name:
continue
prune_names.append(param.name)
params_not_prune = [
'encode/block4/down/conv1/weights',
'encode/block3/down/conv1/weights',
'encode/block2/down/conv1/weights', 'encode/block1/conv1/weights'
]
for i in params_not_prune:
if i in prune_names:
prune_names.remove(i)
elif model_type.startswith('DeepLabv3p'):
for param in program.global_block().all_parameters():
if 'weight' not in param.name:
continue
if 'dwise' in param.name or 'depthwise' in param.name or 'logit' in param.name:
continue
prune_names.append(param.name)
params_not_prune = [
'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'.
format(model_type[-2:]), 'encoder/concat/weights',
'decoder/concat/weights'
]
if model.encoder_with_aspp == True:
params_not_prune.append(
'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'
.format(model_type[-2:]))
params_not_prune.append('conv8_1_linear_weights')
for i in params_not_prune:
if i in prune_names:
prune_names.remove(i)
else:
        raise Exception('Pruning for {} is not implemented yet!'.format(model_type))
return prune_names
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os.path as osp
import tqdm
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from .prune import cal_model_size
from paddleslim.prune import load_sensitivities
def visualize(model, sensitivities_file, save_dir='./'):
"""将模型裁剪率和每个参数裁剪后精度损失的关系可视化。
可视化结果纵轴为eval_metric_loss参数值,横轴为对应的模型被裁剪的比例
Args:
model (paddlex.cv.models): paddlex中的模型。
sensitivities_file (str): 敏感度文件存储路径。
"""
program = model.test_prog
place = model.places[0]
fig = plt.figure()
plt.xlabel("model prune ratio")
plt.ylabel("evaluation loss")
title_name = osp.split(sensitivities_file)[-1].split('.')[0]
plt.title(title_name)
plt.grid(linestyle='--', linewidth=0.5)
x = list()
y = list()
for loss_thresh in tqdm.tqdm(list(np.arange(0.05, 1, 0.05))):
prune_ratio = 1 - cal_model_size(
program, place, sensitivities_file, eval_metric_loss=loss_thresh)
x.append(prune_ratio)
y.append(loss_thresh)
plt.plot(x, y, color='green', linewidth=0.5, marker='o', markersize=3)
my_x_ticks = np.arange(
min(np.array(x)) - 0.01,
max(np.array(x)) + 0.01, 0.05)
my_y_ticks = np.arange(0.05, 1, 0.05)
plt.xticks(my_x_ticks, fontsize=3)
plt.yticks(my_y_ticks, fontsize=3)
for a, b in zip(x, y):
plt.text(
a,
b, (float('%0.4f' % a), float('%0.3f' % b)),
ha='center',
va='bottom',
fontsize=3)
    plt.savefig(osp.join(save_dir, 'sensitivities.png'), dpi=800)
plt.close()
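# Usage sketch (illustrative; the file path is hypothetical):
#
#   visualize(model, './model.sensitivities', save_dir='./vis')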
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import paddlex
from collections import OrderedDict
from .deeplabv3p import DeepLabv3p
class UNet(DeepLabv3p):
"""实现UNet网络的构建并进行训练、评估、预测和模型导出。
Args:
num_classes (int): 类别数。
upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线行差值进行上菜样,
当输入其他选项时则利用反卷积进行上菜样,默认为'bilinear'。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。默认255。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes=2,
upsample_mode='bilinear',
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
self.init_params = locals()
super(DeepLabv3p, self).__init__('segmenter')
        # dice loss and bce loss are only applicable to binary segmentation
if num_classes > 2 and (use_bce_loss or use_dice_loss):
raise ValueError(
"dice loss and bce loss is only applicable to binary classfication"
)
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.upsample_mode = upsample_mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.labels = None
def build_net(self, mode='train'):
model = paddlex.cv.nets.segmentation.UNet(
self.num_classes,
mode=mode,
upsample_mode=self.upsample_mode,
use_bce_loss=self.use_bce_loss,
use_dice_loss=self.use_dice_loss,
class_weight=self.class_weight,
ignore_index=self.ignore_index)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict()
if mode == 'train':
self.optimizer.minimize(model_out)
outputs['loss'] = model_out
elif mode == 'eval':
outputs['loss'] = model_out[0]
outputs['pred'] = model_out[1]
outputs['label'] = model_out[2]
outputs['mask'] = model_out[3]
else:
outputs['pred'] = model_out[0]
outputs['logit'] = model_out[1]
return inputs, outputs
def train(self,
num_epochs,
train_dataset,
train_batch_size=2,
eval_dataset=None,
save_interval_epochs=1,
log_interval_steps=2,
save_dir='output',
pretrain_weights='COCO',
optimizer=None,
learning_rate=0.01,
lr_decay_power=0.9,
use_vdl=False,
sensitivities_file=None,
eval_metric_loss=0.05):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认2。
eval_dataset (paddlex.datasets): 评估数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
save_dir (str): 模型保存路径。默认'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'COCO',
则自动下载在COCO图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为'COCO'。
optimizer (paddle.fluid.optimizer): 优化器。当改参数为None时,使用默认的优化器:使用
fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
learning_rate (float): 默认优化器的初始学习率。默认0.01。
lr_decay_power (float): 默认优化器学习率多项式衰减系数。默认0.9。
use_vdl (bool): 是否使用VisualDL进行可视化。默认False。
sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Raises:
ValueError: 模型从inference model进行加载。
"""
return super(UNet, self).train(
num_epochs, train_dataset, train_batch_size, eval_dataset,
save_interval_epochs, log_interval_steps, save_dir,
pretrain_weights, optimizer, learning_rate, lr_decay_power,
use_vdl, sensitivities_file, eval_metric_loss)
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import numpy as np
import json
import os
import sys
import cv2
import copy
import paddlex.utils.logging as logging
# fix linspace problem for pycocotools while numpy > 1.17.2
backup_linspace = np.linspace
def fixed_linspace(start,
stop,
num=50,
endpoint=True,
retstep=False,
dtype=None,
axis=0):
num = int(num)
return backup_linspace(start, stop, num, endpoint, retstep, dtype, axis)
def eval_results(results,
metric,
coco_gt,
with_background=True,
resolution=None,
is_bbox_normalized=False,
map_type='11point'):
"""Evaluation for evaluation program results"""
box_ap_stats = []
coco_gt_data = copy.deepcopy(coco_gt)
eval_details = {'gt': copy.deepcopy(coco_gt.dataset)}
if metric == 'COCO':
np.linspace = fixed_linspace
if 'proposal' in results[0]:
proposal_eval(results, coco_gt_data)
if 'bbox' in results[0]:
box_ap_stats, xywh_results = coco_bbox_eval(
results,
coco_gt_data,
with_background,
is_bbox_normalized=is_bbox_normalized)
        if 'mask' in results[0]:
            mask_ap_stats, segm_results = mask_eval(results, coco_gt_data,
                                                    resolution)
            ap_stats = [box_ap_stats, mask_ap_stats]
            eval_details['bbox'] = xywh_results
            eval_details['mask'] = segm_results
            # restore the patched np.linspace before returning
            np.linspace = backup_linspace
            return ap_stats, eval_details
        np.linspace = backup_linspace
else:
if 'accum_map' in results[-1]:
res = np.mean(results[-1]['accum_map'][0])
logging.debug('mAP: {:.2f}'.format(res * 100.))
box_ap_stats.append(res * 100.)
elif 'bbox' in results[0]:
box_ap, xywh_results = voc_bbox_eval(
results,
coco_gt_data,
with_background,
is_bbox_normalized=is_bbox_normalized,
map_type=map_type)
box_ap_stats.append(box_ap)
eval_details['bbox'] = xywh_results
return box_ap_stats, eval_details
def proposal_eval(results, coco_gt, outfile, max_dets=(100, 300, 1000)):
    assert 'proposal' in results[0]
    assert outfile.endswith('.json')
    xywh_results = proposal2out(results)
    assert len(
        xywh_results) > 0, "The number of valid proposals detected is zero.\n \
        Please use a reasonable model and check the input data."
with open(outfile, 'w') as f:
json.dump(xywh_results, f)
cocoapi_eval(xywh_results, 'proposal', coco_gt=coco_gt, max_dets=max_dets)
# flush coco evaluation result
sys.stdout.flush()
def coco_bbox_eval(results,
coco_gt,
with_background=True,
is_bbox_normalized=False):
assert 'bbox' in results[0]
from pycocotools.coco import COCO
cat_ids = coco_gt.getCatIds()
# when with_background = True, mapping category to classid, like:
# background:0, first_class:1, second_class:2, ...
clsid2catid = dict(
{i + int(with_background): catid
for i, catid in enumerate(cat_ids)})
xywh_results = bbox2out(
results, clsid2catid, is_bbox_normalized=is_bbox_normalized)
results = copy.deepcopy(xywh_results)
if len(xywh_results) == 0:
        logging.warning(
            "The number of valid bboxes detected is zero.\n Please use a reasonable model and check the input data.\n Stopping evaluation!"
        )
return [0.0], results
map_stats = cocoapi_eval(xywh_results, 'bbox', coco_gt=coco_gt)
# flush coco evaluation result
sys.stdout.flush()
return map_stats, results
def loadRes(coco_obj, anns):
    """
    Load result annotations and return a result COCO api object.
    :param coco_obj (COCO): COCO api object holding the ground-truth dataset.
    :param anns (list): annotation results to load.
    :return: res (COCO): result api object.
    """
from pycocotools.coco import COCO
import pycocotools.mask as maskUtils
import time
res = COCO()
res.dataset['images'] = [img for img in coco_obj.dataset['images']]
tic = time.time()
    assert type(anns) == list, 'results is not an array of objects'
annsImgIds = [ann['image_id'] for ann in anns]
assert set(annsImgIds) == (set(annsImgIds) & set(coco_obj.getImgIds())), \
'Results do not correspond to current coco set'
if 'caption' in anns[0]:
imgIds = set([img['id'] for img in res.dataset['images']]) & set(
[ann['image_id'] for ann in anns])
res.dataset['images'] = [
img for img in res.dataset['images'] if img['id'] in imgIds
]
for id, ann in enumerate(anns):
ann['id'] = id + 1
elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
res.dataset['categories'] = copy.deepcopy(
coco_obj.dataset['categories'])
for id, ann in enumerate(anns):
bb = ann['bbox']
x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
if not 'segmentation' in ann:
ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
ann['area'] = bb[2] * bb[3]
ann['id'] = id + 1
ann['iscrowd'] = 0
elif 'segmentation' in anns[0]:
res.dataset['categories'] = copy.deepcopy(
coco_obj.dataset['categories'])
for id, ann in enumerate(anns):
# now only support compressed RLE format as segmentation results
ann['area'] = maskUtils.area(ann['segmentation'])
if not 'bbox' in ann:
ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
ann['id'] = id + 1
ann['iscrowd'] = 0
elif 'keypoints' in anns[0]:
res.dataset['categories'] = copy.deepcopy(
coco_obj.dataset['categories'])
for id, ann in enumerate(anns):
s = ann['keypoints']
x = s[0::3]
y = s[1::3]
x0, x1, y0, y1 = np.min(x), np.max(x), np.min(y), np.max(y)
ann['area'] = (x1 - x0) * (y1 - y0)
ann['id'] = id + 1
ann['bbox'] = [x0, y0, x1 - x0, y1 - y0]
res.dataset['annotations'] = anns
res.createIndex()
return res
def mask_eval(results, coco_gt, resolution, thresh_binarize=0.5):
assert 'mask' in results[0]
from pycocotools.coco import COCO
clsid2catid = {i + 1: v for i, v in enumerate(coco_gt.getCatIds())}
segm_results = mask2out(results, clsid2catid, resolution, thresh_binarize)
results = copy.deepcopy(segm_results)
if len(segm_results) == 0:
        logging.warning(
            "The number of valid masks detected is zero.\n Please use a reasonable model and check the input data."
        )
return None, results
map_stats = cocoapi_eval(segm_results, 'segm', coco_gt=coco_gt)
return map_stats, results
def cocoapi_eval(anns,
style,
coco_gt=None,
anno_file=None,
max_dets=(100, 300, 1000)):
"""
Args:
anns: Evaluation result.
style: COCOeval style, can be `bbox` , `segm` and `proposal`.
coco_gt: Whether to load COCOAPI through anno_file,
eg: coco_gt = COCO(anno_file)
anno_file: COCO annotations file.
max_dets: COCO evaluation maxDets.
"""
assert coco_gt != None or anno_file != None
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
if coco_gt == None:
coco_gt = COCO(anno_file)
logging.debug("Start evaluate...")
coco_dt = loadRes(coco_gt, anns)
if style == 'proposal':
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.params.useCats = 0
coco_eval.params.maxDets = list(max_dets)
else:
coco_eval = COCOeval(coco_gt, coco_dt, style)
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval.stats
def proposal2out(results, is_bbox_normalized=False):
xywh_res = []
for t in results:
bboxes = t['proposal'][0]
lengths = t['proposal'][1][0]
im_ids = np.array(t['im_id'][0]).flatten()
assert len(lengths) == im_ids.size
        if bboxes is None or bboxes.shape == (1, 1):
continue
k = 0
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i])
for j in range(num):
dt = bboxes[k]
xmin, ymin, xmax, ymax = dt.tolist()
if is_bbox_normalized:
xmin, ymin, xmax, ymax = \
clip_bbox([xmin, ymin, xmax, ymax])
w = xmax - xmin
h = ymax - ymin
else:
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': 1,
'bbox': bbox,
'score': 1.0
}
xywh_res.append(coco_res)
k += 1
return xywh_res
def bbox2out(results, clsid2catid, is_bbox_normalized=False):
"""
Args:
results: request a dict, should include: `bbox`, `im_id`,
if is_bbox_normalized=True, also need `im_shape`.
clsid2catid: class id to category id map of COCO2017 dataset.
is_bbox_normalized: whether or not bbox is normalized.
"""
xywh_res = []
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0]).flatten()
        if bboxes is None or bboxes.shape == (1, 1):
continue
k = 0
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i])
for j in range(num):
dt = bboxes[k]
clsid, score, xmin, ymin, xmax, ymax = dt.tolist()
catid = (clsid2catid[int(clsid)])
if is_bbox_normalized:
xmin, ymin, xmax, ymax = \
clip_bbox([xmin, ymin, xmax, ymax])
w = xmax - xmin
h = ymax - ymin
im_shape = t['im_shape'][0][i].tolist()
im_height, im_width = int(im_shape[0]), int(im_shape[1])
xmin *= im_width
ymin *= im_height
w *= im_width
h *= im_height
else:
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': catid,
'bbox': bbox,
'score': score
}
xywh_res.append(coco_res)
k += 1
return xywh_res
def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
import pycocotools.mask as mask_util
scale = (resolution + 2.0) / resolution
segm_res = []
# for each batch
for t in results:
bboxes = t['bbox'][0]
lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0])
        if bboxes is None or bboxes.shape == (1, 1):
continue
if len(bboxes.tolist()) == 0:
continue
masks = t['mask'][0]
s = 0
# for each sample
for i in range(len(lengths)):
num = lengths[i]
im_id = int(im_ids[i][0])
im_shape = t['im_shape'][0][i]
bbox = bboxes[s:s + num][:, 2:]
clsid_scores = bboxes[s:s + num][:, 0:2]
mask = masks[s:s + num]
s += num
im_h = int(im_shape[0])
im_w = int(im_shape[1])
expand_bbox = expand_boxes(bbox, scale)
expand_bbox = expand_bbox.astype(np.int32)
padded_mask = np.zeros((resolution + 2, resolution + 2),
dtype=np.float32)
for j in range(num):
xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
clsid, score = clsid_scores[j].tolist()
clsid = int(clsid)
padded_mask[1:-1, 1:-1] = mask[j, clsid, :, :]
catid = clsid2catid[clsid]
w = xmax - xmin + 1
h = ymax - ymin + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
resized_mask = cv2.resize(padded_mask, (w, h))
resized_mask = np.array(
resized_mask > thresh_binarize, dtype=np.uint8)
im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
x0 = min(max(xmin, 0), im_w)
x1 = min(max(xmax + 1, 0), im_w)
y0 = min(max(ymin, 0), im_h)
y1 = min(max(ymax + 1, 0), im_h)
im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
x0 - xmin):(x1 - xmin)]
segm = mask_util.encode(
np.array(im_mask[:, :, np.newaxis], order='F'))[0]
catid = clsid2catid[clsid]
segm['counts'] = segm['counts'].decode('utf8')
coco_res = {
'image_id': im_id,
'category_id': catid,
'segmentation': segm,
'score': score
}
segm_res.append(coco_res)
return segm_res
def expand_boxes(boxes, scale):
"""
Expand an array of boxes by a given scale.
"""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
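# Worked example (illustrative): a 10x10 box centered at (10, 10), expanded
# by scale=1.2, becomes a 12x12 box around the same center:
#
#   expand_boxes(np.array([[5., 5., 15., 15.]]), 1.2)
#   # -> array([[ 4.,  4., 16., 16.]])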
def voc_bbox_eval(results,
coco_gt,
with_background=False,
overlap_thresh=0.5,
map_type='11point',
is_bbox_normalized=False,
evaluate_difficult=False):
"""
Bounding box evaluation for VOC dataset
    Args:
        results (list): prediction bounding box results.
        coco_gt (COCO): COCO api object holding the ground-truth annotations.
        with_background (bool): whether class id 0 is reserved for
            the background class. Default False.
        overlap_thresh (float): the positive threshold of
            bbox overlap. Default 0.5.
        map_type (string): method for mAP calculation,
            can only be '11point' or 'integral'. Default '11point'.
        is_bbox_normalized (bool): whether bbox is normalized
            to range [0, 1]. Default False.
        evaluate_difficult (bool): whether to evaluate
            difficult gt bbox. Default False.
    """
assert 'bbox' in results[0]
logging.debug("Start evaluate...")
from pycocotools.coco import COCO
cat_ids = coco_gt.getCatIds()
# when with_background = True, mapping category to classid, like:
# background:0, first_class:1, second_class:2, ...
clsid2catid = dict(
{i + int(with_background): catid
for i, catid in enumerate(cat_ids)})
class_num = len(clsid2catid) + int(with_background)
detection_map = DetectionMAP(
class_num=class_num,
overlap_thresh=overlap_thresh,
map_type=map_type,
is_bbox_normalized=is_bbox_normalized,
evaluate_difficult=evaluate_difficult)
xywh_res = []
det_nums = 0
gt_nums = 0
for t in results:
bboxes = t['bbox'][0]
bbox_lengths = t['bbox'][1][0]
im_ids = np.array(t['im_id'][0]).flatten()
        if bboxes is None or bboxes.shape == (1, 1):
continue
gt_boxes = t['gt_box'][0]
gt_labels = t['gt_label'][0]
difficults = t['is_difficult'][0] if not evaluate_difficult \
else None
if len(t['gt_box'][1]) == 0:
# gt_box, gt_label, difficult read as zero padded Tensor
bbox_idx = 0
for i in range(len(gt_boxes)):
gt_box = gt_boxes[i]
gt_label = gt_labels[i]
difficult = None if difficults is None \
else difficults[i]
bbox_num = bbox_lengths[i]
bbox = bboxes[bbox_idx:bbox_idx + bbox_num]
gt_box, gt_label, difficult = prune_zero_padding(
gt_box, gt_label, difficult)
detection_map.update(bbox, gt_box, gt_label, difficult)
bbox_idx += bbox_num
det_nums += bbox_num
gt_nums += gt_box.shape[0]
im_id = int(im_ids[i])
for b in bbox:
clsid, score, xmin, ymin, xmax, ymax = b.tolist()
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': clsid2catid[clsid],
'bbox': bbox,
'score': score
}
xywh_res.append(coco_res)
else:
# gt_box, gt_label, difficult read as LoDTensor
gt_box_lengths = t['gt_box'][1][0]
bbox_idx = 0
gt_box_idx = 0
for i in range(len(bbox_lengths)):
bbox_num = bbox_lengths[i]
gt_box_num = gt_box_lengths[i]
bbox = bboxes[bbox_idx:bbox_idx + bbox_num]
gt_box = gt_boxes[gt_box_idx:gt_box_idx + gt_box_num]
gt_label = gt_labels[gt_box_idx:gt_box_idx + gt_box_num]
difficult = None if difficults is None else \
difficults[gt_box_idx: gt_box_idx + gt_box_num]
detection_map.update(bbox, gt_box, gt_label, difficult)
bbox_idx += bbox_num
gt_box_idx += gt_box_num
im_id = int(im_ids[i])
for b in bbox:
clsid, score, xmin, ymin, xmax, ymax = b.tolist()
w = xmax - xmin + 1
h = ymax - ymin + 1
bbox = [xmin, ymin, w, h]
coco_res = {
'image_id': im_id,
'category_id': clsid2catid[clsid],
'bbox': bbox,
'score': score
}
xywh_res.append(coco_res)
logging.debug("Accumulating evaluatation results...")
detection_map.accumulate()
map_stat = 100. * detection_map.get_map()
logging.debug("mAP({:.2f}, {}) = {:.2f}".format(overlap_thresh, map_type,
map_stat))
return map_stat, xywh_res
def prune_zero_padding(gt_box, gt_label, difficult=None):
valid_cnt = 0
for i in range(len(gt_box)):
if gt_box[i, 0] == 0 and gt_box[i, 1] == 0 and \
gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
break
valid_cnt += 1
return (gt_box[:valid_cnt], gt_label[:valid_cnt],
difficult[:valid_cnt] if difficult is not None else None)
def bbox_area(bbox, is_bbox_normalized):
"""
Calculate area of a bounding box
"""
norm = 1. - float(is_bbox_normalized)
width = bbox[2] - bbox[0] + norm
height = bbox[3] - bbox[1] + norm
return width * height
def jaccard_overlap(pred, gt, is_bbox_normalized=False):
"""
Calculate jaccard overlap ratio between two bounding box
"""
if pred[0] >= gt[2] or pred[2] <= gt[0] or \
pred[1] >= gt[3] or pred[3] <= gt[1]:
return 0.
inter_xmin = max(pred[0], gt[0])
inter_ymin = max(pred[1], gt[1])
inter_xmax = min(pred[2], gt[2])
inter_ymax = min(pred[3], gt[3])
inter_size = bbox_area([inter_xmin, inter_ymin, inter_xmax, inter_ymax],
is_bbox_normalized)
pred_size = bbox_area(pred, is_bbox_normalized)
gt_size = bbox_area(gt, is_bbox_normalized)
overlap = float(inter_size) / (pred_size + gt_size - inter_size)
return overlap
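# Worked example (illustrative), using the pixel convention
# (is_bbox_normalized=False adds +1 to widths and heights):
#
#   pred = [0, 0, 3, 3]        # 4x4 box, area 16
#   gt = [2, 2, 5, 5]          # 4x4 box, area 16
#   jaccard_overlap(pred, gt)  # intersection area 4 -> 4 / 28 ~= 0.143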
class DetectionMAP(object):
"""
Calculate detection mean average precision.
Currently support two types: 11point and integral
Args:
class_num (int): the class number.
overlap_thresh (float): The threshold of overlap
ratio between prediction bounding box and
ground truth bounding box for deciding
true/false positive. Default 0.5.
map_type (str): calculation method of mean average
precision, currently support '11point' and
'integral'. Default '11point'.
        is_bbox_normalized (bool): whether bounding boxes
            are normalized to range [0, 1]. Default False.
evaluate_difficult (bool): whether to evaluate
difficult bounding boxes. Default False.
"""
def __init__(self,
class_num,
overlap_thresh=0.5,
map_type='11point',
is_bbox_normalized=False,
evaluate_difficult=False):
self.class_num = class_num
self.overlap_thresh = overlap_thresh
assert map_type in ['11point', 'integral'], \
"map_type currently only support '11point' "\
"and 'integral'"
self.map_type = map_type
self.is_bbox_normalized = is_bbox_normalized
self.evaluate_difficult = evaluate_difficult
self.reset()
def update(self, bbox, gt_box, gt_label, difficult=None):
"""
        Update metric statistics from the given prediction and ground
        truth information.
"""
if difficult is None:
difficult = np.zeros_like(gt_label)
# record class gt count
for gtl, diff in zip(gt_label, difficult):
if self.evaluate_difficult or int(diff) == 0:
self.class_gt_counts[int(np.array(gtl))] += 1
# record class score positive
visited = [False] * len(gt_label)
for b in bbox:
label, score, xmin, ymin, xmax, ymax = b.tolist()
pred = [xmin, ymin, xmax, ymax]
max_idx = -1
max_overlap = -1.0
for i, gl in enumerate(gt_label):
if int(gl) == int(label):
overlap = jaccard_overlap(pred, gt_box[i],
self.is_bbox_normalized)
if overlap > max_overlap:
max_overlap = overlap
max_idx = i
if max_overlap > self.overlap_thresh:
if self.evaluate_difficult or \
int(np.array(difficult[max_idx])) == 0:
if not visited[max_idx]:
self.class_score_poss[int(label)].append([score, 1.0])
visited[max_idx] = True
else:
self.class_score_poss[int(label)].append([score, 0.0])
else:
self.class_score_poss[int(label)].append([score, 0.0])
def reset(self):
"""
        Reset metric statistics
"""
self.class_score_poss = [[] for _ in range(self.class_num)]
self.class_gt_counts = [0] * self.class_num
self.mAP = None
self.APs = [None] * self.class_num
def accumulate(self):
"""
Accumulate metric results and calculate mAP
"""
mAP = 0.
valid_cnt = 0
for id, (score_pos, count) in enumerate(
zip(self.class_score_poss, self.class_gt_counts)):
if count == 0: continue
if len(score_pos) == 0:
valid_cnt += 1
continue
accum_tp_list, accum_fp_list = \
self._get_tp_fp_accum(score_pos)
precision = []
recall = []
for ac_tp, ac_fp in zip(accum_tp_list, accum_fp_list):
precision.append(float(ac_tp) / (ac_tp + ac_fp))
recall.append(float(ac_tp) / count)
if self.map_type == '11point':
max_precisions = [0.] * 11
start_idx = len(precision) - 1
for j in range(10, -1, -1):
for i in range(start_idx, -1, -1):
if recall[i] < float(j) / 10.:
start_idx = i
if j > 0:
max_precisions[j - 1] = max_precisions[j]
break
else:
if max_precisions[j] < precision[i]:
max_precisions[j] = precision[i]
mAP += sum(max_precisions) / 11.
self.APs[id] = sum(max_precisions) / 11.
valid_cnt += 1
elif self.map_type == 'integral':
import math
ap = 0.
prev_recall = 0.
for i in range(len(precision)):
recall_gap = math.fabs(recall[i] - prev_recall)
if recall_gap > 1e-6:
ap += precision[i] * recall_gap
prev_recall = recall[i]
mAP += ap
                self.APs[id] = ap
valid_cnt += 1
else:
raise Exception("Unspported mAP type {}".format(self.map_type))
self.mAP = mAP / float(valid_cnt) if valid_cnt > 0 else mAP
def get_map(self):
"""
Get mAP result
"""
if self.mAP is None:
raise Exception("mAP is not calculated.")
return self.mAP
def _get_tp_fp_accum(self, score_pos_list):
"""
Calculate accumulating true/false positive results from
[score, pos] records
"""
sorted_list = sorted(score_pos_list, key=lambda s: s[0], reverse=True)
accum_tp = 0
accum_fp = 0
accum_tp_list = []
accum_fp_list = []
for (score, pos) in sorted_list:
accum_tp += int(pos)
accum_tp_list.append(accum_tp)
accum_fp += 1 - int(pos)
accum_fp_list.append(accum_fp)
return accum_tp_list, accum_fp_list
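# Usage sketch (illustrative; the arrays are fabricated toy values, not real
# predictions). Each prediction row is [label, score, xmin, ymin, xmax, ymax]:
#
#   dmap = DetectionMAP(class_num=3, overlap_thresh=0.5, map_type='11point')
#   bbox = np.array([[1, 0.9, 0, 0, 3, 3]])
#   gt_box = np.array([[0, 0, 3, 3]])
#   gt_label = np.array([[1]])
#   dmap.update(bbox, gt_box, gt_label)
#   dmap.accumulate()
#   dmap.get_map()  # -> 1.0 for this perfect toy match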
import paddlex
import paddlehub as hub
import os
import os.path as osp
image_pretrain = {
'ResNet18':
'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar',
'ResNet34':
'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar',
'ResNet50':
'http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar',
'ResNet101':
'http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar',
'ResNet50_vd':
'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar',
'ResNet101_vd':
'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar',
'MobileNetV1':
'http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar',
'MobileNetV2_x1.0':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar',
'MobileNetV2_x0.5':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar',
'MobileNetV2_x2.0':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar',
'MobileNetV2_x0.25':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar',
'MobileNetV2_x1.5':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar',
'MobileNetV3_small':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar',
'MobileNetV3_large':
'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar',
'DarkNet53':
'https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar',
'DenseNet121':
'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar',
'DenseNet161':
'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar',
'DenseNet201':
'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar',
'DetResNet50':
'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar',
'SegXception41':
'https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar',
'SegXception65':
'https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar',
'ShuffleNetV2':
'https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar',
}
coco_pretrain = {
'UNet': 'https://paddleseg.bj.bcebos.com/models/unet_coco_v3.tgz'
}
def get_pretrain_weights(flag, model_type, backbone, save_dir):
if flag is None:
return None
elif osp.isdir(flag):
return flag
elif flag == 'IMAGENET':
new_save_dir = save_dir
if hasattr(paddlex, 'pretrain_dir'):
new_save_dir = paddlex.pretrain_dir
if backbone.startswith('Xception'):
backbone = 'Seg{}'.format(backbone)
elif backbone == 'MobileNetV2':
backbone = 'MobileNetV2_x1.0'
if model_type == 'detector':
if backbone == 'ResNet50':
backbone = 'DetResNet50'
        assert backbone in image_pretrain, "There are no ImageNet pretrained weights for {}, you may try COCO.".format(
            backbone)
try:
hub.download(backbone, save_path=new_save_dir)
        except Exception as e:
            if isinstance(e, hub.ResourceNotFoundError):
                raise Exception(
                    "Resource for backbone {} not found".format(backbone))
            elif isinstance(e, hub.ServerConnectionError):
                raise Exception(
                    "Cannot get resource for backbone {}, please check your internet connection"
                    .format(backbone))
            else:
                raise Exception(
                    "Unexpected error, please make sure paddlehub >= 1.6.2")
return osp.join(new_save_dir, backbone)
elif flag == 'COCO':
new_save_dir = save_dir
if hasattr(paddlex, 'pretrain_dir'):
new_save_dir = paddlex.pretrain_dir
        assert backbone in coco_pretrain, "There are no COCO pretrained weights for {}, you may try ImageNet.".format(
            backbone)
try:
hub.download(backbone, save_path=new_save_dir)
        except Exception as e:
            if isinstance(e, hub.ResourceNotFoundError):
                raise Exception(
                    "Resource for backbone {} not found".format(backbone))
            elif isinstance(e, hub.ServerConnectionError):
                raise Exception(
                    "Cannot get resource for backbone {}, please check your internet connection"
                    .format(backbone))
            else:
                raise Exception(
                    "Unexpected error, please make sure paddlehub >= 1.6.2")
return osp.join(new_save_dir, backbone)
    else:
        raise Exception(
            "pretrain_weights needs to be defined as a directory path, 'IMAGENET' or 'COCO' (to download pretrained weights automatically)."
        )
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix every time calculate() is called
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into the sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use the numpy sum axis api to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
# avoid spillovers
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
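# Usage sketch (illustrative toy tensors). Note that `calculate` transposes
# label and ignore from NCHW to NHWC but expects pred in a mask-compatible
# shape already:
#
#   cm = ConfusionMatrix(num_classes=2, streaming=False)
#   pred = np.zeros((1, 2, 2, 1), dtype='int64')            # NHWC, all class 0
#   label = np.array([[[[0, 1], [0, 1]]]], dtype='int64')   # NCHW
#   ignore = np.ones((1, 1, 2, 2), dtype='int64')           # NCHW, keep all pixels
#   cm.calculate(pred, label, ignore)
#   cm.accuracy()  # overall accuracy 0.5 on this toy input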
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import cv2
import numpy as np
from PIL import Image, ImageDraw
def visualize_detection(image, result, threshold=0.5, save_dir=None):
"""
Visualize bbox and mask results
"""
image_name = os.path.split(image)[-1]
image = Image.open(image).convert('RGB')
    image = draw_bbox_mask(image, result, threshold=threshold)
if save_dir is not None:
if not os.path.exists(save_dir):
os.makedirs(save_dir)
out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
image.save(out_path, quality=95)
else:
return image
def visualize_segmentation(image, result, weight=0.6, save_dir=None):
"""
    Convert a segmentation result to a color image and save the blended image.
    Args:
        image: path of the original image
        result: the prediction result for the image
        weight: blending weight of the original image; the result gets (1 - weight)
        save_dir: directory for saving the blended image
"""
label_map = result['label_map']
color_map = get_color_map_list(256)
color_map = np.array(color_map).astype("uint8")
# Use OpenCV LUT for color mapping
c1 = cv2.LUT(label_map, color_map[:, 0])
c2 = cv2.LUT(label_map, color_map[:, 1])
c3 = cv2.LUT(label_map, color_map[:, 2])
pseudo_img = np.dstack((c1, c2, c3))
im = cv2.imread(image)
vis_result = cv2.addWeighted(im, weight, pseudo_img, 1 - weight, 0)
if save_dir is not None:
if not os.path.exists(save_dir):
os.makedirs(save_dir)
image_name = os.path.split(image)[-1]
out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
cv2.imwrite(out_path, vis_result)
else:
return vis_result
def get_color_map_list(num_classes):
""" Returns the color map for visualizing the segmentation mask,
which can support arbitrary number of classes.
Args:
num_classes: Number of classes
Returns:
The color map
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
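# Worked example (illustrative): the low class ids decode to
#
#   get_color_map_list(4)
#   # -> [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0]]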
def expand_boxes(boxes, scale):
    """
    Expand an array of boxes by a given scale.
    """
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def clip_bbox(bbox):
xmin = max(min(bbox[0], 1.), 0.)
ymin = max(min(bbox[1], 1.), 0.)
xmax = max(min(bbox[2], 1.), 0.)
ymax = max(min(bbox[3], 1.), 0.)
return xmin, ymin, xmax, ymax
def draw_bbox_mask(image, results, threshold=0.5, alpha=0.7):
labels = list()
for dt in np.array(results):
if dt['category'] not in labels:
labels.append(dt['category'])
color_map = get_color_map_list(len(labels))
for dt in np.array(results):
cname, bbox, score = dt['category'], dt['bbox'], dt['score']
if score < threshold:
continue
xmin, ymin, w, h = bbox
xmax = xmin + w
ymax = ymin + h
color = tuple(color_map[labels.index(cname)])
# draw bbox
draw = ImageDraw.Draw(image)
draw.line([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
(xmin, ymin)],
width=2,
fill=color)
# draw label
text = "{} {:.2f}".format(cname, score)
tw, th = draw.textsize(text)
draw.rectangle([(xmin + 1, ymin - th), (xmin + tw + 1, ymin)],
fill=color)
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
# draw mask
if 'mask' in dt:
mask = dt['mask']
color_mask = np.array(color_map[labels.index(
dt['category'])]).astype('float32')
img_array = np.array(image).astype('float32')
idx = np.nonzero(mask)
img_array[idx[0], idx[1], :] *= 1.0 - alpha
img_array[idx[0], idx[1], :] += alpha * color_mask
image = Image.fromarray(img_array.astype('uint8'))
return image
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
import math
import tqdm
import os.path as osp
import numpy as np
import paddle.fluid as fluid
import paddlex.utils.logging as logging
import paddlex
from .base import BaseAPI
from collections import OrderedDict
from .utils.detection_eval import eval_results, bbox2out
import copy
class YOLOv3(BaseAPI):
"""构建YOLOv3,并实现其训练、评估、预测和模型导出。
Args:
num_classes (int): 类别数。默认为80。
backbone (str): YOLOv3的backbone网络,取值范围为['DarkNet53',
'ResNet34', 'MobileNetV1', 'MobileNetV3_large']。默认为'MobileNetV1'。
anchors (list|tuple): anchor框的宽度和高度,为None时表示使用默认值
[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]]。
anchor_masks (list|tuple): 在计算YOLOv3损失时,使用anchor的mask索引,为None时表示使用默认值
[[6, 7, 8], [3, 4, 5], [0, 1, 2]]。
ignore_threshold (float): 在计算YOLOv3损失时,IoU大于`ignore_threshold`的预测框的置信度被忽略。默认为0.7。
nms_score_threshold (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
nms_topk (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
nms_keep_topk (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
nms_iou_threshold (float): 进行NMS时,用于剔除检测框IOU的阈值。默认为0.45。
label_smooth (bool): 是否使用label smooth。默认值为False。
train_random_shapes (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
"""
def __init__(self,
num_classes=80,
backbone='MobileNetV1',
anchors=None,
anchor_masks=None,
ignore_threshold=0.7,
nms_score_threshold=0.01,
nms_topk=1000,
nms_keep_topk=100,
nms_iou_threshold=0.45,
label_smooth=False,
train_random_shapes=[
320, 352, 384, 416, 448, 480, 512, 544, 576, 608
]):
self.init_params = locals()
super(YOLOv3, self).__init__('detector')
backbones = [
'DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large'
]
assert backbone in backbones, "backbone should be one of {}".format(
backbones)
self.backbone = backbone
self.num_classes = num_classes
self.anchors = anchors
self.anchor_masks = anchor_masks
self.ignore_threshold = ignore_threshold
self.nms_score_threshold = nms_score_threshold
self.nms_topk = nms_topk
self.nms_keep_topk = nms_keep_topk
self.nms_iou_threshold = nms_iou_threshold
self.label_smooth = label_smooth
self.sync_bn = True
self.train_random_shapes = train_random_shapes
def _get_backbone(self, backbone_name):
if backbone_name == 'DarkNet53':
backbone = paddlex.cv.nets.DarkNet(norm_type='sync_bn')
elif backbone_name == 'ResNet34':
backbone = paddlex.cv.nets.ResNet(
norm_type='sync_bn',
layers=34,
freeze_norm=False,
norm_decay=0.,
feature_maps=[3, 4, 5],
freeze_at=0)
elif backbone_name == 'MobileNetV1':
backbone = paddlex.cv.nets.MobileNetV1(norm_type='sync_bn')
elif backbone_name.startswith('MobileNetV3'):
models_name = backbone_name.split('_')[1]
backbone = paddlex.cv.nets.MobileNetV3(
norm_type='sync_bn', models_name=models_name)
return backbone
def build_net(self, mode='train'):
model = paddlex.cv.nets.detection.YOLOv3(
backbone=self._get_backbone(self.backbone),
num_classes=self.num_classes,
mode=mode,
anchors=self.anchors,
anchor_masks=self.anchor_masks,
ignore_threshold=self.ignore_threshold,
label_smooth=self.label_smooth,
nms_score_threshold=self.nms_score_threshold,
nms_topk=self.nms_topk,
nms_keep_topk=self.nms_keep_topk,
nms_iou_threshold=self.nms_iou_threshold,
train_random_shapes=self.train_random_shapes)
inputs = model.generate_inputs()
model_out = model.build_net(inputs)
outputs = OrderedDict([('bbox', model_out)])
if mode == 'train':
self.optimizer.minimize(model_out)
outputs = OrderedDict([('loss', model_out)])
return inputs, outputs
def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
lr_decay_epochs, lr_decay_gamma,
num_steps_each_epoch):
if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
raise Exception("warmup_steps should less than {}".format(
lr_decay_epochs[0] * num_steps_each_epoch))
boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
values = [(lr_decay_gamma**i) * learning_rate
for i in range(len(lr_decay_epochs) + 1)]
lr_decay = fluid.layers.piecewise_decay(
boundaries=boundaries, values=values)
lr_warmup = fluid.layers.linear_lr_warmup(
learning_rate=lr_decay,
warmup_steps=warmup_steps,
start_lr=warmup_start_lr,
end_lr=learning_rate)
optimizer = fluid.optimizer.Momentum(
learning_rate=lr_warmup,
momentum=0.9,
regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
return optimizer
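# Illustrative note (the step count is hypothetical): with the training defaults
# learning_rate=1/8000, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1 and, say,
# num_steps_each_epoch=500, the schedule built above becomes
#     boundaries = [106500, 120000]
#     values     = [1.25e-4, 1.25e-5, 1.25e-6]
# preceded by a linear warmup from warmup_start_lr to 1.25e-4 over the first
# warmup_steps steps.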
def train(self,
num_epochs,
train_dataset,
train_batch_size=8,
eval_dataset=None,
save_interval_epochs=20,
log_interval_steps=2,
save_dir='output',
pretrain_weights='IMAGENET',
optimizer=None,
learning_rate=1.0 / 8000,
warmup_steps=1000,
warmup_start_lr=0.0,
lr_decay_epochs=[213, 240],
lr_decay_gamma=0.1,
metric=None,
use_vdl=False,
sensitivities_file=None,
eval_metric_loss=0.05):
"""训练。
Args:
num_epochs (int): 训练迭代轮数。
train_dataset (paddlex.datasets): 训练数据读取器。
train_batch_size (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与显卡
数量之商为验证数据batch大小。默认值为8。
eval_dataset (paddlex.datasets): 验证数据读取器。
save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为20。
log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为10。
save_dir (str): 模型保存路径。默认值为'output'。
pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
learning_rate (float): 默认优化器的学习率。默认为1.0/8000。
warmup_steps (int): 默认优化器进行warmup过程的步数。默认为1000。
warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为0.0。
lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[213, 240]。
lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
Raises:
ValueError: 评估类型不在指定列表中。
ValueError: 模型从inference model进行加载。
"""
if not self.trainable:
raise ValueError(
"Model is not trainable since it was loaded from a inference model."
)
if metric is None:
if isinstance(train_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(train_dataset, paddlex.datasets.VOCDetection):
metric = 'VOC'
else:
raise ValueError(
"train_dataset should be datasets.VOCDetection or datasets.COCODetection."
)
assert metric in ['COCO', 'VOC'], "Metric only supports 'VOC' or 'COCO'"
self.metric = metric
self.labels = train_dataset.labels
# 构建训练网络
if optimizer is None:
# 构建默认的优化策略
num_steps_each_epoch = train_dataset.num_samples // train_batch_size
optimizer = self.default_optimizer(
learning_rate=learning_rate,
warmup_steps=warmup_steps,
warmup_start_lr=warmup_start_lr,
lr_decay_epochs=lr_decay_epochs,
lr_decay_gamma=lr_decay_gamma,
num_steps_each_epoch=num_steps_each_epoch)
self.optimizer = optimizer
# 构建训练、验证、预测网络
self.build_program()
# 初始化网络权重
self.net_initialize(
startup_prog=fluid.default_startup_program(),
pretrain_weights=pretrain_weights,
save_dir=save_dir,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
# 训练
self.train_loop(
num_epochs=num_epochs,
train_dataset=train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
save_interval_epochs=save_interval_epochs,
log_interval_steps=log_interval_steps,
save_dir=save_dir,
use_vdl=use_vdl)
def evaluate(self,
eval_dataset,
batch_size=1,
epoch_id=None,
metric=None,
return_details=False):
"""评估。
Args:
eval_dataset (paddlex.datasets): 验证数据读取器。
batch_size (int): 验证数据批大小。默认为1。
epoch_id (int): 当前评估模型所在的训练轮数。
metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
如为COCODetection,则metric为'COCO'。
return_details (bool): 是否返回详细信息。
Returns:
tuple (metrics, eval_details) | dict (metrics): 当return_details为True时,返回(metrics, eval_details),
当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
分别表示平均准确率平均值在各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
"""
self.arrange_transforms(
transforms=eval_dataset.transforms, mode='eval')
if metric is None:
if hasattr(self, 'metric') and self.metric is not None:
metric = self.metric
else:
if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
metric = 'COCO'
elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
metric = 'VOC'
else:
raise Exception(
"eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
)
assert metric in ['COCO', 'VOC'], "Metric only supports 'VOC' or 'COCO'"
total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
results = list()
data_generator = eval_dataset.generator(
batch_size=batch_size, drop_last=False)
logging.info(
"Start to evaluating(total_samples={}, total_steps={})...".format(
eval_dataset.num_samples, total_steps))
for step, data in tqdm.tqdm(
enumerate(data_generator()), total=total_steps):
images = np.array([d[0] for d in data])
im_sizes = np.array([d[1] for d in data])
feed_data = {'image': images, 'im_size': im_sizes}
outputs = self.exe.run(
self.test_prog,
feed=[feed_data],
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
'bbox': (np.array(outputs[0]),
outputs[0].recursive_sequence_lengths())
}
res_id = [np.array([d[2]]) for d in data]
res['im_id'] = (res_id, [])
if metric == 'VOC':
res_gt_box = [d[3].reshape(-1, 4) for d in data]
res_gt_label = [d[4].reshape(-1, 1) for d in data]
res_is_difficult = [d[5].reshape(-1, 1) for d in data]
res_id = [np.array([d[2]]) for d in data]
res['gt_box'] = (res_gt_box, [])
res['gt_label'] = (res_gt_label, [])
res['is_difficult'] = (res_is_difficult, [])
results.append(res)
logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
epoch_id, step + 1, total_steps))
box_ap_stats, eval_details = eval_results(
results, metric, eval_dataset.coco_gt, with_background=False)
evaluate_metrics = OrderedDict(
zip(['bbox_mmap' if metric == 'COCO' else 'bbox_map'],
box_ap_stats))
if return_details:
return evaluate_metrics, eval_details
return evaluate_metrics
def predict(self, img_file, transforms=None):
"""预测。
Args:
img_file (str): 预测图像路径。
transforms (paddlex.det.transforms): 数据预处理操作。
Returns:
list: 预测结果列表,每个预测结果由预测框类别标签、
预测框类别名称、预测框坐标、预测框得分组成。
"""
if transforms is None and not hasattr(self, 'test_transforms'):
raise Exception("transforms need to be defined, now is None.")
if transforms is not None:
self.arrange_transforms(transforms=transforms, mode='test')
im, im_size = transforms(img_file)
else:
self.arrange_transforms(
transforms=self.test_transforms, mode='test')
im, im_size = self.test_transforms(img_file)
im = np.expand_dims(im, axis=0)
im_size = np.expand_dims(im_size, axis=0)
outputs = self.exe.run(
self.test_prog,
feed={
'image': im,
'im_size': im_size
},
fetch_list=list(self.test_outputs.values()),
return_numpy=False)
res = {
k: (np.array(v), v.recursive_sequence_lengths())
for k, v in zip(list(self.test_outputs.keys()), outputs)
}
res['im_id'] = (np.array([[0]]).astype('int32'), [])
clsid2catid = dict({i: i for i in range(self.num_classes)})
xywh_results = bbox2out([res], clsid2catid)
results = list()
for xywh_res in xywh_results:
del xywh_res['image_id']
xywh_res['category'] = self.labels[xywh_res['category_id']]
results.append(xywh_res)
return results
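# A minimal end-to-end sketch of the class above. The dataset paths are
# hypothetical and the individual transform names are assumptions beyond what
# this file confirms (only paddlex.datasets.VOCDetection, train() and predict()
# are exercised above); treat this as a sketch, not a definitive recipe.
#
#   import paddlex as pdx
#   from paddlex.det import transforms
#
#   train_transforms = transforms.Compose([
#       transforms.RandomHorizontalFlip(),
#       transforms.Resize(target_size=608),
#       transforms.Normalize()])
#   train_dataset = pdx.datasets.VOCDetection(
#       data_dir='data/mydataset',
#       file_list='data/mydataset/train_list.txt',
#       label_list='data/mydataset/labels.txt',
#       transforms=train_transforms)
#   model = pdx.det.YOLOv3(num_classes=len(train_dataset.labels),
#                          backbone='DarkNet53')
#   model.train(num_epochs=270, train_dataset=train_dataset,
#               train_batch_size=8, save_dir='output/yolov3')
#   result = model.predict('data/mydataset/JPEGImages/1.jpg',
#                          transforms=train_transforms)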
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .resnet import ResNet
from .darknet import DarkNet
from .detection import FasterRCNN
from .mobilenet_v1 import MobileNetV1
from .mobilenet_v2 import MobileNetV2
from .mobilenet_v3 import MobileNetV3
from .segmentation import UNet
from .segmentation import DeepLabv3p
from .xception import Xception
from .densenet import DenseNet
from .shufflenet_v2 import ShuffleNetV2
def resnet18(input, num_classes=1000):
model = ResNet(layers=18, num_classes=num_classes)
return model(input)
def resnet34(input, num_classes=1000):
model = ResNet(layers=34, num_classes=num_classes)
return model(input)
def resnet50(input, num_classes=1000):
model = ResNet(layers=50, num_classes=num_classes)
return model(input)
def resnet101(input, num_classes=1000):
model = ResNet(layers=101, num_classes=num_classes)
return model(input)
def resnet50_vd(input, num_classes=1000):
model = ResNet(layers=50, num_classes=num_classes, variant='d')
return model(input)
def resnet101_vd(input, num_classes=1000):
model = ResNet(layers=101, num_classes=num_classes, variant='d')
return model(input)
def darknet53(input, num_classes=1000):
model = DarkNet(depth=53, num_classes=num_classes, bn_act='relu')
return model(input)
def mobilenetv1(input, num_classes=1000):
model = MobileNetV1(num_classes=num_classes)
return model(input)
def mobilenetv2(input, num_classes=1000):
model = MobileNetV2(num_classes=num_classes)
return model(input)
def mobilenetv3_small(input, num_classes=1000):
model = MobileNetV3(num_classes=num_classes, model_name='small')
return model(input)
def mobilenetv3_large(input, num_classes=1000):
model = MobileNetV3(num_classes=num_classes, model_name='large')
return model(input)
def xception65(input, num_classes=1000):
model = Xception(layers=65, num_classes=num_classes)
return model(input)
def xception71(input, num_classes=1000):
model = Xception(layers=71, num_classes=num_classes)
return model(input)
def xception41(input, num_classes=1000):
model = Xception(layers=41, num_classes=num_classes)
return model(input)
def densenet121(input, num_classes=1000):
model = DenseNet(layers=121, num_classes=num_classes)
return model(input)
def densenet161(input, num_classes=1000):
model = DenseNet(layers=161, num_classes=num_classes)
return model(input)
def densenet201(input, num_classes=1000):
model = DenseNet(layers=201, num_classes=num_classes)
return model(input)
def shufflenetv2(input, num_classes=1000):
model = ShuffleNetV2(num_classes=num_classes)
return model(input)
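# A minimal sketch (assumption: PaddlePaddle 1.x static-graph mode) of wiring
# one of the factory functions above into a program.
#
#   import paddle.fluid as fluid
#
#   main_prog, startup_prog = fluid.Program(), fluid.Program()
#   with fluid.program_guard(main_prog, startup_prog):
#       image = fluid.data(name='image', shape=[None, 3, 224, 224],
#                          dtype='float32')
#       logits = resnet50(image, num_classes=1000)  # [N, 1000] fc output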
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class NameAdapter(object):
"""Fix the backbones variable names for pretrained weight"""
def __init__(self, model):
super(NameAdapter, self).__init__()
self.model = model
@property
def model_type(self):
return getattr(self.model, '_model_type', '')
@property
def variant(self):
return getattr(self.model, 'variant', '')
def fix_conv_norm_name(self, name):
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
# the naming rule is same as pretrained weight
if self.model_type == 'SEResNeXt':
bn_name = name + "_bn"
return bn_name
def fix_shortcut_name(self, name):
if self.model_type == 'SEResNeXt':
name = 'conv' + name + '_prj'
return name
def fix_bottleneck_name(self, name):
if self.model_type == 'SEResNeXt':
conv_name1 = 'conv' + name + '_x1'
conv_name2 = 'conv' + name + '_x2'
conv_name3 = 'conv' + name + '_x3'
shortcut_name = name
else:
conv_name1 = name + "_branch2a"
conv_name2 = name + "_branch2b"
conv_name3 = name + "_branch2c"
shortcut_name = name + "_branch1"
return conv_name1, conv_name2, conv_name3, shortcut_name
def fix_layer_warp_name(self, stage_num, count, i):
name = 'res' + str(stage_num)
if count > 10 and stage_num == 4:
if i == 0:
conv_name = name + "a"
else:
conv_name = name + "b" + str(i)
else:
conv_name = name + chr(ord("a") + i)
if self.model_type == 'SEResNeXt':
conv_name = str(stage_num + 2) + '_' + str(i + 1)
return conv_name
def fix_c1_stage_name(self):
return "res_conv1" if self.model_type == 'ResNeXt' else "conv1"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import math
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
class DarkNet(object):
"""
DarkNet, see https://pjreddie.com/darknet/yolo/
Args:
depth (int): network depth, currently only darknet 53 is supported
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
"""
def __init__(self,
depth=53,
num_classes=None,
norm_type='bn',
norm_decay=0.,
bn_act='leaky',
weight_prefix_name=''):
assert depth in [53], "unsupported depth value"
self.depth = depth
self.num_classes = num_classes
self.norm_type = norm_type
self.norm_decay = norm_decay
self.depth_cfg = {53: ([1, 2, 8, 8, 4], self.basicblock)}
self.bn_act = bn_act
self.prefix_name = weight_prefix_name
def _conv_norm(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
# the leaky relu here has `alpha` fixed at 0.1, which cannot be set via the
# `act` param of fluid.layers.batch_norm above.
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
if act == 'relu':
out = fluid.layers.relu(x=out)
return out
def _downsample(self,
input,
ch_out,
filter_size=3,
stride=2,
padding=1,
name=None):
return self._conv_norm(
input,
ch_out=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=self.bn_act,
name=name)
def basicblock(self, input, ch_out, name=None):
conv1 = self._conv_norm(
input,
ch_out=ch_out,
filter_size=1,
stride=1,
padding=0,
act=self.bn_act,
name=name + ".0")
conv2 = self._conv_norm(
conv1,
ch_out=ch_out * 2,
filter_size=3,
stride=1,
padding=1,
act=self.bn_act,
name=name + ".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out
def layer_warp(self, block_func, input, ch_out, count, name=None):
out = block_func(input, ch_out=ch_out, name='{}.0'.format(name))
for j in six.moves.xrange(1, count):
out = block_func(out, ch_out=ch_out, name='{}.{}'.format(name, j))
return out
def __call__(self, input):
"""
Get the backbone of DarkNet, that is output for the 5 stages.
Args:
input (Variable): input variable.
Returns:
The last variables of each stage.
"""
stages, block_func = self.depth_cfg[self.depth]
stages = stages[0:5]
conv = self._conv_norm(
input=input,
ch_out=32,
filter_size=3,
stride=1,
padding=1,
act=self.bn_act,
name=self.prefix_name + "yolo_input")
downsample_ = self._downsample(
input=conv,
ch_out=conv.shape[1] * 2,
name=self.prefix_name + "yolo_input.downsample")
blocks = []
for i, stage in enumerate(stages):
block = self.layer_warp(
block_func=block_func,
input=downsample_,
ch_out=32 * 2**i,
count=stage,
name=self.prefix_name + "stage.{}".format(i))
blocks.append(block)
if i < len(stages) - 1:  # do not downsample in the last stage
downsample_ = self._downsample(
input=block,
ch_out=block.shape[1] * 2,
name=self.prefix_name + "stage.{}.downsample".format(i))
if self.num_classes is not None:
pool = fluid.layers.pool2d(
input=blocks[-1], pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=self.num_classes,
param_attr=ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name='fc_weights'),
bias_attr=ParamAttr(name='fc_offset'))
return out
return blocks
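# A minimal sketch (PaddlePaddle 1.x static graph assumed) of DarkNet as a
# detection backbone: with num_classes=None, __call__ returns the last variable
# of each of the 5 stages instead of classification logits.
#
#   import paddle.fluid as fluid
#
#   with fluid.program_guard(fluid.Program(), fluid.Program()):
#       image = fluid.data(name='image', shape=[None, 3, 608, 608],
#                          dtype='float32')
#       blocks = DarkNet(depth=53, norm_type='bn')(image)
#       # YOLOv3 consumes the last three stage outputs (strides 8/16/32)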
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import math
from paddle.fluid.param_attr import ParamAttr
__all__ = ["DenseNet"]
class DenseNet(object):
def __init__(self, layers=121, bn_size=4, dropout=0, num_classes=None):
supported_layers = [121, 161, 169, 201, 264]
assert layers in supported_layers, \
    "supported layers are {} but input layer is {}".format(supported_layers, layers)
self.layers = layers
self.bn_size = bn_size
self.dropout = dropout
self.num_classes = num_classes
def __call__(self, input):
layers = self.layers
densenet_spec = {
121: (64, 32, [6, 12, 24, 16]),
161: (96, 48, [6, 12, 36, 24]),
169: (64, 32, [6, 12, 32, 32]),
201: (64, 32, [6, 12, 48, 32]),
264: (64, 32, [6, 12, 64, 48])
}
num_init_features, growth_rate, block_config = densenet_spec[layers]
conv = fluid.layers.conv2d(
input=input,
num_filters=num_init_features,
filter_size=7,
stride=2,
padding=3,
act=None,
param_attr=ParamAttr(name="conv1_weights"),
bias_attr=False)
conv = fluid.layers.batch_norm(
input=conv,
act='relu',
param_attr=ParamAttr(name='conv1_bn_scale'),
bias_attr=ParamAttr(name='conv1_bn_offset'),
moving_mean_name='conv1_bn_mean',
moving_variance_name='conv1_bn_variance')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
num_features = num_init_features
for i, num_layers in enumerate(block_config):
conv = self.make_dense_block(
conv,
num_layers,
self.bn_size,
growth_rate,
self.dropout,
name='conv' + str(i + 2))
num_features = num_features + num_layers * growth_rate
if i != len(block_config) - 1:
conv = self.make_transition(
conv, num_features // 2, name='conv' + str(i + 2) + '_blk')
num_features = num_features // 2
conv = fluid.layers.batch_norm(
input=conv,
act='relu',
param_attr=ParamAttr(name='conv5_blk_bn_scale'),
bias_attr=ParamAttr(name='conv5_blk_bn_offset'),
moving_mean_name='conv5_blk_bn_mean',
moving_variance_name='conv5_blk_bn_variance')
if self.num_classes:
conv = fluid.layers.pool2d(
input=conv, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(conv.shape[1] * 1.0)
out = fluid.layers.fc(
input=conv,
size=self.num_classes,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name="fc_weights"),
bias_attr=ParamAttr(name='fc_offset'))
return out
def make_transition(self, input, num_output_features, name=None):
bn_ac = fluid.layers.batch_norm(
input,
act='relu',
param_attr=ParamAttr(name=name + '_bn_scale'),
bias_attr=ParamAttr(name + '_bn_offset'),
moving_mean_name=name + '_bn_mean',
moving_variance_name=name + '_bn_variance')
bn_ac_conv = fluid.layers.conv2d(
input=bn_ac,
num_filters=num_output_features,
filter_size=1,
stride=1,
act=None,
bias_attr=False,
param_attr=ParamAttr(name=name + "_weights"))
pool = fluid.layers.pool2d(
input=bn_ac_conv, pool_size=2, pool_stride=2, pool_type='avg')
return pool
def make_dense_block(self,
input,
num_layers,
bn_size,
growth_rate,
dropout,
name=None):
conv = input
for layer in range(num_layers):
conv = self.make_dense_layer(
conv,
growth_rate,
bn_size,
dropout,
name=name + '_' + str(layer + 1))
return conv
def make_dense_layer(self, input, growth_rate, bn_size, dropout,
name=None):
bn_ac = fluid.layers.batch_norm(
input,
act='relu',
param_attr=ParamAttr(name=name + '_x1_bn_scale'),
bias_attr=ParamAttr(name + '_x1_bn_offset'),
moving_mean_name=name + '_x1_bn_mean',
moving_variance_name=name + '_x1_bn_variance')
bn_ac_conv = fluid.layers.conv2d(
input=bn_ac,
num_filters=bn_size * growth_rate,
filter_size=1,
stride=1,
act=None,
bias_attr=False,
param_attr=ParamAttr(name=name + "_x1_weights"))
bn_ac = fluid.layers.batch_norm(
bn_ac_conv,
act='relu',
param_attr=ParamAttr(name=name + '_x2_bn_scale'),
bias_attr=ParamAttr(name + '_x2_bn_offset'),
moving_mean_name=name + '_x2_bn_mean',
moving_variance_name=name + '_x2_bn_variance')
bn_ac_conv = fluid.layers.conv2d(
input=bn_ac,
num_filters=growth_rate,
filter_size=3,
stride=1,
padding=1,
act=None,
bias_attr=False,
param_attr=ParamAttr(name=name + "_x2_weights"))
if dropout:
bn_ac_conv = fluid.layers.dropout(
x=bn_ac_conv, dropout_prob=dropout)
bn_ac_conv = fluid.layers.concat([input, bn_ac_conv], axis=1)
return bn_ac_conv
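# A minimal classification sketch of the class above (PaddlePaddle 1.x static
# graph assumed). Note that __call__ only returns the fc output when
# num_classes is set.
#
#   import paddle.fluid as fluid
#
#   with fluid.program_guard(fluid.Program(), fluid.Program()):
#       image = fluid.data(name='image', shape=[None, 3, 224, 224],
#                          dtype='float32')
#       logits = DenseNet(layers=121, num_classes=1000)(image)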
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .yolo_v3 import YOLOv3
from .faster_rcnn import FasterRCNN
from .mask_rcnn import MaskRCNN
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Xavier
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import MSRA
__all__ = ['BBoxHead', 'TwoFCHead']
class TwoFCHead(object):
"""
RCNN head with two Fully Connected layers
Args:
mlp_dim (int): number of units of the fc layers
"""
def __init__(self, mlp_dim=1024):
super(TwoFCHead, self).__init__()
self.mlp_dim = mlp_dim
def __call__(self, roi_feat):
fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
fc6 = fluid.layers.fc(
input=roi_feat,
size=self.mlp_dim,
act='relu',
name='fc6',
param_attr=ParamAttr(
name='fc6_w', initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
head_feat = fluid.layers.fc(
input=fc6,
size=self.mlp_dim,
act='relu',
name='fc7',
param_attr=ParamAttr(name='fc7_w', initializer=Xavier()),
bias_attr=ParamAttr(
name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
return head_feat
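# A minimal sketch of the head above: it maps RoI features to a [P, mlp_dim]
# feature, where P is the number of RoIs. The input shape below is an
# assumption matching a typical FPN RoIAlign output.
#
#   import paddle.fluid as fluid
#
#   with fluid.program_guard(fluid.Program(), fluid.Program()):
#       roi_feat = fluid.data(name='roi_feat', shape=[None, 256, 7, 7],
#                             dtype='float32', lod_level=1)
#       head_feat = TwoFCHead(mlp_dim=1024)(roi_feat)  # [P, 1024]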
class BBoxHead(object):
def __init__(
self,
head,
#box_coder
prior_box_var=[0.1, 0.1, 0.2, 0.2],
code_type='decode_center_size',
box_normalized=False,
axis=1,
#MultiClassNMS
score_threshold=.05,
nms_top_k=-1,
keep_top_k=100,
nms_threshold=.5,
normalized=False,
nms_eta=1.0,
background_label=0,
#bbox_loss
sigma=1.0,
num_classes=81):
super(BBoxHead, self).__init__()
self.head = head
self.prior_box_var = prior_box_var
self.code_type = code_type
self.box_normalized = box_normalized
self.axis = axis
self.score_threshold = score_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
self.nms_threshold = nms_threshold
self.normalized = normalized
self.nms_eta = nms_eta
self.background_label = background_label
self.sigma = sigma
self.num_classes = num_classes
self.head_feat = None
def get_head_feat(self, input=None):
"""
Get the bbox head feature map.
"""
if input is not None:
feat = self.head(input)
if isinstance(feat, OrderedDict):
feat = list(feat.values())[0]
self.head_feat = feat
return self.head_feat
def _get_output(self, roi_feat):
"""
Get bbox head output.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
Returns:
cls_score(Variable): classification output of the bbox head with
    shape [P, num_classes].
bbox_pred(Variable): box regression output of the bbox head with
    shape [P, num_classes * 4].
"""
head_feat = self.get_head_feat(roi_feat)
# the ResNetC5 head outputs a spatial feature map, so pool it first
if not isinstance(self.head, TwoFCHead):
head_feat = fluid.layers.pool2d(
head_feat, pool_type='avg', global_pooling=True)
cls_score = fluid.layers.fc(
input=head_feat,
size=self.num_classes,
act=None,
name='cls_score',
param_attr=ParamAttr(
name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
bias_attr=ParamAttr(
name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
bbox_pred = fluid.layers.fc(
input=head_feat,
size=4 * self.num_classes,
act=None,
name='bbox_pred',
param_attr=ParamAttr(
name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
bias_attr=ParamAttr(
name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
return cls_score, bbox_pred
def get_loss(self, roi_feat, labels_int32, bbox_targets,
bbox_inside_weights, bbox_outside_weights):
"""
Get bbox_head loss.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
labels_int32(Variable): Class label of a RoI with shape [P, 1].
P is the number of RoI.
bbox_targets(Variable): Box label of a RoI with shape
[P, 4 * class_nums].
bbox_inside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
bbox_outside_weights(Variable): Indicates whether a box should
contribute to loss. Same shape as bbox_targets.
Return:
Type: Dict
loss_cls(Variable): bbox_head loss.
loss_bbox(Variable): bbox_head loss.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
labels_int64 = fluid.layers.cast(x=labels_int32, dtype='int64')
labels_int64.stop_gradient = True
loss_cls = fluid.layers.softmax_with_cross_entropy(
logits=cls_score, label=labels_int64, numeric_stable_mode=True)
loss_cls = fluid.layers.reduce_mean(loss_cls)
loss_bbox = fluid.layers.smooth_l1(
x=bbox_pred,
y=bbox_targets,
inside_weight=bbox_inside_weights,
outside_weight=bbox_outside_weights,
sigma=self.sigma)
loss_bbox = fluid.layers.reduce_mean(loss_bbox)
return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
def get_prediction(self,
roi_feat,
rois,
im_info,
im_shape,
return_box_score=False):
"""
Get prediction bounding box in test stage.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
rois (Variable): Output of generate_proposals in rpn head.
im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
number of input images, each element consists of im_height,
im_width, im_scale.
im_shape (Variable): Actual shape of original image with shape
[B, 3]. B is the number of images, each element consists of
original_height, original_width, 1
Returns:
pred_result(Variable): Prediction result with shape [N, 6]. Each
row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
N is the total number of prediction.
"""
cls_score, bbox_pred = self._get_output(roi_feat)
im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, rois)
boxes = rois / im_scale
cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
bbox_pred = fluid.layers.reshape(bbox_pred, (-1, self.num_classes, 4))
decoded_box = fluid.layers.box_coder(
prior_box=boxes,
target_box=bbox_pred,
prior_box_var=self.prior_box_var,
code_type=self.code_type,
box_normalized=self.box_normalized,
axis=self.axis)
clipped_box = fluid.layers.box_clip(input=decoded_box, im_info=im_shape)
if return_box_score:
    return {'bbox': clipped_box, 'score': cls_prob}
pred_result = fluid.layers.multiclass_nms(
    bboxes=clipped_box,
scores=cls_prob,
score_threshold=self.score_threshold,
nms_top_k=self.nms_top_k,
keep_top_k=self.keep_top_k,
nms_threshold=self.nms_threshold,
normalized=self.normalized,
nms_eta=self.nms_eta,
background_label=self.background_label)
return {'bbox': pred_result}
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import copy
from paddle import fluid
from .fpn import FPN
from .rpn_head import (RPNHead, FPNRPNHead)
from .roi_extractor import (RoIAlign, FPNRoIAlign)
from .bbox_head import (BBoxHead, TwoFCHead)
from ..resnet import ResNetC5
__all__ = ['FasterRCNN']
class FasterRCNN(object):
"""
Faster R-CNN architecture, see https://arxiv.org/abs/1506.01497
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
fpn (object): feature pyramid network instance
"""
def __init__(
self,
backbone,
mode='train',
num_classes=81,
with_fpn=False,
fpn=None,
#rpn_head
rpn_only=False,
rpn_head=None,
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1.0, 2.0],
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
train_pre_nms_top_n=12000,
train_post_nms_top_n=2000,
train_nms_thresh=0.7,
test_pre_nms_top_n=6000,
test_post_nms_top_n=1000,
test_nms_thresh=0.7,
#roi_extractor
roi_extractor=None,
#bbox_head
bbox_head=None,
keep_top_k=100,
nms_threshold=0.5,
score_threshold=0.05,
#bbox_assigner
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=.5,
bg_thresh_hi=.5,
bg_thresh_lo=0.,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2]):
super(FasterRCNN, self).__init__()
self.backbone = backbone
self.mode = mode
if with_fpn and fpn is None:
fpn = FPN()
self.fpn = fpn
self.num_classes = num_classes
if rpn_head is None:
if self.fpn is None:
rpn_head = RPNHead(
anchor_sizes=anchor_sizes,
aspect_ratios=aspect_ratios,
rpn_batch_size_per_im=rpn_batch_size_per_im,
rpn_fg_fraction=rpn_fg_fraction,
rpn_positive_overlap=rpn_positive_overlap,
rpn_negative_overlap=rpn_negative_overlap,
train_pre_nms_top_n=train_pre_nms_top_n,
train_post_nms_top_n=train_post_nms_top_n,
train_nms_thresh=train_nms_thresh,
test_pre_nms_top_n=test_pre_nms_top_n,
test_post_nms_top_n=test_post_nms_top_n,
test_nms_thresh=test_nms_thresh)
else:
rpn_head = FPNRPNHead(
anchor_start_size=anchor_sizes[0],
aspect_ratios=aspect_ratios,
num_chan=self.fpn.num_chan,
min_level=self.fpn.min_level,
max_level=self.fpn.max_level,
rpn_batch_size_per_im=rpn_batch_size_per_im,
rpn_fg_fraction=rpn_fg_fraction,
rpn_positive_overlap=rpn_positive_overlap,
rpn_negative_overlap=rpn_negative_overlap,
train_pre_nms_top_n=train_pre_nms_top_n,
train_post_nms_top_n=train_post_nms_top_n,
train_nms_thresh=train_nms_thresh,
test_pre_nms_top_n=test_pre_nms_top_n,
test_post_nms_top_n=test_post_nms_top_n,
test_nms_thresh=test_nms_thresh)
self.rpn_head = rpn_head
if roi_extractor is None:
if self.fpn is None:
roi_extractor = RoIAlign(
resolution=14,
spatial_scale=1. / 2**self.backbone.feature_maps[0])
else:
roi_extractor = FPNRoIAlign(sampling_ratio=2)
self.roi_extractor = roi_extractor
if bbox_head is None:
if self.fpn is None:
head = ResNetC5(
layers=self.backbone.layers,
norm_type=self.backbone.norm_type,
freeze_norm=self.backbone.freeze_norm,
variant=self.backbone.variant)
else:
head = TwoFCHead()
bbox_head = BBoxHead(
head=head,
keep_top_k=keep_top_k,
nms_threshold=nms_threshold,
score_threshold=score_threshold,
num_classes=num_classes)
self.bbox_head = bbox_head
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.rpn_only = rpn_only
def build_net(self, inputs):
im = inputs['image']
im_info = inputs['im_info']
if self.mode == 'train':
gt_bbox = inputs['gt_box']
is_crowd = inputs['is_crowd']
else:
im_shape = inputs['im_shape']
body_feats = self.backbone(im)
body_feat_names = list(body_feats.keys())
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=self.mode)
if self.mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, gt_bbox, is_crowd)
outputs = fluid.layers.generate_proposal_labels(
rpn_rois=rois,
gt_classes=inputs['gt_label'],
is_crowd=inputs['is_crowd'],
gt_boxes=inputs['gt_box'],
im_info=inputs['im_info'],
batch_size_per_im=self.batch_size_per_im,
fg_fraction=self.fg_fraction,
fg_thresh=self.fg_thresh,
bg_thresh_hi=self.bg_thresh_hi,
bg_thresh_lo=self.bg_thresh_lo,
bbox_reg_weights=self.bbox_reg_weights,
class_nums=self.num_classes,
use_random=self.rpn_head.use_random)
rois = outputs[0]
labels_int32 = outputs[1]
bbox_targets = outputs[2]
bbox_inside_weights = outputs[3]
bbox_outside_weights = outputs[4]
else:
if self.rpn_only:
im_scale = fluid.layers.slice(
im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, rois)
rois = rois / im_scale
return {'proposal': rois}
if self.fpn is None:
# in models without FPN, the roi extractor only uses the last level of
# the feature maps, and body_feat_names[-1] is the name of that last
# feature map.
body_feat = body_feats[body_feat_names[-1]]
roi_feat = self.roi_extractor(body_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
if self.mode == 'train':
loss = self.bbox_head.get_loss(roi_feat, labels_int32,
bbox_targets, bbox_inside_weights,
bbox_outside_weights)
loss.update(rpn_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
im_shape)
return pred
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['gt_box'] = fluid.data(
dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
inputs['gt_label'] = fluid.data(
dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
inputs['is_crowd'] = fluid.data(
dtype='int32', shape=[None, 1], lod_level=1, name='is_crowd')
elif self.mode == 'eval':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['im_id'] = fluid.data(
dtype='int64', shape=[None, 1], name='im_id')
inputs['im_shape'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_shape')
inputs['gt_box'] = fluid.data(
dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
inputs['gt_label'] = fluid.data(
dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
inputs['is_difficult'] = fluid.data(
dtype='int32',
shape=[None, 1],
lod_level=1,
name='is_difficult')
elif self.mode == 'test':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['im_shape'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_shape')
return inputs
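# A minimal build sketch of the class above. The backbone kwargs are
# assumptions mirroring those used in YOLOv3._get_backbone earlier in this
# repo; not a definitive configuration.
#
#   import paddlex
#
#   backbone = paddlex.cv.nets.ResNet(
#       norm_type='bn', layers=50, feature_maps=[2, 3, 4, 5], freeze_at=2)
#   model = FasterRCNN(backbone, mode='train', num_classes=81, with_fpn=True)
#   inputs = model.generate_inputs()
#   loss = model.build_net(inputs)  # train mode returns a loss dict incl. 'loss'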
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import copy
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.regularizer import L2Decay
__all__ = ['FPN']
def ConvNorm(input,
num_filters,
filter_size,
stride=1,
groups=1,
norm_decay=0.,
norm_type='affine_channel',
norm_groups=32,
dilation=1,
lr_scale=1,
freeze_norm=False,
act=None,
norm_name=None,
initializer=None,
name=None):
fan = num_filters
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=((filter_size - 1) // 2) * dilation,
dilation=dilation,
groups=groups,
act=None,
param_attr=ParamAttr(
name=name + "_weights",
initializer=initializer,
learning_rate=lr_scale),
bias_attr=False,
name=name + '.conv2d.output.1')
norm_lr = 0. if freeze_norm else 1.
pattr = ParamAttr(
name=norm_name + '_scale',
learning_rate=norm_lr * lr_scale,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=norm_name + '_offset',
learning_rate=norm_lr * lr_scale,
regularizer=L2Decay(norm_decay))
if norm_type in ['bn', 'sync_bn']:
global_stats = True if freeze_norm else False
out = fluid.layers.batch_norm(
input=conv,
act=act,
name=norm_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=norm_name + '_mean',
moving_variance_name=norm_name + '_variance',
use_global_stats=global_stats)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif norm_type == 'gn':
out = fluid.layers.group_norm(
input=conv,
act=act,
name=norm_name + '.output.1',
groups=norm_groups,
param_attr=pattr,
bias_attr=battr)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif norm_type == 'affine_channel':
scale = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
class FPN(object):
"""
Feature Pyramid Network, see https://arxiv.org/abs/1612.03144
Args:
num_chan (int): number of feature channels
min_level (int): lowest level of the backbone feature map to use
max_level (int): highest level of the backbone feature map to use
spatial_scale (list): feature map scaling factor
has_extra_convs (bool): whether to add extra convolutions at the higher levels
norm_type (str|None): normalization type, 'bn'/'sync_bn'/'affine_channel'
"""
def __init__(self,
num_chan=256,
min_level=2,
max_level=6,
spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
has_extra_convs=False,
norm_type=None,
freeze_norm=False):
self.freeze_norm = freeze_norm
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.spatial_scale = spatial_scale
self.has_extra_convs = has_extra_convs
self.norm_type = norm_type
def _add_topdown_lateral(self, body_name, body_input, upper_output):
lateral_name = 'fpn_inner_' + body_name + '_lateral'
topdown_name = 'fpn_topdown_' + body_name
fan = body_input.shape[1]
if self.norm_type:
initializer = Xavier(fan_out=fan)
lateral = ConvNorm(
body_input,
self.num_chan,
1,
initializer=initializer,
norm_type=self.norm_type,
freeze_norm=self.freeze_norm,
name=lateral_name,
norm_name=lateral_name)
else:
lateral = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=lateral_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=lateral_name)
topdown = fluid.layers.resize_nearest(
upper_output, scale=2., name=topdown_name)
return lateral + topdown
def get_output(self, body_dict):
"""
Add FPN onto backbone.
Args:
body_dict(OrderedDict): Dictionary of variables and each element is the
output of backbone.
Return:
fpn_dict(OrderedDict): A dictionary represents the output of FPN with
their name.
spatial_scale(list): A list of multiplicative spatial scale factor.
"""
spatial_scale = copy.deepcopy(self.spatial_scale)
body_name_list = list(body_dict.keys())[::-1]
num_backbone_stages = len(body_name_list)
self.fpn_inner_output = [[] for _ in range(num_backbone_stages)]
fpn_inner_name = 'fpn_inner_' + body_name_list[0]
body_input = body_dict[body_name_list[0]]
fan = body_input.shape[1]
if self.norm_type:
initializer = Xavier(fan_out=fan)
self.fpn_inner_output[0] = ConvNorm(
body_input,
self.num_chan,
1,
initializer=initializer,
norm_type=self.norm_type,
freeze_norm=self.freeze_norm,
name=fpn_inner_name,
norm_name=fpn_inner_name)
else:
self.fpn_inner_output[0] = fluid.layers.conv2d(
body_input,
self.num_chan,
1,
param_attr=ParamAttr(
name=fpn_inner_name + "_w",
initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_inner_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_inner_name)
for i in range(1, num_backbone_stages):
body_name = body_name_list[i]
body_input = body_dict[body_name]
top_output = self.fpn_inner_output[i - 1]
fpn_inner_single = self._add_topdown_lateral(
body_name, body_input, top_output)
self.fpn_inner_output[i] = fpn_inner_single
fpn_dict = {}
fpn_name_list = []
for i in range(num_backbone_stages):
fpn_name = 'fpn_' + body_name_list[i]
fan = self.fpn_inner_output[i].shape[1] * 3 * 3
if self.norm_type:
initializer = Xavier(fan_out=fan)
fpn_output = ConvNorm(
self.fpn_inner_output[i],
self.num_chan,
3,
initializer=initializer,
norm_type=self.norm_type,
freeze_norm=self.freeze_norm,
name=fpn_name,
norm_name=fpn_name)
else:
fpn_output = fluid.layers.conv2d(
self.fpn_inner_output[i],
self.num_chan,
filter_size=3,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_output
fpn_name_list.append(fpn_name)
if not self.has_extra_convs and self.max_level - self.min_level == len(
spatial_scale):
body_top_name = fpn_name_list[0]
body_top_extension = fluid.layers.pool2d(
fpn_dict[body_top_name],
1,
'max',
pool_stride=2,
name=body_top_name + '_subsampled_2x')
fpn_dict[body_top_name + '_subsampled_2x'] = body_top_extension
fpn_name_list.insert(0, body_top_name + '_subsampled_2x')
spatial_scale.insert(0, spatial_scale[0] * 0.5)
# Coarser FPN levels introduced for RetinaNet
highest_backbone_level = self.min_level + len(spatial_scale) - 1
if self.has_extra_convs and self.max_level > highest_backbone_level:
fpn_blob = body_dict[body_name_list[0]]
for i in range(highest_backbone_level + 1, self.max_level + 1):
fpn_blob_in = fpn_blob
fpn_name = 'fpn_' + str(i)
if i > highest_backbone_level + 1:
fpn_blob_in = fluid.layers.relu(fpn_blob)
fan = fpn_blob_in.shape[1] * 3 * 3
fpn_blob = fluid.layers.conv2d(
input=fpn_blob_in,
num_filters=self.num_chan,
filter_size=3,
stride=2,
padding=1,
param_attr=ParamAttr(
name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
bias_attr=ParamAttr(
name=fpn_name + "_b",
learning_rate=2.,
regularizer=L2Decay(0.)),
name=fpn_name)
fpn_dict[fpn_name] = fpn_blob
fpn_name_list.insert(0, fpn_name)
spatial_scale.insert(0, spatial_scale[0] * 0.5)
res_dict = OrderedDict([(k, fpn_dict[k]) for k in fpn_name_list])
return res_dict, spatial_scale
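# A minimal sketch of the module above, feeding fake C2-C5 feature maps through
# the FPN. The variable names and channel counts are assumptions matching a
# ResNet-50 backbone.
#
#   from collections import OrderedDict
#   import paddle.fluid as fluid
#
#   with fluid.program_guard(fluid.Program(), fluid.Program()):
#       body = OrderedDict()
#       for name, ch, hw in [('res2_sum', 256, 200), ('res3_sum', 512, 100),
#                            ('res4_sum', 1024, 50), ('res5_sum', 2048, 25)]:
#           body[name] = fluid.data(name=name, shape=[None, ch, hw, hw],
#                                   dtype='float32')
#       fpn_feats, scales = FPN(num_chan=256).get_output(body)
#       # five pyramid levels: an extra stride-64 level is max-pooled from the top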
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import MSRA
from paddle.fluid.regularizer import L2Decay
from .fpn import ConvNorm
__all__ = ['MaskHead']
class MaskHead(object):
"""
RCNN mask head
Args:
num_convs (int): num of convolutions, 4 for FPN, 1 otherwise
conv_dim (int): num of channels after first convolution
resolution (int): size of the output mask
dilation (int): dilation rate
num_classes (int): number of output classes
"""
def __init__(self,
num_convs=0,
conv_dim=256,
resolution=14,
dilation=1,
num_classes=81,
norm_type=None):
super(MaskHead, self).__init__()
self.num_convs = num_convs
self.conv_dim = conv_dim
self.resolution = resolution
self.dilation = dilation
self.num_classes = num_classes
self.norm_type = norm_type
def _mask_conv_head(self, roi_feat, num_convs, norm_type):
if norm_type == 'gn':
for i in range(num_convs):
layer_name = "mask_inter_feat_" + str(i + 1)
fan = self.conv_dim * 3 * 3
initializer = MSRA(uniform=False, fan_in=fan)
roi_feat = ConvNorm(
roi_feat,
self.conv_dim,
3,
act='relu',
dilation=self.dilation,
initializer=initializer,
norm_type=self.norm_type,
name=layer_name,
norm_name=layer_name)
else:
for i in range(num_convs):
layer_name = "mask_inter_feat_" + str(i + 1)
fan = self.conv_dim * 3 * 3
initializer = MSRA(uniform=False, fan_in=fan)
roi_feat = fluid.layers.conv2d(
input=roi_feat,
num_filters=self.conv_dim,
filter_size=3,
padding=1 * self.dilation,
act='relu',
stride=1,
dilation=self.dilation,
name=layer_name,
param_attr=ParamAttr(
name=layer_name + '_w', initializer=initializer),
bias_attr=ParamAttr(
name=layer_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
fan = roi_feat.shape[1] * 2 * 2
feat = fluid.layers.conv2d_transpose(
input=roi_feat,
num_filters=self.conv_dim,
filter_size=2,
stride=2,
act='relu',
param_attr=ParamAttr(
name='conv5_mask_w',
initializer=MSRA(uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name='conv5_mask_b', learning_rate=2.,
regularizer=L2Decay(0.)))
return feat
def _get_output(self, roi_feat):
class_num = self.num_classes
# configure the conv number for FPN if necessary
head_feat = self._mask_conv_head(roi_feat, self.num_convs,
self.norm_type)
fan = class_num
mask_logits = fluid.layers.conv2d(
input=head_feat,
num_filters=class_num,
filter_size=1,
act=None,
param_attr=ParamAttr(
name='mask_fcn_logits_w',
initializer=MSRA(uniform=False, fan_in=fan)),
bias_attr=ParamAttr(
name="mask_fcn_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return mask_logits
def get_loss(self, roi_feat, mask_int32):
mask_logits = self._get_output(roi_feat)
num_classes = self.num_classes
resolution = self.resolution
dim = num_classes * resolution * resolution
mask_logits = fluid.layers.reshape(mask_logits, (-1, dim))
mask_label = fluid.layers.cast(x=mask_int32, dtype='float32')
mask_label.stop_gradient = True
loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits(
x=mask_logits, label=mask_label, ignore_index=-1, normalize=True)
loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask')
return {'loss_mask': loss_mask}
def get_prediction(self, roi_feat, bbox_pred):
"""
Get prediction mask in test stage.
Args:
roi_feat (Variable): RoI feature from RoIExtractor.
bbox_pred (Variable): predicted bbox.
Returns:
mask_pred (Variable): Prediction mask with shape
[N, num_classes, resolution, resolution].
"""
mask_logits = self._get_output(roi_feat)
mask_prob = fluid.layers.sigmoid(mask_logits)
mask_prob = fluid.layers.lod_reset(mask_prob, bbox_pred)
return mask_prob
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import copy
import paddle.fluid as fluid
from .fpn import FPN
from .rpn_head import (RPNHead, FPNRPNHead)
from .roi_extractor import (RoIAlign, FPNRoIAlign)
from .bbox_head import (BBoxHead, TwoFCHead)
from .mask_head import MaskHead
from ..resnet import ResNetC5
__all__ = ['MaskRCNN']
class MaskRCNN(object):
"""
Mask R-CNN architecture, see https://arxiv.org/abs/1703.06870
Args:
backbone (object): backbone instance
rpn_head (object): `RPNhead` instance
roi_extractor (object): ROI extractor instance
bbox_head (object): `BBoxHead` instance
mask_head (object): `MaskHead` instance
fpn (object): feature pyramid network instance
"""
def __init__(
self,
backbone,
num_classes=81,
mode='train',
with_fpn=False,
fpn=None,
num_chan=256,
min_level=2,
max_level=6,
spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
#rpn_head
rpn_only=False,
rpn_head=None,
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1.0, 2.0],
rpn_batch_size_per_im=256,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
train_pre_nms_top_n=12000,
train_post_nms_top_n=2000,
train_nms_thresh=0.7,
test_pre_nms_top_n=6000,
test_post_nms_top_n=1000,
test_nms_thresh=0.7,
#roi_extractor
roi_extractor=None,
#bbox_head
bbox_head=None,
keep_top_k=100,
nms_threshold=0.5,
score_threshold=0.05,
#MaskHead
mask_head=None,
num_convs=0,
mask_head_resolution=14,
#bbox_assigner
batch_size_per_im=512,
fg_fraction=.25,
fg_thresh=.5,
bg_thresh_hi=.5,
bg_thresh_lo=0.,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2]):
super(MaskRCNN, self).__init__()
self.backbone = backbone
self.mode = mode
if with_fpn and fpn is None:
fpn = FPN(
num_chan=num_chan,
min_level=min_level,
max_level=max_level,
spatial_scale=spatial_scale)
self.fpn = fpn
self.num_classes = num_classes
if rpn_head is None:
if self.fpn is None:
rpn_head = RPNHead(
anchor_sizes=anchor_sizes,
aspect_ratios=aspect_ratios,
rpn_batch_size_per_im=rpn_batch_size_per_im,
rpn_fg_fraction=rpn_fg_fraction,
rpn_positive_overlap=rpn_positive_overlap,
rpn_negative_overlap=rpn_negative_overlap,
train_pre_nms_top_n=train_pre_nms_top_n,
train_post_nms_top_n=train_post_nms_top_n,
train_nms_thresh=train_nms_thresh,
test_pre_nms_top_n=test_pre_nms_top_n,
test_post_nms_top_n=test_post_nms_top_n,
test_nms_thresh=test_nms_thresh)
else:
rpn_head = FPNRPNHead(
anchor_start_size=anchor_sizes[0],
aspect_ratios=aspect_ratios,
num_chan=self.fpn.num_chan,
min_level=self.fpn.min_level,
max_level=self.fpn.max_level,
rpn_batch_size_per_im=rpn_batch_size_per_im,
rpn_fg_fraction=rpn_fg_fraction,
rpn_positive_overlap=rpn_positive_overlap,
rpn_negative_overlap=rpn_negative_overlap,
train_pre_nms_top_n=train_pre_nms_top_n,
train_post_nms_top_n=train_post_nms_top_n,
train_nms_thresh=train_nms_thresh,
test_pre_nms_top_n=test_pre_nms_top_n,
test_post_nms_top_n=test_post_nms_top_n,
test_nms_thresh=test_nms_thresh)
self.rpn_head = rpn_head
if roi_extractor is None:
if self.fpn is None:
roi_extractor = RoIAlign(
resolution=14,
spatial_scale=1. / 2**self.backbone.feature_maps[0])
else:
roi_extractor = FPNRoIAlign(sampling_ratio=2)
self.roi_extractor = roi_extractor
if bbox_head is None:
if self.fpn is None:
head = ResNetC5(
layers=self.backbone.layers,
norm_type=self.backbone.norm_type,
freeze_norm=self.backbone.freeze_norm)
else:
head = TwoFCHead()
bbox_head = BBoxHead(
head=head,
keep_top_k=keep_top_k,
nms_threshold=nms_threshold,
score_threshold=score_threshold,
num_classes=num_classes)
self.bbox_head = bbox_head
if mask_head is None:
mask_head = MaskHead(
num_convs=num_convs,
resolution=mask_head_resolution,
num_classes=num_classes)
self.mask_head = mask_head
self.batch_size_per_im = batch_size_per_im
self.fg_fraction = fg_fraction
self.fg_thresh = fg_thresh
self.bg_thresh_hi = bg_thresh_hi
self.bg_thresh_lo = bg_thresh_lo
self.bbox_reg_weights = bbox_reg_weights
self.rpn_only = rpn_only
def build_net(self, inputs):
im = inputs['image']
im_info = inputs['im_info']
# backbone
body_feats = self.backbone(im)
# FPN
spatial_scale = None
if self.fpn is not None:
body_feats, spatial_scale = self.fpn.get_output(body_feats)
# RPN proposals
rois = self.rpn_head.get_proposals(body_feats, im_info, mode=self.mode)
if self.mode == 'train':
rpn_loss = self.rpn_head.get_loss(im_info, inputs['gt_box'],
inputs['is_crowd'])
outputs = fluid.layers.generate_proposal_labels(
rpn_rois=rois,
gt_classes=inputs['gt_label'],
is_crowd=inputs['is_crowd'],
gt_boxes=inputs['gt_box'],
im_info=inputs['im_info'],
batch_size_per_im=self.batch_size_per_im,
fg_fraction=self.fg_fraction,
fg_thresh=self.fg_thresh,
bg_thresh_hi=self.bg_thresh_hi,
bg_thresh_lo=self.bg_thresh_lo,
bbox_reg_weights=self.bbox_reg_weights,
class_nums=self.num_classes,
use_random=self.rpn_head.use_random)
rois = outputs[0]
labels_int32 = outputs[1]
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
loss = self.bbox_head.get_loss(roi_feat, labels_int32,
*outputs[2:])
loss.update(rpn_loss)
mask_rois, roi_has_mask_int32, mask_int32 = fluid.layers.generate_mask_labels(
rois=rois,
gt_classes=inputs['gt_label'],
is_crowd=inputs['is_crowd'],
gt_segms=inputs['gt_mask'],
im_info=inputs['im_info'],
labels_int32=labels_int32,
num_classes=self.num_classes,
resolution=self.mask_head.resolution)
if self.fpn is None:
bbox_head_feat = self.bbox_head.get_head_feat()
feat = fluid.layers.gather(bbox_head_feat, roi_has_mask_int32)
else:
feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_loss = self.mask_head.get_loss(feat, mask_int32)
loss.update(mask_loss)
total_loss = fluid.layers.sum(list(loss.values()))
loss.update({'loss': total_loss})
return loss
else:
if self.rpn_only:
im_scale = fluid.layers.slice(
im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, rois)
rois = rois / im_scale
return {'proposal': rois}
mask_name = 'mask_pred'
mask_pred, bbox_pred = self._eval(body_feats, mask_name, rois,
im_info, inputs['im_shape'],
spatial_scale)
return OrderedDict(zip(['bbox', 'mask'], [bbox_pred, mask_pred]))
def _eval(self,
body_feats,
mask_name,
rois,
im_info,
im_shape,
spatial_scale,
bbox_pred=None):
if not bbox_pred:
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
roi_feat = self.roi_extractor(last_feat, rois)
else:
roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
bbox_pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
im_shape)
bbox_pred = bbox_pred['bbox']
# bbox_pred holds a single dummy row when nothing is detected; in that
# case (fewer than 6 elements) skip mask prediction and pass the dummy
# output through as mask_pred.
bbox_shape = fluid.layers.shape(bbox_pred)
bbox_size = fluid.layers.reduce_prod(bbox_shape)
bbox_size = fluid.layers.reshape(bbox_size, [1, 1])
size = fluid.layers.fill_constant([1, 1], value=6, dtype='int32')
cond = fluid.layers.less_than(x=bbox_size, y=size)
mask_pred = fluid.layers.create_global_var(
shape=[1],
value=0.0,
dtype='float32',
persistable=False,
name=mask_name)
with fluid.layers.control_flow.Switch() as switch:
with switch.case(cond):
fluid.layers.assign(input=bbox_pred, output=mask_pred)
with switch.default():
bbox = fluid.layers.slice(bbox_pred, [1], starts=[2], ends=[6])
im_scale = fluid.layers.slice(
im_info, [1], starts=[2], ends=[3])
im_scale = fluid.layers.sequence_expand(im_scale, bbox)
mask_rois = bbox * im_scale
if self.fpn is None:
last_feat = body_feats[list(body_feats.keys())[-1]]
mask_feat = self.roi_extractor(last_feat, mask_rois)
mask_feat = self.bbox_head.get_head_feat(mask_feat)
else:
mask_feat = self.roi_extractor(
body_feats, mask_rois, spatial_scale, is_mask=True)
mask_out = self.mask_head.get_prediction(mask_feat, bbox)
fluid.layers.assign(input=mask_out, output=mask_pred)
return mask_pred, bbox_pred
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['gt_box'] = fluid.data(
dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
inputs['gt_label'] = fluid.data(
dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
inputs['is_crowd'] = fluid.data(
dtype='int32', shape=[None, 1], lod_level=1, name='is_crowd')
inputs['gt_mask'] = fluid.data(
dtype='float32', shape=[None, 2], lod_level=3, name='gt_mask')
elif self.mode == 'eval':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['im_id'] = fluid.data(
dtype='int64', shape=[None, 1], name='im_id')
inputs['im_shape'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_shape')
elif self.mode == 'test':
inputs['im_info'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_info')
inputs['im_shape'] = fluid.data(
dtype='float32', shape=[None, 3], name='im_shape')
return inputs
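# --------------------------------------------------------------------------
# Hedged usage sketch (added for illustration; not part of the original
# source). The enclosing detector class and its constructor are defined
# outside this excerpt, so the class name `MaskRCNN` and its arguments
# below are assumptions:
#
#     model = MaskRCNN(backbone=..., num_classes=81, mode='train')
#     with fluid.program_guard(train_prog, startup_prog):
#         inputs = model.generate_inputs()   # OrderedDict of fluid.data
#         losses = model.build_net(inputs)   # dict including summed 'loss'
# --------------------------------------------------------------------------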
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
__all__ = ['RoIAlign', 'FPNRoIAlign']
class RoIAlign(object):
def __init__(self, resolution=7, spatial_scale=1. / 16, sampling_ratio=0):
super(RoIAlign, self).__init__()
self.resolution = resolution
self.spatial_scale = spatial_scale
self.sampling_ratio = sampling_ratio
def __call__(self, head_inputs, rois):
roi_feat = fluid.layers.roi_align(
head_inputs,
rois,
pooled_height=self.resolution,
pooled_width=self.resolution,
spatial_scale=self.spatial_scale,
sampling_ratio=self.sampling_ratio)
return roi_feat
class FPNRoIAlign(object):
"""
RoI align pooling for FPN feature maps
Args:
sampling_ratio (int): number of sampling points
min_level (int): lowest level of FPN layer
max_level (int): highest level of FPN layer
        canconical_level (int): the canonical FPN feature map level
        canonical_size (int): the canonical FPN feature map size
box_resolution (int): box resolution
mask_resolution (int): mask roi resolution
"""
def __init__(self,
sampling_ratio=0,
min_level=2,
max_level=5,
canconical_level=4,
canonical_size=224,
box_resolution=7,
mask_resolution=14):
super(FPNRoIAlign, self).__init__()
self.sampling_ratio = sampling_ratio
self.min_level = min_level
self.max_level = max_level
self.canconical_level = canconical_level
self.canonical_size = canonical_size
self.box_resolution = box_resolution
self.mask_resolution = mask_resolution
def __call__(self, head_inputs, rois, spatial_scale, is_mask=False):
"""
        Adopt RoI align onto several levels of feature maps to get RoI features.
Distribute RoIs to different levels by area and get a list of RoI
features by distributed RoIs and their corresponding feature maps.
Returns:
roi_feat(Variable): RoI features with shape of [M, C, R, R],
where M is the number of RoIs and R is RoI resolution
"""
k_min = self.min_level
k_max = self.max_level
num_roi_lvls = k_max - k_min + 1
name_list = list(head_inputs.keys())
input_name_list = name_list[-num_roi_lvls:]
spatial_scale = spatial_scale[-num_roi_lvls:]
rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(
rois, k_min, k_max, self.canconical_level, self.canonical_size)
        # rois_dist is in ascending order
roi_out_list = []
        resolution = self.mask_resolution if is_mask else self.box_resolution
for lvl in range(num_roi_lvls):
name_index = num_roi_lvls - lvl - 1
rois_input = rois_dist[lvl]
head_input = head_inputs[input_name_list[name_index]]
sc = spatial_scale[name_index]
roi_out = fluid.layers.roi_align(
input=head_input,
rois=rois_input,
pooled_height=resolution,
pooled_width=resolution,
spatial_scale=sc,
sampling_ratio=self.sampling_ratio)
roi_out_list.append(roi_out)
roi_feat_shuffle = fluid.layers.concat(roi_out_list)
roi_feat_ = fluid.layers.gather(roi_feat_shuffle, restore_index)
roi_feat = fluid.layers.lod_reset(roi_feat_, rois)
return roi_feat
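if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Pools a 7x7 RoI feature from a 1/16-resolution feature map;
    # the tensor names and the 256-channel width are illustrative only.
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        feat = fluid.data(
            name='feat', shape=[None, 256, None, None], dtype='float32')
        rois = fluid.data(
            name='rois', shape=[None, 4], dtype='float32', lod_level=1)
        extractor = RoIAlign(resolution=7, spatial_scale=1. / 16)
        roi_feat = extractor(feat, rois)  # [num_rois, 256, 7, 7]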
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal
from paddle.fluid.regularizer import L2Decay
__all__ = ['RPNHead', 'FPNRPNHead']
class RPNHead(object):
def __init__(
self,
#anchor_generator
stride=[16.0, 16.0],
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1., 2.],
variance=[1., 1., 1., 1.],
#rpn_target_assign
rpn_batch_size_per_im=256,
rpn_straddle_thresh=0.,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
use_random=True,
#train_proposal
train_pre_nms_top_n=12000,
train_post_nms_top_n=2000,
train_nms_thresh=.7,
train_min_size=.0,
train_eta=1.,
#test_proposal
test_pre_nms_top_n=6000,
test_post_nms_top_n=1000,
test_nms_thresh=.7,
test_min_size=.0,
test_eta=1.,
#num_classes
num_classes=1):
super(RPNHead, self).__init__()
self.stride = stride
self.anchor_sizes = anchor_sizes
self.aspect_ratios = aspect_ratios
self.variance = variance
self.rpn_batch_size_per_im = rpn_batch_size_per_im
self.rpn_straddle_thresh = rpn_straddle_thresh
self.rpn_fg_fraction = rpn_fg_fraction
self.rpn_positive_overlap = rpn_positive_overlap
self.rpn_negative_overlap = rpn_negative_overlap
self.use_random = use_random
self.train_pre_nms_top_n = train_pre_nms_top_n
self.train_post_nms_top_n = train_post_nms_top_n
self.train_nms_thresh = train_nms_thresh
self.train_min_size = train_min_size
self.train_eta = train_eta
self.test_pre_nms_top_n = test_pre_nms_top_n
self.test_post_nms_top_n = test_post_nms_top_n
self.test_nms_thresh = test_nms_thresh
self.test_min_size = test_min_size
self.test_eta = test_eta
self.num_classes = num_classes
def _get_output(self, input):
"""
Get anchor and RPN head output.
Args:
input(Variable): feature map from backbone with shape of [N, C, H, W]
Returns:
rpn_cls_score(Variable): Output of rpn head with shape of
[N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of rpn head with shape of
[N, num_anchors * 4, H, W].
"""
dim_out = input.shape[1]
rpn_conv = fluid.layers.conv2d(
input=input,
num_filters=dim_out,
filter_size=3,
stride=1,
padding=1,
act='relu',
name='conv_rpn',
param_attr=ParamAttr(
name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
# Generate anchors
self.anchor, self.anchor_var = fluid.layers.anchor_generator(
input=rpn_conv,
stride=self.stride,
anchor_sizes=self.anchor_sizes,
aspect_ratios=self.aspect_ratios,
variance=self.variance)
num_anchor = self.anchor.shape[2]
# Proposal classification scores
self.rpn_cls_score = fluid.layers.conv2d(
rpn_conv,
num_filters=num_anchor * self.num_classes,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_cls_score',
param_attr=ParamAttr(
name="rpn_cls_logits_w",
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv,
num_filters=4 * num_anchor,
filter_size=1,
stride=1,
padding=0,
act=None,
name='rpn_bbox_pred',
param_attr=ParamAttr(
name="rpn_bbox_pred_w", initializer=Normal(loc=0.,
scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def get_proposals(self, body_feats, im_info, mode='train'):
"""
Get proposals according to the output of backbone.
Args:
body_feats (dict): The dictionary of feature maps from backbone.
            im_info(Variable): The information of image with shape [N, 3] in
                format (height, width, scale).
            mode(str): Either 'train' or 'test', deciding which set of
                proposal hyper-parameters to use.
Returns:
rpn_rois(Variable): Output proposals with shape of (rois_num, 4).
"""
# In RPN Heads, only the last feature map of backbone is used.
# And body_feat_names[-1] represents the last level name of backbone.
body_feat = list(body_feats.values())[-1]
rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
if self.num_classes == 1:
rpn_cls_prob = fluid.layers.sigmoid(
rpn_cls_score, name='rpn_cls_prob')
else:
rpn_cls_score = fluid.layers.transpose(
rpn_cls_score, perm=[0, 2, 3, 1])
rpn_cls_score = fluid.layers.reshape(
rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes))
rpn_cls_prob_tmp = fluid.layers.softmax(
rpn_cls_score, use_cudnn=False, name='rpn_cls_prob')
rpn_cls_prob_slice = fluid.layers.slice(
rpn_cls_prob_tmp,
axes=[4],
starts=[1],
ends=[self.num_classes])
rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1)
rpn_cls_prob = fluid.layers.reshape(
rpn_cls_prob, shape=(0, 0, 0, -1))
rpn_cls_prob = fluid.layers.transpose(
rpn_cls_prob, perm=[0, 3, 1, 2])
if mode == 'train':
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
scores=rpn_cls_prob,
bbox_deltas=rpn_bbox_pred,
im_info=im_info,
anchors=self.anchor,
variances=self.anchor_var,
pre_nms_top_n=self.train_pre_nms_top_n,
post_nms_top_n=self.train_post_nms_top_n,
nms_thresh=self.train_nms_thresh,
min_size=self.train_min_size,
eta=self.train_eta)
else:
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
scores=rpn_cls_prob,
bbox_deltas=rpn_bbox_pred,
im_info=im_info,
anchors=self.anchor,
variances=self.anchor_var,
pre_nms_top_n=self.test_pre_nms_top_n,
post_nms_top_n=self.test_post_nms_top_n,
nms_thresh=self.test_nms_thresh,
min_size=self.test_min_size,
eta=self.test_eta)
return rpn_rois
def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
anchor_var):
rpn_cls_score = fluid.layers.transpose(
rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred = fluid.layers.transpose(
rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
rpn_cls_score = fluid.layers.reshape(
x=rpn_cls_score, shape=(0, -1, self.num_classes))
rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
def _get_loss_input(self):
for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
            if getattr(self, attr, None) is None:
                raise ValueError(
                    "self.{} should not be None, "
                    "call RPNHead.get_proposals first".format(attr))
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
self.anchor, self.anchor_var)
def get_loss(self, im_info, gt_box, is_crowd, gt_label=None):
"""
        Sample anchors and calculate RPN loss.
        Args:
            im_info(Variable): The information of image with shape [N, 3] in
                format (height, width, scale).
            gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
                M is the number of groundtruth.
            is_crowd(Variable): Indicates whether each ground-truth is crowd,
                with shape [M, 1]. M is the number of groundtruth.
            gt_label(Variable): Ground-truth class labels with shape [M, 1];
                only required when num_classes > 1.
Returns:
Type: dict
rpn_cls_loss(Variable): RPN classification loss.
rpn_bbox_loss(Variable): RPN bounding box regression loss.
"""
rpn_cls, rpn_bbox, anchor, anchor_var = self._get_loss_input()
if self.num_classes == 1:
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
fluid.layers.rpn_target_assign(
bbox_pred=rpn_bbox,
cls_logits=rpn_cls,
anchor_box=anchor,
anchor_var=anchor_var,
gt_boxes=gt_box,
is_crowd=is_crowd,
im_info=im_info,
rpn_batch_size_per_im=self.rpn_batch_size_per_im,
rpn_straddle_thresh=self.rpn_straddle_thresh,
rpn_fg_fraction=self.rpn_fg_fraction,
rpn_positive_overlap=self.rpn_positive_overlap,
rpn_negative_overlap=self.rpn_negative_overlap,
use_random=self.use_random)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
score_tgt.stop_gradient = True
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
else:
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
fluid.layers.rpn_target_assign(
bbox_pred=rpn_bbox,
cls_logits=rpn_cls,
anchor_box=anchor,
anchor_var=anchor_var,
gt_boxes=gt_box,
gt_labels=gt_label,
is_crowd=is_crowd,
num_classes=self.num_classes,
im_info=im_info,
rpn_batch_size_per_im=self.rpn_batch_size_per_im,
rpn_straddle_thresh=self.rpn_straddle_thresh,
rpn_fg_fraction=self.rpn_fg_fraction,
rpn_positive_overlap=self.rpn_positive_overlap,
rpn_negative_overlap=self.rpn_negative_overlap,
use_random=self.use_random)
labels_int64 = fluid.layers.cast(x=score_tgt, dtype='int64')
labels_int64.stop_gradient = True
rpn_cls_loss = fluid.layers.softmax_with_cross_entropy(
logits=score_pred,
label=labels_int64,
numeric_stable_mode=True)
rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
loc_tgt.stop_gradient = True
rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape)
norm.stop_gradient = True
rpn_reg_loss = rpn_reg_loss / norm
return {'loss_rpn_cls': rpn_cls_loss, 'loss_rpn_bbox': rpn_reg_loss}
class FPNRPNHead(RPNHead):
def __init__(
self,
anchor_start_size=32,
aspect_ratios=[0.5, 1., 2.],
variance=[1., 1., 1., 1.],
num_chan=256,
min_level=2,
max_level=6,
#rpn_target_assign
rpn_batch_size_per_im=256,
rpn_straddle_thresh=0.,
rpn_fg_fraction=0.5,
rpn_positive_overlap=0.7,
rpn_negative_overlap=0.3,
use_random=True,
#train_proposal
train_pre_nms_top_n=2000,
train_post_nms_top_n=2000,
train_nms_thresh=.7,
train_min_size=.0,
train_eta=1.,
#test_proposal
test_pre_nms_top_n=1000,
test_post_nms_top_n=1000,
test_nms_thresh=.7,
test_min_size=.0,
test_eta=1.,
#num_classes
num_classes=1):
super(FPNRPNHead, self).__init__(
aspect_ratios=aspect_ratios,
variance=variance,
rpn_batch_size_per_im=rpn_batch_size_per_im,
rpn_straddle_thresh=rpn_straddle_thresh,
rpn_fg_fraction=rpn_fg_fraction,
rpn_positive_overlap=rpn_positive_overlap,
rpn_negative_overlap=rpn_negative_overlap,
use_random=use_random,
train_pre_nms_top_n=train_pre_nms_top_n,
train_post_nms_top_n=train_post_nms_top_n,
train_nms_thresh=train_nms_thresh,
train_min_size=train_min_size,
train_eta=train_eta,
test_pre_nms_top_n=test_pre_nms_top_n,
test_post_nms_top_n=test_post_nms_top_n,
test_nms_thresh=test_nms_thresh,
test_min_size=test_min_size,
test_eta=test_eta,
num_classes=num_classes)
self.anchor_start_size = anchor_start_size
self.num_chan = num_chan
self.min_level = min_level
self.max_level = max_level
self.num_classes = num_classes
self.fpn_rpn_list = []
self.anchors_list = []
self.anchor_var_list = []
def _get_output(self, input, feat_lvl):
"""
Get anchor and FPN RPN head output at one level.
Args:
input(Variable): Body feature from backbone.
feat_lvl(int): Indicate the level of rpn output corresponding
to the level of feature map.
        Returns:
rpn_cls_score(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors, H, W].
rpn_bbox_pred(Variable): Output of one level of fpn rpn head with
shape of [N, num_anchors * 4, H, W].
"""
slvl = str(feat_lvl)
conv_name = 'conv_rpn_fpn' + slvl
cls_name = 'rpn_cls_logits_fpn' + slvl
bbox_name = 'rpn_bbox_pred_fpn' + slvl
conv_share_name = 'conv_rpn_fpn' + str(self.min_level)
cls_share_name = 'rpn_cls_logits_fpn' + str(self.min_level)
bbox_share_name = 'rpn_bbox_pred_fpn' + str(self.min_level)
num_anchors = len(self.aspect_ratios)
conv_rpn_fpn = fluid.layers.conv2d(
input=input,
num_filters=self.num_chan,
filter_size=3,
padding=1,
act='relu',
name=conv_name,
param_attr=ParamAttr(
name=conv_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.anchors, self.anchor_var = fluid.layers.anchor_generator(
input=conv_rpn_fpn,
anchor_sizes=(self.anchor_start_size * 2.**
(feat_lvl - self.min_level), ),
stride=(2.**feat_lvl, 2.**feat_lvl),
aspect_ratios=self.aspect_ratios,
variance=self.variance)
cls_num_filters = num_anchors * self.num_classes
self.rpn_cls_score = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=cls_num_filters,
filter_size=1,
act=None,
name=cls_name,
param_attr=ParamAttr(
name=cls_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.rpn_bbox_pred = fluid.layers.conv2d(
input=conv_rpn_fpn,
num_filters=num_anchors * 4,
filter_size=1,
act=None,
name=bbox_name,
param_attr=ParamAttr(
name=bbox_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred
def _get_single_proposals(self, body_feat, im_info, feat_lvl,
mode='train'):
"""
Get proposals in one level according to the output of fpn rpn head
Args:
            body_feat(Variable): the feature map from backbone.
            im_info(Variable): The information of image with shape [N, 3] in
                format (height, width, scale).
feat_lvl(int): Indicate the level of proposals corresponding to
the feature maps.
Returns:
rpn_rois_fpn(Variable): Output proposals with shape of (rois_num, 4).
rpn_roi_probs_fpn(Variable): Scores of proposals with
shape of (rois_num, 1).
"""
rpn_cls_score_fpn, rpn_bbox_pred_fpn = self._get_output(
body_feat, feat_lvl)
if self.num_classes == 1:
rpn_cls_prob_fpn = fluid.layers.sigmoid(
rpn_cls_score_fpn, name='rpn_cls_prob_fpn' + str(feat_lvl))
else:
rpn_cls_score_fpn = fluid.layers.transpose(
rpn_cls_score_fpn, perm=[0, 2, 3, 1])
rpn_cls_score_fpn = fluid.layers.reshape(
rpn_cls_score_fpn, shape=(0, 0, 0, -1, self.num_classes))
rpn_cls_prob_fpn = fluid.layers.softmax(
rpn_cls_score_fpn,
use_cudnn=False,
name='rpn_cls_prob_fpn' + str(feat_lvl))
rpn_cls_prob_fpn = fluid.layers.slice(
rpn_cls_prob_fpn,
axes=[4],
starts=[1],
ends=[self.num_classes])
rpn_cls_prob_fpn, _ = fluid.layers.topk(rpn_cls_prob_fpn, 1)
rpn_cls_prob_fpn = fluid.layers.reshape(
rpn_cls_prob_fpn, shape=(0, 0, 0, -1))
rpn_cls_prob_fpn = fluid.layers.transpose(
rpn_cls_prob_fpn, perm=[0, 3, 1, 2])
if mode == 'train':
rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals(
scores=rpn_cls_prob_fpn,
bbox_deltas=rpn_bbox_pred_fpn,
im_info=im_info,
anchors=self.anchors,
variances=self.anchor_var,
pre_nms_top_n=self.train_pre_nms_top_n,
post_nms_top_n=self.train_post_nms_top_n,
nms_thresh=self.train_nms_thresh,
min_size=self.train_min_size,
eta=self.train_eta)
else:
rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals(
scores=rpn_cls_prob_fpn,
bbox_deltas=rpn_bbox_pred_fpn,
im_info=im_info,
anchors=self.anchors,
variances=self.anchor_var,
pre_nms_top_n=self.test_pre_nms_top_n,
post_nms_top_n=self.test_post_nms_top_n,
nms_thresh=self.test_nms_thresh,
min_size=self.test_min_size,
eta=self.test_eta)
return rpn_rois_fpn, rpn_roi_prob_fpn
def get_proposals(self, fpn_feats, im_info, mode='train'):
"""
Get proposals in multiple levels according to the output of fpn
rpn head
Args:
            fpn_feats(dict): A dictionary mapping names to the output
                feature maps of FPN.
            im_info(Variable): The information of image with shape [N, 3] in
                format (height, width, scale).
        Returns:
            rois_collect(Variable): Output proposals with shape [rois_num, 4].
"""
rois_list = []
roi_probs_list = []
fpn_feat_names = list(fpn_feats.keys())
for lvl in range(self.min_level, self.max_level + 1):
fpn_feat_name = fpn_feat_names[self.max_level - lvl]
fpn_feat = fpn_feats[fpn_feat_name]
rois_fpn, roi_probs_fpn = self._get_single_proposals(
fpn_feat, im_info, lvl, mode)
self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred))
rois_list.append(rois_fpn)
roi_probs_list.append(roi_probs_fpn)
self.anchors_list.append(self.anchors)
self.anchor_var_list.append(self.anchor_var)
post_nms_top_n = self.train_post_nms_top_n if mode == 'train' else \
self.test_post_nms_top_n
rois_collect = fluid.layers.collect_fpn_proposals(
rois_list,
roi_probs_list,
self.min_level,
self.max_level,
post_nms_top_n,
name='collect')
return rois_collect
def _get_loss_input(self):
rpn_clses = []
rpn_bboxes = []
anchors = []
anchor_vars = []
for i in range(len(self.fpn_rpn_list)):
single_input = self._transform_input(
self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1],
self.anchors_list[i], self.anchor_var_list[i])
rpn_clses.append(single_input[0])
rpn_bboxes.append(single_input[1])
anchors.append(single_input[2])
anchor_vars.append(single_input[3])
rpn_cls = fluid.layers.concat(rpn_clses, axis=1)
rpn_bbox = fluid.layers.concat(rpn_bboxes, axis=1)
anchors = fluid.layers.concat(anchors)
anchor_var = fluid.layers.concat(anchor_vars)
return rpn_cls, rpn_bbox, anchors, anchor_var
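if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Builds test-mode proposals from a single backbone feature
    # map; the name 'res4' and the 1024-channel width are illustrative.
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        feat = fluid.data(
            name='feat', shape=[None, 1024, None, None], dtype='float32')
        im_info = fluid.data(
            name='im_info', shape=[None, 3], dtype='float32')
        rpn_head = RPNHead()
        rois = rpn_head.get_proposals({'res4': feat}, im_info, mode='test')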
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from collections import OrderedDict
class YOLOv3:
def __init__(self,
backbone,
num_classes,
mode='train',
anchors=None,
anchor_masks=None,
ignore_threshold=0.7,
label_smooth=False,
nms_score_threshold=0.01,
nms_topk=1000,
nms_keep_topk=100,
nms_iou_threshold=0.45,
train_random_shapes=[
320, 352, 384, 416, 448, 480, 512, 544, 576, 608
]):
if anchors is None:
anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]]
if anchor_masks is None:
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
self.anchors = anchors
self.anchor_masks = anchor_masks
self._parse_anchors(anchors)
self.mode = mode
self.num_classes = num_classes
self.backbone = backbone
self.ignore_thresh = ignore_threshold
self.label_smooth = label_smooth
self.nms_score_threshold = nms_score_threshold
self.nms_topk = nms_topk
self.nms_keep_topk = nms_keep_topk
self.nms_iou_threshold = nms_iou_threshold
self.norm_decay = 0.0
self.prefix_name = ''
self.train_random_shapes = train_random_shapes
def _head(self, feats):
outputs = []
out_layer_num = len(self.anchor_masks)
blocks = feats[-1:-out_layer_num - 1:-1]
route = None
for i, block in enumerate(blocks):
if i > 0:
block = fluid.layers.concat(input=[route, block], axis=1)
route, tip = self._detection_block(
block,
channel=512 // (2**i),
name=self.prefix_name + 'yolo_block.{}'.format(i))
num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
block_out = fluid.layers.conv2d(
input=tip,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(name=self.prefix_name +
'yolo_output.{}.conv.weights'.format(i)),
bias_attr=ParamAttr(
regularizer=L2Decay(0.0),
name=self.prefix_name +
'yolo_output.{}.conv.bias'.format(i)))
outputs.append(block_out)
if i < len(blocks) - 1:
route = self._conv_bn(
input=route,
ch_out=256 // (2**i),
filter_size=1,
stride=1,
padding=0,
name=self.prefix_name + 'yolo_transition.{}'.format(i))
route = self._upsample(route)
return outputs
def _parse_anchors(self, anchors):
self.anchors = []
self.mask_anchors = []
assert len(anchors) > 0, "ANCHORS not set."
assert len(self.anchor_masks) > 0, "ANCHOR_MASKS not set."
for anchor in anchors:
assert len(anchor) == 2, "anchor {} len should be 2".format(anchor)
self.anchors.extend(anchor)
anchor_num = len(anchors)
for masks in self.anchor_masks:
self.mask_anchors.append([])
for mask in masks:
assert mask < anchor_num, "anchor mask index overflow"
self.mask_anchors[-1].extend(anchors[mask])
def _conv_bn(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
is_test=False,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + '.conv.weights'),
bias_attr=False)
bn_name = name + '.bn'
bn_param_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(self.norm_decay), name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
is_test=is_test,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _upsample(self, input, scale=2, name=None):
out = fluid.layers.resize_nearest(
input=input, scale=float(scale), name=name)
return out
def _detection_block(self, input, channel, name=None):
assert channel % 2 == 0, "channel({}) cannot be divided by 2 in detection block({})".format(
channel, name)
        is_test = self.mode != 'train'
conv = input
for i in range(2):
conv = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.{}.0'.format(name, i))
conv = self._conv_bn(
conv,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.{}.1'.format(name, i))
route = self._conv_bn(
conv,
channel,
filter_size=1,
stride=1,
padding=0,
is_test=is_test,
name='{}.2'.format(name))
tip = self._conv_bn(
route,
channel * 2,
filter_size=3,
stride=1,
padding=1,
is_test=is_test,
name='{}.tip'.format(name))
return route, tip
def _get_loss(self, inputs, gt_box, gt_label, gt_score):
losses = []
downsample = 32
for i, input in enumerate(inputs):
loss = fluid.layers.yolov3_loss(
x=input,
gt_box=gt_box,
gt_label=gt_label,
gt_score=gt_score,
anchors=self.anchors,
anchor_mask=self.anchor_masks[i],
class_num=self.num_classes,
ignore_thresh=self.ignore_thresh,
downsample_ratio=downsample,
use_label_smooth=self.label_smooth,
name=self.prefix_name + 'yolo_loss' + str(i))
losses.append(fluid.layers.reduce_mean(loss))
downsample //= 2
return sum(losses)
def _get_prediction(self, inputs, im_size):
boxes = []
scores = []
downsample = 32
for i, input in enumerate(inputs):
box, score = fluid.layers.yolo_box(
x=input,
img_size=im_size,
anchors=self.mask_anchors[i],
class_num=self.num_classes,
conf_thresh=self.nms_score_threshold,
downsample_ratio=downsample,
name=self.prefix_name + 'yolo_box' + str(i))
boxes.append(box)
scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
downsample //= 2
yolo_boxes = fluid.layers.concat(boxes, axis=1)
yolo_scores = fluid.layers.concat(scores, axis=2)
pred = fluid.layers.multiclass_nms(
bboxes=yolo_boxes,
scores=yolo_scores,
score_threshold=self.nms_score_threshold,
nms_top_k=self.nms_topk,
keep_top_k=self.nms_keep_topk,
nms_threshold=self.nms_iou_threshold,
normalized=False,
nms_eta=1.0,
background_label=-1)
return pred
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['gt_box'] = fluid.data(
dtype='float32', shape=[None, None, 4], name='gt_box')
inputs['gt_label'] = fluid.data(
dtype='int32', shape=[None, None], name='gt_label')
inputs['gt_score'] = fluid.data(
dtype='float32', shape=[None, None], name='gt_score')
inputs['im_size'] = fluid.data(
dtype='int32', shape=[None, 2], name='im_size')
elif self.mode == 'eval':
inputs['im_size'] = fluid.data(
dtype='int32', shape=[None, 2], name='im_size')
inputs['im_id'] = fluid.data(
dtype='int32', shape=[None, 1], name='im_id')
inputs['gt_box'] = fluid.data(
dtype='float32', shape=[None, None, 4], name='gt_box')
inputs['gt_label'] = fluid.data(
dtype='int32', shape=[None, None], name='gt_label')
inputs['is_difficult'] = fluid.data(
dtype='int32', shape=[None, None], name='is_difficult')
elif self.mode == 'test':
inputs['im_size'] = fluid.data(
dtype='int32', shape=[None, 2], name='im_size')
return inputs
def build_net(self, inputs):
image = inputs['image']
if self.mode == 'train':
if isinstance(self.train_random_shapes,
(list, tuple)) and len(self.train_random_shapes) > 0:
import numpy as np
shapes = np.array(self.train_random_shapes)
shapes = np.stack([shapes, shapes], axis=1).astype('float32')
shapes_tensor = fluid.layers.assign(shapes)
index = fluid.layers.uniform_random(
shape=[1], dtype='float32', min=0.0, max=1)
index = fluid.layers.cast(
index * len(self.train_random_shapes), dtype='int32')
shape = fluid.layers.gather(shapes_tensor, index)
shape = fluid.layers.reshape(shape, [-1])
shape = fluid.layers.cast(shape, dtype='int32')
image = fluid.layers.resize_nearest(
image, out_shape=shape, align_corners=False)
feats = self.backbone(image)
if isinstance(feats, OrderedDict):
feat_names = list(feats.keys())
feats = [feats[name] for name in feat_names]
head_outputs = self._head(feats)
if self.mode == 'train':
gt_box = inputs['gt_box']
gt_label = inputs['gt_label']
gt_score = inputs['gt_score']
im_size = inputs['im_size']
num_boxes = fluid.layers.shape(gt_box)[1]
im_size_wh = fluid.layers.reverse(im_size, axis=1)
whwh = fluid.layers.concat([im_size_wh, im_size_wh], axis=1)
whwh = fluid.layers.unsqueeze(whwh, axes=[1])
whwh = fluid.layers.expand(whwh, expand_times=[1, num_boxes, 1])
whwh = fluid.layers.cast(whwh, dtype='float32')
whwh.stop_gradient = True
normalized_box = fluid.layers.elementwise_div(gt_box, whwh)
return self._get_loss(head_outputs, normalized_box, gt_label,
gt_score)
else:
im_size = inputs['im_size']
return self._get_prediction(head_outputs, im_size)
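if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Wires YOLOv3 to a stub backbone that yields 1/8, 1/16 and
    # 1/32 feature maps, standing in for DarkNet53; the channel widths and
    # the COCO class count are illustrative only.
    def stub_backbone(image):
        feats = []
        x = image
        for i, ch in enumerate([256, 512, 1024]):
            stride = 8 if i == 0 else 2
            x = fluid.layers.conv2d(
                x, num_filters=ch, filter_size=3, stride=stride, padding=1)
            feats.append(x)
        return feats

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        model = YOLOv3(backbone=stub_backbone, num_classes=80, mode='test')
        inputs = model.generate_inputs()
        pred = model.build_net(inputs)  # detections after multiclass NMS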
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
class MobileNetV1(object):
"""
MobileNet v1, see https://arxiv.org/abs/1704.04861
Args:
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
norm_decay (float): weight decay for normalization layer weights
        conv_group_scale (int): scaling factor for channels and convolution
            groups (i.e. the width multiplier)
        with_extra_blocks (bool): whether extra blocks should be added
        extra_block_filters (list): number of filters for each extra block
"""
def __init__(self,
norm_type='bn',
norm_decay=0.,
conv_group_scale=1,
conv_learning_rate=1.0,
with_extra_blocks=False,
extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]],
weight_prefix_name='',
num_classes=None):
self.norm_type = norm_type
self.norm_decay = norm_decay
self.conv_group_scale = conv_group_scale
self.conv_learning_rate = conv_learning_rate
self.with_extra_blocks = with_extra_blocks
self.extra_block_filters = extra_block_filters
self.prefix_name = weight_prefix_name
self.num_classes = num_classes
def _conv_norm(self,
input,
filter_size,
num_filters,
stride,
padding,
num_groups=1,
act='relu',
use_cudnn=True,
name=None):
parameter_attr = ParamAttr(
learning_rate=self.conv_learning_rate,
initializer=fluid.initializer.MSRA(),
name=name + "_weights")
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
bn_name = name + "_bn"
norm_decay = self.norm_decay
bn_param_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_offset')
return fluid.layers.batch_norm(
input=conv,
act=act,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def depthwise_separable(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
depthwise_conv = self._conv_norm(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False,
name=name + "_dw")
pointwise_conv = self._conv_norm(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0,
name=name + "_sep")
return pointwise_conv
def _extra_block(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
name=None):
pointwise_conv = self._conv_norm(
input=input,
filter_size=1,
num_filters=int(num_filters1),
stride=1,
num_groups=int(num_groups),
padding=0,
name=name + "_extra1")
normal_conv = self._conv_norm(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2),
stride=2,
num_groups=int(num_groups),
padding=1,
name=name + "_extra2")
return normal_conv
def __call__(self, input):
scale = self.conv_group_scale
blocks = []
# input 1/1
out = self._conv_norm(
input, 3, int(32 * scale), 2, 1, name=self.prefix_name + "conv1")
# 1/2
out = self.depthwise_separable(
out, 32, 64, 32, 1, scale, name=self.prefix_name + "conv2_1")
out = self.depthwise_separable(
out, 64, 128, 64, 2, scale, name=self.prefix_name + "conv2_2")
# 1/4
out = self.depthwise_separable(
out, 128, 128, 128, 1, scale, name=self.prefix_name + "conv3_1")
out = self.depthwise_separable(
out, 128, 256, 128, 2, scale, name=self.prefix_name + "conv3_2")
# 1/8
blocks.append(out)
out = self.depthwise_separable(
out, 256, 256, 256, 1, scale, name=self.prefix_name + "conv4_1")
out = self.depthwise_separable(
out, 256, 512, 256, 2, scale, name=self.prefix_name + "conv4_2")
# 1/16
blocks.append(out)
for i in range(5):
out = self.depthwise_separable(
out,
512,
512,
512,
1,
scale,
name=self.prefix_name + "conv5_" + str(i + 1))
module11 = out
out = self.depthwise_separable(
out, 512, 1024, 512, 2, scale, name=self.prefix_name + "conv5_6")
# 1/32
out = self.depthwise_separable(
out, 1024, 1024, 1024, 1, scale, name=self.prefix_name + "conv6")
module13 = out
blocks.append(out)
if self.num_classes:
out = fluid.layers.pool2d(
input=out, pool_type='avg', global_pooling=True)
output = fluid.layers.fc(
input=out,
size=self.num_classes,
param_attr=ParamAttr(
initializer=fluid.initializer.MSRA(), name="fc7_weights"),
bias_attr=ParamAttr(name="fc7_offset"))
return output
if not self.with_extra_blocks:
return blocks
num_filters = self.extra_block_filters
module14 = self._extra_block(module13, num_filters[0][0],
num_filters[0][1], 1, 2,
self.prefix_name + "conv7_1")
module15 = self._extra_block(module14, num_filters[1][0],
num_filters[1][1], 1, 2,
self.prefix_name + "conv7_2")
module16 = self._extra_block(module15, num_filters[2][0],
num_filters[2][1], 1, 2,
self.prefix_name + "conv7_3")
module17 = self._extra_block(module16, num_filters[3][0],
num_filters[3][1], 1, 2,
self.prefix_name + "conv7_4")
return module11, module13, module14, module15, module16, module17
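if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Builds a 1000-way classifier head on top of MobileNetV1.
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        image = fluid.data(
            name='image', shape=[None, 3, 224, 224], dtype='float32')
        logits = MobileNetV1(num_classes=1000)(image)  # [N, 1000]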
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
class MobileNetV2:
def __init__(self,
num_classes=None,
scale=1.0,
output_stride=None,
end_points=None,
decode_points=None):
self.scale = scale
self.num_classes = num_classes
self.output_stride = output_stride
self.end_points = end_points
self.decode_points = decode_points
self.bottleneck_params_list = [(1, 16, 1, 1), (6, 24, 2, 2),
(6, 32, 3, 2), (6, 64, 4, 2),
(6, 96, 3, 1), (6, 160, 3, 2),
(6, 320, 1, 1)]
self.modify_bottle_params(output_stride)
def __call__(self, input):
scale = self.scale
decode_ends = dict()
        def check_points(count, points):
            if points is None:
                return False
            if isinstance(points, list):
                return count in points
            return count == points
# conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
if check_points(layer_count, self.decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, self.end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in self.bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
if check_points(layer_count, self.decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, self.end_points):
return input, decode_ends
# last_conv
output = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
if self.num_classes is not None:
output = fluid.layers.pool2d(
input=output, pool_type='avg', global_pooling=True)
output = fluid.layers.fc(
input=output,
size=self.num_classes,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def modify_bottle_params(self, output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
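if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Builds a 1000-way classifier on top of MobileNetV2 at the
    # default scale of 1.0.
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        image = fluid.data(
            name='image', shape=[None, 3, 224, 224], dtype='float32')
        logits = MobileNetV2(num_classes=1000)(image)  # [N, 1000]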
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
import math
class MobileNetV3():
"""
MobileNet v3, see https://arxiv.org/abs/1905.02244
Args:
        scale (float): channel width multiplier of mobilenet_v3.
model_name (str): There are two modes, small and large.
norm_type (str): normalization type, 'bn' and 'sync_bn' are supported.
norm_decay (float): weight decay for normalization layer weights.
conv_decay (float): weight decay for convolution layer weights.
        with_extra_blocks (bool): whether extra blocks should be added.
        extra_block_filters (list): number of filters for each extra block.
"""
def __init__(self,
scale=1.0,
model_name='small',
with_extra_blocks=False,
conv_decay=0.0,
norm_type='bn',
norm_decay=0.0,
extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]],
num_classes=None):
self.scale = scale
self.with_extra_blocks = with_extra_blocks
self.extra_block_filters = extra_block_filters
self.conv_decay = conv_decay
self.norm_decay = norm_decay
self.inplanes = 16
self.end_points = []
self.block_stride = 1
self.num_classes = num_classes
if model_name == "large":
self.cfg = [
# kernel_size, expand, channel, se_block, act_mode, stride
[3, 16, 16, False, 'relu', 1],
[3, 64, 24, False, 'relu', 2],
[3, 72, 24, False, 'relu', 1],
[5, 72, 40, True, 'relu', 2],
[5, 120, 40, True, 'relu', 1],
[5, 120, 40, True, 'relu', 1],
[3, 240, 80, False, 'hard_swish', 2],
[3, 200, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 184, 80, False, 'hard_swish', 1],
[3, 480, 112, True, 'hard_swish', 1],
[3, 672, 112, True, 'hard_swish', 1],
[5, 672, 160, True, 'hard_swish', 2],
[5, 960, 160, True, 'hard_swish', 1],
[5, 960, 160, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 960
self.cls_ch_expand = 1280
elif model_name == "small":
self.cfg = [
# kernel_size, expand, channel, se_block, act_mode, stride
[3, 16, 16, True, 'relu', 2],
[3, 72, 24, False, 'relu', 2],
[3, 88, 24, False, 'relu', 1],
[5, 96, 40, True, 'hard_swish', 2],
[5, 240, 40, True, 'hard_swish', 1],
[5, 240, 40, True, 'hard_swish', 1],
[5, 120, 48, True, 'hard_swish', 1],
[5, 144, 48, True, 'hard_swish', 1],
[5, 288, 96, True, 'hard_swish', 2],
[5, 576, 96, True, 'hard_swish', 1],
[5, 576, 96, True, 'hard_swish', 1],
]
self.cls_ch_squeeze = 576
self.cls_ch_expand = 1280
else:
raise NotImplementedError
def _conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
num_groups=1,
if_act=True,
act=None,
name=None,
use_cudnn=True):
conv_param_attr = ParamAttr(
name=name + '_weights', regularizer=L2Decay(self.conv_decay))
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=conv_param_attr,
bias_attr=False)
bn_name = name + '_bn'
bn_param_attr = ParamAttr(
name=bn_name + "_scale", regularizer=L2Decay(self.norm_decay))
bn_bias_attr = ParamAttr(
name=bn_name + "_offset", regularizer=L2Decay(self.norm_decay))
bn = fluid.layers.batch_norm(
input=conv,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
if act == 'relu':
bn = fluid.layers.relu(bn)
elif act == 'hard_swish':
bn = self._hard_swish(bn)
elif act == 'relu6':
bn = fluid.layers.relu6(bn)
return bn
def _hard_swish(self, x):
return x * fluid.layers.relu6(x + 3) / 6.
def _se_block(self, input, num_out_filter, ratio=4, name=None):
num_mid_filter = int(num_out_filter // ratio)
pool = fluid.layers.pool2d(
input=input, pool_type='avg', global_pooling=True, use_cudnn=False)
conv1 = fluid.layers.conv2d(
input=pool,
filter_size=1,
num_filters=num_mid_filter,
act='relu',
param_attr=ParamAttr(name=name + '_1_weights'),
bias_attr=ParamAttr(name=name + '_1_offset'))
conv2 = fluid.layers.conv2d(
input=conv1,
filter_size=1,
num_filters=num_out_filter,
act='hard_sigmoid',
param_attr=ParamAttr(name=name + '_2_weights'),
bias_attr=ParamAttr(name=name + '_2_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=conv2, axis=0)
return scale
def _residual_unit(self,
input,
num_in_filter,
num_mid_filter,
num_out_filter,
stride,
filter_size,
act=None,
use_se=False,
name=None):
input_data = input
conv0 = self._conv_bn_layer(
input=input,
filter_size=1,
num_filters=num_mid_filter,
stride=1,
padding=0,
if_act=True,
act=act,
name=name + '_expand')
if self.block_stride == 16 and stride == 2:
self.end_points.append(conv0)
conv1 = self._conv_bn_layer(
input=conv0,
filter_size=filter_size,
num_filters=num_mid_filter,
stride=stride,
padding=int((filter_size - 1) // 2),
if_act=True,
act=act,
num_groups=num_mid_filter,
use_cudnn=False,
name=name + '_depthwise')
if use_se:
conv1 = self._se_block(
input=conv1, num_out_filter=num_mid_filter, name=name + '_se')
conv2 = self._conv_bn_layer(
input=conv1,
filter_size=1,
num_filters=num_out_filter,
stride=1,
padding=0,
if_act=False,
name=name + '_linear')
if num_in_filter != num_out_filter or stride != 1:
return conv2
else:
return fluid.layers.elementwise_add(
x=input_data, y=conv2, act=None)
def _extra_block_dw(self,
input,
num_filters1,
num_filters2,
stride,
name=None):
pointwise_conv = self._conv_bn_layer(
input=input,
filter_size=1,
num_filters=int(num_filters1),
stride=1,
padding="SAME",
act='relu6',
name=name + "_extra1")
depthwise_conv = self._conv_bn_layer(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2),
stride=stride,
padding="SAME",
num_groups=int(num_filters1),
act='relu6',
use_cudnn=False,
name=name + "_extra2_dw")
normal_conv = self._conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2),
stride=1,
padding="SAME",
act='relu6',
name=name + "_extra2_sep")
return normal_conv
def __call__(self, input):
scale = self.scale
inplanes = self.inplanes
cfg = self.cfg
blocks = []
#conv1
conv = self._conv_bn_layer(
input,
filter_size=3,
num_filters=inplanes if scale <= 1.0 else int(inplanes * scale),
stride=2,
padding=1,
num_groups=1,
if_act=True,
act='hard_swish',
name='conv1')
i = 0
for layer_cfg in cfg:
self.block_stride *= layer_cfg[5]
if layer_cfg[5] == 2:
blocks.append(conv)
conv = self._residual_unit(
input=conv,
num_in_filter=inplanes,
num_mid_filter=int(scale * layer_cfg[1]),
num_out_filter=int(scale * layer_cfg[2]),
act=layer_cfg[4],
stride=layer_cfg[5],
filter_size=layer_cfg[0],
use_se=layer_cfg[3],
name='conv' + str(i + 2))
inplanes = int(scale * layer_cfg[2])
i += 1
blocks.append(conv)
if self.num_classes:
conv = self._conv_bn_layer(
input=conv,
filter_size=1,
num_filters=int(scale * self.cls_ch_squeeze),
stride=1,
padding=0,
num_groups=1,
if_act=True,
act='hard_swish',
name='conv_last')
conv = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
use_cudnn=False)
conv = fluid.layers.conv2d(
input=conv,
num_filters=self.cls_ch_expand,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(name='last_1x1_conv_weights'),
bias_attr=False)
conv = self._hard_swish(conv)
drop = fluid.layers.dropout(x=conv, dropout_prob=0.2)
out = fluid.layers.fc(
input=drop,
size=self.num_classes,
param_attr=ParamAttr(name='fc_weights'),
bias_attr=ParamAttr(name='fc_offset'))
return out
if not self.with_extra_blocks:
return blocks
# extra block
conv_extra = self._conv_bn_layer(
conv,
filter_size=1,
num_filters=int(scale * cfg[-1][1]),
stride=1,
padding="SAME",
num_groups=1,
if_act=True,
act='hard_swish',
name='conv' + str(i + 2))
self.end_points.append(conv_extra)
i += 1
for block_filter in self.extra_block_filters:
conv_extra = self._extra_block_dw(conv_extra, block_filter[0],
block_filter[1], 2,
'conv' + str(i + 2))
self.end_points.append(conv_extra)
i += 1
return self.end_points
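if __name__ == '__main__':
    # Hedged usage sketch (added for illustration; not part of the original
    # source). Builds a 1000-way classifier on top of the 'large' variant
    # of MobileNetV3.
    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        image = fluid.data(
            name='image', shape=[None, 3, 224, 224], dtype='float32')
        net = MobileNetV3(model_name='large', num_classes=1000)
        logits = net(image)  # [N, 1000]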
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from collections import OrderedDict
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.framework import Variable
from paddle.fluid.regularizer import L2Decay
from paddle.fluid.initializer import Constant
from numbers import Integral
from .backbone_utils import NameAdapter
__all__ = ['ResNet', 'ResNetC5']
class ResNet(object):
"""
Residual Network, see https://arxiv.org/abs/1512.03385
Args:
        layers (int): ResNet layers, should be 18, 34, 50, 101, 152 or 200.
freeze_at (int): freeze the backbone at which stage
norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
freeze_norm (bool): freeze normalization layers
norm_decay (float): weight decay for normalization layer weights
variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
feature_maps (list): index of stages whose feature maps are returned
        dcn_v2_stages (list): index of stages which use deformable conv v2
        nonlocal_stages (list): index of stages which use nonlocal networks
        gcb_stages (list): index of stages which use gc blocks
gcb_params (dict): gc blocks config, includes ratio(default as 1.0/16),
pooling_type(default as "att") and
fusion_types(default as ['channel_add'])
"""
def __init__(self,
layers=50,
freeze_at=0,
norm_type='bn',
freeze_norm=False,
norm_decay=0.,
variant='b',
feature_maps=[2, 3, 4, 5],
dcn_v2_stages=[],
weight_prefix_name='',
nonlocal_stages=[],
gcb_stages=[],
gcb_params=dict(),
num_classes=None):
super(ResNet, self).__init__()
if isinstance(feature_maps, Integral):
feature_maps = [feature_maps]
        assert layers in [18, 34, 50, 101, 152, 200], \
            "layers {} not in [18, 34, 50, 101, 152, 200]".format(layers)
assert variant in ['a', 'b', 'c', 'd'], "invalid ResNet variant"
assert 0 <= freeze_at <= 5, "freeze_at should be 0, 1, 2, 3, 4 or 5"
assert len(feature_maps) > 0, "need one or more feature maps"
assert norm_type in ['bn', 'sync_bn', 'affine_channel']
assert not (len(nonlocal_stages)>0 and layers<50), \
"non-local is not supported for resnet18 or resnet34"
self.layers = layers
self.freeze_at = freeze_at
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
self.variant = variant
self._model_type = 'ResNet'
self.feature_maps = feature_maps
self.dcn_v2_stages = dcn_v2_stages
self.layers_cfg = {
18: ([2, 2, 2, 2], self.basicblock),
34: ([3, 4, 6, 3], self.basicblock),
50: ([3, 4, 6, 3], self.bottleneck),
101: ([3, 4, 23, 3], self.bottleneck),
152: ([3, 8, 36, 3], self.bottleneck),
200: ([3, 12, 48, 3], self.bottleneck),
}
self.stage_filters = [64, 128, 256, 512]
self._c1_out_chan_num = 64
self.na = NameAdapter(self)
self.prefix_name = weight_prefix_name
self.nonlocal_stages = nonlocal_stages
self.nonlocal_mod_cfg = {
50: 2,
101: 5,
152: 8,
200: 12,
}
self.gcb_stages = gcb_stages
self.gcb_params = gcb_params
self.num_classes = num_classes
def _conv_offset(self,
input,
filter_size,
stride,
padding,
act=None,
name=None):
out_channel = filter_size * filter_size * 3
out = fluid.layers.conv2d(
input,
num_filters=out_channel,
filter_size=filter_size,
stride=stride,
padding=padding,
param_attr=ParamAttr(
initializer=Constant(0.0), name=name + ".w_0"),
bias_attr=ParamAttr(initializer=Constant(0.0), name=name + ".b_0"),
act=act,
name=name)
return out
def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None,
dcn_v2=False):
_name = self.prefix_name + name if self.prefix_name != '' else name
if not dcn_v2:
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=_name + "_weights"),
bias_attr=False,
name=_name + '.conv2d.output.1')
else:
# select deformable conv"
offset_mask = self._conv_offset(
input=input,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
act=None,
name=_name + "_conv_offset")
offset_channel = filter_size**2 * 2
mask_channel = filter_size**2
offset, mask = fluid.layers.split(
input=offset_mask,
num_or_sections=[offset_channel, mask_channel],
dim=1)
mask = fluid.layers.sigmoid(mask)
conv = fluid.layers.deformable_conv(
input=input,
offset=offset,
mask=mask,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
deformable_groups=1,
im2col_step=1,
param_attr=ParamAttr(name=_name + "_weights"),
bias_attr=False,
name=_name + ".conv2d.output.1")
bn_name = self.na.fix_conv_norm_name(name)
bn_name = self.prefix_name + bn_name if self.prefix_name != '' else bn_name
norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay
pattr = ParamAttr(
name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']:
            global_stats = bool(self.freeze_norm)
out = fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=pattr,
bias_attr=battr,
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance',
use_global_stats=global_stats)
scale = fluid.framework._get_var(pattr.name)
bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter(
shape=[conv.shape[1]],
dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm:
scale.stop_gradient = True
bias.stop_gradient = True
return out
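    # Editor's note on the deformable branch above: _conv_offset predicts
    # filter_size * filter_size * 3 channels per position, split into
    # 2*k*k (x, y) offsets followed by k*k modulation-mask values (DCN v2).
    # For a 3x3 kernel that is 27 channels = 18 offsets + 9 mask values; the
    # mask is squashed to (0, 1) by a sigmoid before deformable_conv is applied.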
def _shortcut(self, input, ch_out, stride, is_first, name):
max_pooling_in_short_cut = self.variant == 'd'
ch_in = input.shape[1]
        # the naming rule is the same as in the pretrained weights
name = self.na.fix_shortcut_name(name)
std_senet = getattr(self, 'std_senet', False)
if ch_in != ch_out or stride != 1 or (self.layers < 50 and is_first):
if std_senet:
if is_first:
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return self._conv_norm(input, ch_out, 3, stride, name=name)
if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d(
input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False,
gcb=False,
gcb_name=None):
if self.variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
# ResNeXt
groups = getattr(self, 'groups', 1)
group_width = getattr(self, 'group_width', -1)
if groups == 1:
expand = 4
elif (groups * group_width) == 256:
expand = 1
else: # FIXME hard code for now, handles 32x4d, 64x4d and 32x8d
num_filters = num_filters // 2
expand = 2
conv_name1, conv_name2, conv_name3, \
shortcut_name = self.na.fix_bottleneck_name(name)
std_senet = getattr(self, 'std_senet', False)
if std_senet:
conv_def = [[
int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1
], [num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]]
else:
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
[num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]]
residual = input
for i, (c, k, s, act, g, _name) in enumerate(conv_def):
residual = self._conv_norm(
input=residual,
num_filters=c,
filter_size=k,
stride=s,
act=act,
groups=g,
name=_name,
dcn_v2=(i == 1 and dcn_v2))
short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
# Squeeze-and-Excitation
if callable(getattr(self, '_squeeze_excitation', None)):
residual = self._squeeze_excitation(
input=residual, num_channels=num_filters, name='fc' + name)
if gcb:
residual = add_gc_block(residual, name=gcb_name, **self.gcb_params)
return fluid.layers.elementwise_add(
x=short, y=residual, act='relu', name=name + ".add.output.5")
def basicblock(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False,
gcb=False,
gcb_name=None):
assert dcn_v2 is False, "Not implemented yet."
assert gcb is False, "Not implemented yet."
conv0 = self._conv_norm(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num):
"""
Args:
input (Variable): input variable.
            stage_num (int): the stage number; should be 2, 3, 4 or 5.
        Returns:
            The last variable of the stage_num-th stage.
"""
assert stage_num in [2, 3, 4, 5]
stages, block_func = self.layers_cfg[self.layers]
count = stages[stage_num - 2]
ch_out = self.stage_filters[stage_num - 2]
is_first = False if stage_num != 2 else True
dcn_v2 = True if stage_num in self.dcn_v2_stages else False
nonlocal_mod = 1000
if stage_num in self.nonlocal_stages:
nonlocal_mod = self.nonlocal_mod_cfg[
self.layers] if stage_num == 4 else 2
# Make the layer name and parameter name consistent
# with ImageNet pre-trained model
conv = input
for i in range(count):
conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
if self.layers < 50:
is_first = True if i == 0 and stage_num == 2 else False
gcb = stage_num in self.gcb_stages
gcb_name = "gcb_res{}_b{}".format(stage_num, i)
conv = block_func(
input=conv,
num_filters=ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
is_first=is_first,
name=conv_name,
dcn_v2=dcn_v2,
gcb=gcb,
gcb_name=gcb_name)
            # add non-local block
dim_in = conv.shape[1]
nonlocal_name = "nonlocal_conv{}".format(stage_num)
if i % nonlocal_mod == nonlocal_mod - 1:
conv = add_space_nonlocal(conv, dim_in, dim_in,
nonlocal_name + '_{}'.format(i),
int(dim_in / 2))
return conv
def c1_stage(self, input):
out_chan = self._c1_out_chan_num
conv1_name = self.na.fix_c1_stage_name()
if self.variant in ['c', 'd']:
conv_def = [
[out_chan // 2, 3, 2, "conv1_1"],
[out_chan // 2, 3, 1, "conv1_2"],
[out_chan, 3, 1, "conv1_3"],
]
else:
conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def:
input = self._conv_norm(
input=input,
num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output
def __call__(self, input):
assert isinstance(input, Variable)
assert not (set(self.feature_maps) - set([1, 2, 3, 4, 5])), \
"feature maps {} not in [1, 2, 3, 4, 5]".format(self.feature_maps)
res_endpoints = []
res = input
feature_maps = self.feature_maps
severed_head = getattr(self, 'severed_head', False)
if not severed_head:
res = self.c1_stage(res)
feature_maps = range(2, max(self.feature_maps) + 1)
for i in feature_maps:
res = self.layer_warp(res, i)
if i in self.feature_maps:
res_endpoints.append(res)
if self.freeze_at >= i:
res.stop_gradient = True
if self.num_classes is not None:
pool = fluid.layers.pool2d(
input=res, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=self.num_classes,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
for idx, feat in enumerate(res_endpoints)])
class ResNetC5(ResNet):
__doc__ = ResNet.__doc__
def __init__(self,
layers=50,
freeze_at=2,
norm_type='affine_channel',
freeze_norm=True,
norm_decay=0.,
variant='b',
feature_maps=[5],
weight_prefix_name=''):
super(ResNetC5,
self).__init__(layers, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True
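# --- Usage sketch (editor's addition, hypothetical; not part of the original file). ---
# A minimal static-graph demo of the backbone under paddle 1.x fluid; the
# argument values below are illustrative, not taken from the original sources.
if __name__ == '__main__':
    demo_prog, demo_startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(demo_prog, demo_startup):
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        backbone = ResNet(layers=50, variant='d', feature_maps=[3, 4, 5])
        feats = backbone(image)
        # Expected keys given feature_maps=[3, 4, 5]: res3_sum, res4_sum, res5_sum
        print(list(feats.keys()))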
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .unet import UNet
from .deeplabv3p import DeepLabv3p
from .model_utils import libs
from .model_utils import loss
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from .model_utils.libs import scope, name_scope
from .model_utils.libs import bn, bn_relu, relu
from .model_utils.libs import conv, max_pool, deconv
from .model_utils.libs import separate_conv
from .model_utils.libs import sigmoid_to_softmax
from .model_utils.loss import softmax_with_loss
from .model_utils.loss import dice_loss
from .model_utils.loss import bce_loss
import paddlex.utils.logging as logging
from paddlex.cv.nets.xception import Xception
from paddlex.cv.nets.mobilenet_v2 import MobileNetV2
class DeepLabv3p(object):
"""实现DeepLabv3+模型
`"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation"
<https://arxiv.org/abs/1802.02611>`
Args:
num_classes (int): 类别数。
backbone (paddlex.cv.nets): 神经网络,实现DeepLabv3+特征图的计算。
mode (str): 网络运行模式,根据mode构建网络的输入和返回。
当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
与label相同大小,bool类型)。
当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
output_stride (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。
aspp_with_sep_conv (bool): 在asspp模块是否采用separable convolutions。
decoder_use_sep_conv (bool): decoder模块是否采用separable convolutions。
encoder_with_aspp (bool): 是否在encoder阶段采用aspp模块。
enable_decoder (bool): 是否使用decoder模块。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes,
backbone,
mode='train',
output_stride=16,
aspp_with_sep_conv=True,
decoder_use_sep_conv=True,
encoder_with_aspp=True,
enable_decoder=True,
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice loss and bce loss are only applicable to binary segmentation
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.backbone = backbone
self.mode = mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
self.output_stride = output_stride
self.aspp_with_sep_conv = aspp_with_sep_conv
self.decoder_use_sep_conv = decoder_use_sep_conv
self.encoder_with_aspp = encoder_with_aspp
self.enable_decoder = enable_decoder
def _encoder(self, input):
        # Encoder configuration, using the ASPP architecture:
        # image pooling + 1x1 conv + three parallel dilated convs at different
        # rates, followed by concat and a 1x1 conv.
        # aspp_with_sep_conv: True by default; use depthwise-separable convs, else plain convs
        # output_stride: downsampling factor, 8 or 16; determines aspp_ratios
        # aspp_ratios: dilation rates of the ASPP branches
if self.output_stride == 16:
aspp_ratios = [6, 12, 18]
elif self.output_stride == 8:
aspp_ratios = [12, 24, 36]
else:
            raise Exception("DeepLabv3p only supports output stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
input_shape = fluid.layers.shape(input)
image_avg = fluid.layers.resize_bilinear(
image_avg, input_shape[2:])
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if self.aspp_with_sep_conv:
aspp1 = separate_conv(
input,
channel,
1,
3,
dilation=aspp_ratios[0],
act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if self.aspp_with_sep_conv:
aspp2 = separate_conv(
input,
channel,
1,
3,
dilation=aspp_ratios[1],
act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if self.aspp_with_sep_conv:
aspp3 = separate_conv(
input,
channel,
1,
3,
dilation=aspp_ratios[2],
act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat(
[image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def _decoder(self, encode_data, decode_shortcut):
        # Decoder configuration:
        # encode_data: encoder output
        # decode_shortcut: branch taken from the backbone, resized and concatenated with encode_data
        # decoder_use_sep_conv: True by default; two separable convs after the concat, else plain convs
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut_shape[2:])
encode_data = fluid.layers.concat(
[encode_data, decode_shortcut], axis=1)
if self.decoder_use_sep_conv:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def build_net(self, inputs):
        # For binary segmentation with dice_loss or bce_loss, the final logit has a single output channel
if self.use_dice_loss or self.use_bce_loss:
self.num_classes = 1
image = inputs['image']
data, decode_shortcuts = self.backbone(image)
decode_shortcut = decode_shortcuts[self.backbone.decode_points]
        # Encoder/decoder setup
if self.encoder_with_aspp:
data = self._encoder(data)
if self.enable_decoder:
data = self._decoder(data, decode_shortcut)
        # Size the last conv layer's output by the number of classes, then resize to the input image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
with fluid.name_scope('last_conv'):
logit = conv(
data,
self.num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
image_shape = fluid.layers.shape(image)
logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
elif self.mode == 'eval':
label = inputs['label']
mask = label != self.ignore_index
loss = self._get_loss(logit, label, mask)
return loss, pred, label, mask
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
return logit
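# --- Usage sketch (editor's addition, hypothetical; not part of the original file). ---
# Builds a DeepLabv3p inference graph under paddle 1.x fluid. The Xception
# end_points/decode_points values are illustrative assumptions (21 blocks total
# for xception_65; shortcut taken after entry-flow block 2), not taken from the
# original sources.
if __name__ == '__main__':
    demo_prog, demo_startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(demo_prog, demo_startup):
        backbone = Xception(layers=65, output_stride=16, end_points=21, decode_points=2)
        model = DeepLabv3p(num_classes=2, backbone=backbone, mode='test')
        inputs = model.generate_inputs()
        pred, logit = model.build_net(inputs)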
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import libs
from . import loss
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
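# Editor's note: `scope` stacks hierarchical prefixes onto the global
# `name_scope`, which the conv/bn helpers below use to name their parameters:
#   with scope('encoder'):
#       with scope('aspp0'):
#           pass  # name_scope == 'encoder/aspp0/' here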
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
break
    assert C % G == 0, "channels must be divisible by groups"
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
def bn(*args,
norm_type='bn',
eps=1e-5,
bn_momentum=0.99,
group_norm=32,
**kargs):
if norm_type == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=eps,
momentum=bn_momentum,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif norm_type == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
group_norm,
eps=eps,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
        raise Exception("Unsupported norm type: " + norm_type)
def bn_relu(data, norm_type='bn', eps=1e-5):
return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input,
channel,
stride,
filter,
dilation=1,
act=None,
eps=1e-5):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=False,
param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input, eps=eps)
if act: input = act(input)
return input
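# Editor's note: `separate_conv` above is a depthwise-separable convolution — a
# depthwise conv (groups == input channels) followed by a 1x1 pointwise conv,
# each with its own batch norm and optional activation.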
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=fluid.ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def sigmoid_to_softmax(input):
"""
one channel to two channel
"""
logit = fluid.layers.sigmoid(input)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=1)
return logit
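# Editor's note: for a one-channel logit x of shape (N, 1, H, W),
# sigmoid_to_softmax returns (N, 2, H, W) with channel 0 = 1 - sigmoid(x) and
# channel 1 = sigmoid(x), mimicking a two-class softmax output.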
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
import numpy as np
def softmax_with_loss(logit,
label,
ignore_mask=None,
num_classes=2,
weight=None,
ignore_index=255):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(
np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
if weight is None:
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit, label, ignore_index=ignore_index, return_softmax=True)
else:
label_one_hot = fluid.layers.one_hot(input=label, depth=num_classes)
if isinstance(weight, list):
assert len(
weight
            ) == num_classes, "length of weight must equal the number of classes"
weight = fluid.layers.assign(np.array([weight], dtype='float32'))
elif isinstance(weight, str):
            assert weight.lower(
            ) == 'dynamic', "if weight is a string, it must be 'dynamic'!"
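            # Editor's note: dynamic weights are the inverse class frequency,
            # w_i = total_pixels / (pixels_of_class_i + 1), normalized below so
            # that the weights sum to num_classes.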
tmp = []
total_num = fluid.layers.cast(
fluid.layers.shape(label)[0], 'float32')
for i in range(num_classes):
cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
ratio = total_num / (cls_pixel_num + 1)
tmp.append(ratio)
weight = fluid.layers.concat(tmp)
weight = weight / fluid.layers.reduce_sum(weight) * num_classes
elif isinstance(weight, fluid.layers.Variable):
pass
else:
raise ValueError(
'Expect weight is a list, string or Variable, but receive {}'.
format(type(weight)))
weight = fluid.layers.reshape(weight, [1, num_classes])
weighted_label_one_hot = fluid.layers.elementwise_mul(
label_one_hot, weight)
probs = fluid.layers.softmax(logit)
loss = fluid.layers.cross_entropy(
probs,
weighted_label_one_hot,
soft_label=True,
ignore_index=ignore_index)
weighted_label_one_hot.stop_gradient = True
loss = loss * ignore_mask
avg_loss = fluid.layers.mean(loss) / (
fluid.layers.mean(ignore_mask) + 0.00001)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
# TODO: decide how to apply ignore_index and the ignore mask here
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception(
"dice loss is only applicable to one channel classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
label = fluid.layers.transpose(label, [0, 2, 3, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit = logit * ignore_mask
label = label * ignore_mask
reduce_dim = list(range(1, len(logit.shape)))
inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
dice_denominator = fluid.layers.reduce_sum(
logit, dim=reduce_dim) + fluid.layers.reduce_sum(
label, dim=reduce_dim)
dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
def bce_loss(logit, label, ignore_mask=None, ignore_index=255):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
        raise Exception("bce loss is only applicable to binary classification")
label = fluid.layers.cast(label, 'float32')
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=logit, label=label, ignore_index=ignore_index,
normalize=True) # or False
loss = fluid.layers.reduce_sum(loss)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return loss
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import paddle.fluid as fluid
from .model_utils.libs import scope, name_scope
from .model_utils.libs import bn, bn_relu, relu
from .model_utils.libs import conv, max_pool, deconv
from .model_utils.libs import sigmoid_to_softmax
from .model_utils.loss import softmax_with_loss
from .model_utils.loss import dice_loss
from .model_utils.loss import bce_loss
import paddlex.utils.logging as logging
class UNet(object):
"""实现Unet模型
`"U-Net: Convolutional Networks for Biomedical Image Segmentation"
<https://arxiv.org/abs/1505.04597>`
Args:
num_classes (int): 类别数
mode (str): 网络运行模式,根据mode构建网络的输入和返回。
当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
与label相同大小,bool类型)。
当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线行差值进行上菜样,
当输入其他选项时则利用反卷积进行上菜样,默认为'bilinear'。
use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
即平时使用的交叉熵损失函数。
ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
Raises:
ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
ValueError: class_weight为list, 但长度不等于num_class。
class_weight为str, 但class_weight.low()不等于dynamic。
TypeError: class_weight不为None时,其类型不是list或str。
"""
def __init__(self,
num_classes,
mode='train',
upsample_mode='bilinear',
use_bce_loss=False,
use_dice_loss=False,
class_weight=None,
ignore_index=255):
        # dice loss and bce loss are only applicable to binary segmentation
        if num_classes > 2 and (use_bce_loss or use_dice_loss):
            raise ValueError(
                "dice loss and bce loss are only applicable to binary classification"
            )
if class_weight is not None:
if isinstance(class_weight, list):
if len(class_weight) != num_classes:
raise ValueError(
"Length of class_weight should be equal to number of classes"
)
elif isinstance(class_weight, str):
if class_weight.lower() != 'dynamic':
raise ValueError(
"if class_weight is string, must be dynamic!")
else:
raise TypeError(
'Expect class_weight is a list or string but receive {}'.
format(type(class_weight)))
self.num_classes = num_classes
self.mode = mode
self.upsample_mode = upsample_mode
self.use_bce_loss = use_bce_loss
self.use_dice_loss = use_dice_loss
self.class_weight = class_weight
self.ignore_index = ignore_index
def _double_conv(self, data, out_ch):
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope("conv0"):
data = bn_relu(
conv(
data,
out_ch,
3,
stride=1,
padding=1,
param_attr=param_attr))
with scope("conv1"):
data = bn_relu(
conv(
data,
out_ch,
3,
stride=1,
padding=1,
param_attr=param_attr))
return data
def _down(self, data, out_ch):
        # Downsampling: max_pool followed by two convolutions
with scope("down"):
data = max_pool(data, 2, 2, 0)
data = self._double_conv(data, out_ch)
return data
def _up(self, data, short_cut, out_ch):
        # Upsampling: upsample data (resize or deconv), then concat with short_cut
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.XavierInitializer(),
)
with scope("up"):
if self.upsample_mode == 'bilinear':
short_cut_shape = fluid.layers.shape(short_cut)
data = fluid.layers.resize_bilinear(data, short_cut_shape[2:])
else:
data = deconv(
data,
out_ch // 2,
filter_size=2,
stride=2,
padding=0,
param_attr=param_attr)
data = fluid.layers.concat([data, short_cut], axis=1)
data = self._double_conv(data, out_ch)
return data
def _encode(self, data):
        # Encoder setup
short_cuts = []
with scope("encode"):
with scope("block1"):
data = self._double_conv(data, 64)
short_cuts.append(data)
with scope("block2"):
data = self._down(data, 128)
short_cuts.append(data)
with scope("block3"):
data = self._down(data, 256)
short_cuts.append(data)
with scope("block4"):
data = self._down(data, 512)
short_cuts.append(data)
with scope("block5"):
data = self._down(data, 512)
return data, short_cuts
def _decode(self, data, short_cuts):
        # Decoder setup, symmetric with the encoder
with scope("decode"):
with scope("decode1"):
data = self._up(data, short_cuts[3], 256)
with scope("decode2"):
data = self._up(data, short_cuts[2], 128)
with scope("decode3"):
data = self._up(data, short_cuts[1], 64)
with scope("decode4"):
data = self._up(data, short_cuts[0], 64)
return data
def _get_logit(self, data, num_classes):
        # Size the last conv layer's output by the number of classes
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope("logit"):
data = conv(
data,
num_classes,
3,
stride=1,
padding=1,
param_attr=param_attr)
return data
def _get_loss(self, logit, label, mask):
avg_loss = 0
if not (self.use_dice_loss or self.use_bce_loss):
avg_loss += softmax_with_loss(
logit,
label,
mask,
num_classes=self.num_classes,
weight=self.class_weight,
ignore_index=self.ignore_index)
else:
if self.use_dice_loss:
avg_loss += dice_loss(logit, label, mask)
if self.use_bce_loss:
avg_loss += bce_loss(
logit, label, mask, ignore_index=self.ignore_index)
return avg_loss
def generate_inputs(self):
inputs = OrderedDict()
inputs['image'] = fluid.data(
dtype='float32', shape=[None, 3, None, None], name='image')
if self.mode == 'train':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
elif self.mode == 'eval':
inputs['label'] = fluid.data(
dtype='int32', shape=[None, 1, None, None], name='label')
return inputs
def build_net(self, inputs):
        # For binary segmentation with dice_loss or bce_loss, the final logit has a single output channel
if self.use_dice_loss or self.use_bce_loss:
self.num_classes = 1
image = inputs['image']
encode_data, short_cuts = self._encode(image)
decode_data = self._decode(encode_data, short_cuts)
logit = self._get_logit(decode_data, self.num_classes)
if self.num_classes == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if self.mode == 'train':
label = inputs['label']
mask = label != self.ignore_index
return self._get_loss(logit, label, mask)
elif self.mode == 'eval':
label = inputs['label']
mask = label != self.ignore_index
loss = self._get_loss(logit, label, mask)
return loss, pred, label, mask
else:
if self.num_classes == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = fluid.layers.softmax(logit, axis=1)
return pred, logit
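# --- Usage sketch (editor's addition, hypothetical; not part of the original file). ---
# A minimal static-graph training demo under paddle 1.x fluid.
if __name__ == '__main__':
    demo_prog, demo_startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(demo_prog, demo_startup):
        model = UNet(num_classes=2, mode='train')
        inputs = model.generate_inputs()
        loss = model.build_net(inputs)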
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
class ShuffleNetV2():
def __init__(self, num_classes=None, scale=1.0):
self.num_classes = num_classes
self.scale = scale
def __call__(self, input):
scale = self.scale
stage_repeats = [4, 8, 4]
if scale == 0.25:
stage_out_channels = [-1, 24, 24, 48, 96, 512]
elif scale == 0.33:
stage_out_channels = [-1, 24, 32, 64, 128, 512]
elif scale == 0.5:
stage_out_channels = [-1, 24, 48, 96, 192, 1024]
elif scale == 1.0:
stage_out_channels = [-1, 24, 116, 232, 464, 1024]
elif scale == 1.5:
stage_out_channels = [-1, 24, 176, 352, 704, 1024]
elif scale == 2.0:
stage_out_channels = [-1, 24, 224, 488, 976, 2048]
else:
raise NotImplementedError("This scale size:[" + str(scale) +
"] is not implemented!")
#conv1
input_channel = stage_out_channels[1]
conv1 = self.conv_bn_layer(
input=input,
filter_size=3,
num_filters=input_channel,
padding=1,
stride=2,
name='stage1_conv')
pool1 = fluid.layers.pool2d(
input=conv1,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
conv = pool1
# bottleneck sequences
for idxstage in range(len(stage_repeats)):
numrepeat = stage_repeats[idxstage]
output_channel = stage_out_channels[idxstage + 2]
for i in range(numrepeat):
if i == 0:
conv = self.inverted_residual_unit(
input=conv,
num_filters=output_channel,
stride=2,
benchmodel=2,
name=str(idxstage + 2) + '_' + str(i + 1))
else:
conv = self.inverted_residual_unit(
input=conv,
num_filters=output_channel,
stride=1,
benchmodel=1,
name=str(idxstage + 2) + '_' + str(i + 1))
output = self.conv_bn_layer(
input=conv,
filter_size=1,
num_filters=stage_out_channels[-1],
padding=0,
stride=1,
name='conv5')
if self.num_classes is not None:
output = fluid.layers.pool2d(
input=output,
pool_size=7,
pool_stride=1,
pool_padding=0,
pool_type='avg')
output = fluid.layers.fc(
input=output,
size=self.num_classes,
param_attr=ParamAttr(initializer=MSRA(), name='fc6_weights'),
bias_attr=ParamAttr(name='fc6_offset'))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
num_groups=1,
use_cudnn=True,
if_act=True,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
bias_attr=False)
        out = int((input.shape[2] - 1) / float(stride) + 1)  # NOTE: unused
bn_name = name + '_bn'
if if_act:
return fluid.layers.batch_norm(
input=conv,
act='relu',
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
else:
return fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def channel_shuffle(self, x, groups):
num_channels = x.shape[1]
channels_per_group = num_channels // groups
x_shape = fluid.layers.shape(x)
# reshape
x = fluid.layers.reshape(
x=x,
shape=[
x_shape[0], groups, channels_per_group, x_shape[2], x_shape[3]
])
x = fluid.layers.transpose(x=x, perm=[0, 2, 1, 3, 4])
# flatten
x = fluid.layers.reshape(
x=x, shape=[x_shape[0], num_channels, x_shape[2], x_shape[3]])
return x
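    # Editor's note: with groups=2 and 6 channels [0, 1, 2, 3, 4, 5], the shuffle
    # above reorders them to [0, 3, 1, 4, 2, 5], interleaving the two branches so
    # that information mixes across groups in the next grouped convolution.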
def inverted_residual_unit(self,
input,
num_filters,
stride,
benchmodel,
name=None):
assert stride in [1, 2], \
"supported stride are {} but your stride is {}".format([1,2], stride)
oup_inc = num_filters // 2
inp = input.shape[1]
if benchmodel == 1:
x1, x2 = fluid.layers.split(
input,
num_or_sections=[input.shape[1] // 2, input.shape[1] // 2],
dim=1)
conv_pw = self.conv_bn_layer(
input=x2,
num_filters=oup_inc,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='stage_' + name + '_conv1')
conv_dw = self.conv_bn_layer(
input=conv_pw,
num_filters=oup_inc,
filter_size=3,
stride=stride,
padding=1,
num_groups=oup_inc,
if_act=False,
use_cudnn=False,
name='stage_' + name + '_conv2')
conv_linear = self.conv_bn_layer(
input=conv_dw,
num_filters=oup_inc,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='stage_' + name + '_conv3')
out = fluid.layers.concat([x1, conv_linear], axis=1)
else:
#branch1
conv_dw_1 = self.conv_bn_layer(
input=input,
num_filters=inp,
filter_size=3,
stride=stride,
padding=1,
num_groups=inp,
if_act=False,
use_cudnn=False,
name='stage_' + name + '_conv4')
conv_linear_1 = self.conv_bn_layer(
input=conv_dw_1,
num_filters=oup_inc,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='stage_' + name + '_conv5')
#branch2
conv_pw_2 = self.conv_bn_layer(
input=input,
num_filters=oup_inc,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='stage_' + name + '_conv1')
conv_dw_2 = self.conv_bn_layer(
input=conv_pw_2,
num_filters=oup_inc,
filter_size=3,
stride=stride,
padding=1,
num_groups=oup_inc,
if_act=False,
use_cudnn=False,
name='stage_' + name + '_conv2')
conv_linear_2 = self.conv_bn_layer(
input=conv_dw_2,
num_filters=oup_inc,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name='stage_' + name + '_conv3')
out = fluid.layers.concat([conv_linear_1, conv_linear_2], axis=1)
return self.channel_shuffle(out, 2)
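# --- Usage sketch (editor's addition, hypothetical; not part of the original file). ---
# Classification demo under paddle 1.x fluid with a 224x224 input.
if __name__ == '__main__':
    demo_prog, demo_startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(demo_prog, demo_startup):
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        logits = ShuffleNetV2(num_classes=1000, scale=1.0)(image)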
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import math
import paddle.fluid as fluid
from .segmentation.model_utils.libs import scope, name_scope
from .segmentation.model_utils.libs import bn, bn_relu, relu
from .segmentation.model_utils.libs import conv
from .segmentation.model_utils.libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def check_stride(s, os):
    return s <= os
def check_points(count, points):
    if points is None:
        return False
    if isinstance(points, list):
        return count in points
    return count == points
class Xception():
def __init__(self,
num_classes=None,
layers=65,
output_stride=32,
end_points=None,
decode_points=None):
self.backbone = 'xception_' + str(layers)
self.num_classes = num_classes
        self.output_stride = output_stride
self.end_points = end_points
self.decode_points = decode_points
self.bottleneck_params = self.gen_bottleneck_params(self.backbone)
def __call__(
self,
input,
):
self.stride = 2
self.block_point = 0
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, self.end_points):
return data, self.short_cuts
if self.num_classes is not None:
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=self.num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
else:
return data
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024],
[1536, 1536, 2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024],
[1536, 1536, 2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024],
[1536, 1536, 2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data,
32,
3,
stride=2,
padding=1,
param_attr=param_attr),
eps=1e-3)
with scope("conv2"):
data = bn_relu(
conv(
data,
64,
3,
stride=1,
padding=1,
param_attr=param_attr),
eps=1e-3)
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(
s * strides[i], output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(
s * strides[i], output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, strides[i]], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
assert (block_num == 2)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
eps=1e-3)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu,
eps=1e-3)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr),
eps=1e-3)
else:
skip = input
return data + skip, results
def xception_65(num_classes=None):
model = Xception(num_classes, 65)
return model
def xception_41(num_classes=None):
model = Xception(num_classes, 41)
return model
def xception_71(num_classes=None):
model = Xception(num_classes, 71)
return model
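# Editor's note: the factories above build ImageNet-style classifiers when
# num_classes is given. To use Xception as a segmentation backbone (e.g. for
# DeepLabv3p), instantiate the class directly with end_points/decode_points set
# so that __call__ returns (features, short_cuts).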
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import cls_transforms
from . import det_transforms
from . import seg_transforms
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import random
import math
import cv2
import scipy
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox[2] + src_bbox[0]) / 2
center_y = (src_bbox[3] + src_bbox[1]) / 2
if center_x >= sample_bbox[0] and \
center_x <= sample_bbox[2] and \
center_y >= sample_bbox[1] and \
center_y <= sample_bbox[3]:
return True
return False
def clip_bbox(src_bbox):
src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0)
src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0)
src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0)
src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0)
return src_bbox
def bbox_area(src_bbox):
if src_bbox[2] < src_bbox[0] or src_bbox[3] < src_bbox[1]:
return 0.
else:
width = src_bbox[2] - src_bbox[0]
height = src_bbox[3] - src_bbox[1]
return width * height
def is_overlap(object_bbox, sample_bbox):
if object_bbox[0] >= sample_bbox[2] or \
object_bbox[2] <= sample_bbox[0] or \
object_bbox[1] >= sample_bbox[3] or \
object_bbox[3] <= sample_bbox[1]:
return False
else:
return True
def filter_and_process(sample_bbox, bboxes, labels, scores=None):
new_bboxes = []
new_labels = []
new_scores = []
for i in range(len(bboxes)):
new_bbox = [0, 0, 0, 0]
obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]]
if not meet_emit_constraint(obj_bbox, sample_bbox):
continue
if not is_overlap(obj_bbox, sample_bbox):
continue
sample_width = sample_bbox[2] - sample_bbox[0]
sample_height = sample_bbox[3] - sample_bbox[1]
new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width
new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height
new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width
new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height
new_bbox = clip_bbox(new_bbox)
if bbox_area(new_bbox) > 0:
new_bboxes.append(new_bbox)
new_labels.append([labels[i][0]])
if scores is not None:
new_scores.append([scores[i][0]])
bboxes = np.array(new_bboxes)
labels = np.array(new_labels)
scores = np.array(new_scores)
return bboxes, labels, scores
def bbox_area_sampling(bboxes, labels, scores, target_size, min_size):
new_bboxes = []
new_labels = []
new_scores = []
for i, bbox in enumerate(bboxes):
w = float((bbox[2] - bbox[0]) * target_size)
h = float((bbox[3] - bbox[1]) * target_size)
if w * h < float(min_size * min_size):
continue
else:
new_bboxes.append(bbox)
new_labels.append(labels[i])
if scores is not None and scores.size != 0:
new_scores.append(scores[i])
bboxes = np.array(new_bboxes)
labels = np.array(new_labels)
scores = np.array(new_scores)
return bboxes, labels, scores
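# Editor's note on the `sampler` layout assumed by the functions below, inferred
# from the indices they read (indices 0-1 are unused here and are believed to be
# max-sample/max-trial counts):
#   sampler[2], sampler[3] -> (min_scale, max_scale)
#   sampler[4], sampler[5] -> (min_aspect_ratio, max_aspect_ratio)
#   sampler[6], sampler[7] -> (min_jaccard_overlap, max_jaccard_overlap)
#   sampler[8], sampler[9] -> (min_object_coverage, max_object_coverage)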
def generate_sample_bbox(sampler):
scale = np.random.uniform(sampler[2], sampler[3])
aspect_ratio = np.random.uniform(sampler[4], sampler[5])
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = [xmin, ymin, xmax, ymax]
return sampled_bbox
def generate_sample_bbox_square(sampler, image_width, image_height):
scale = np.random.uniform(sampler[2], sampler[3])
aspect_ratio = np.random.uniform(sampler[4], sampler[5])
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
if image_height < image_width:
bbox_width = bbox_height * image_height / image_width
else:
bbox_height = bbox_width * image_width / image_height
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = np.random.uniform(0, xmin_bound)
ymin = np.random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = [xmin, ymin, xmax, ymax]
return sampled_bbox
def data_anchor_sampling(bbox_labels, image_width, image_height, scale_array,
resize_width):
num_gt = len(bbox_labels)
# np.random.randint range: [low, high)
rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
if num_gt != 0:
norm_xmin = bbox_labels[rand_idx][0]
norm_ymin = bbox_labels[rand_idx][1]
norm_xmax = bbox_labels[rand_idx][2]
norm_ymax = bbox_labels[rand_idx][3]
xmin = norm_xmin * image_width
ymin = norm_ymin * image_height
wid = image_width * (norm_xmax - norm_xmin)
hei = image_height * (norm_ymax - norm_ymin)
range_size = 0
area = wid * hei
for scale_ind in range(0, len(scale_array) - 1):
if area > scale_array[scale_ind] ** 2 and area < \
scale_array[scale_ind + 1] ** 2:
range_size = scale_ind + 1
break
if area > scale_array[len(scale_array) - 2]**2:
range_size = len(scale_array) - 2
scale_choose = 0.0
if range_size == 0:
rand_idx_size = 0
else:
# np.random.randint range: [low, high)
rng_rand_size = np.random.randint(0, range_size + 1)
rand_idx_size = rng_rand_size % (range_size + 1)
if rand_idx_size == range_size:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = min(2.0 * scale_array[rand_idx_size],
2 * math.sqrt(wid * hei))
scale_choose = random.uniform(min_resize_val, max_resize_val)
else:
min_resize_val = scale_array[rand_idx_size] / 2.0
max_resize_val = 2.0 * scale_array[rand_idx_size]
scale_choose = random.uniform(min_resize_val, max_resize_val)
sample_bbox_size = wid * resize_width / scale_choose
w_off_orig = 0.0
h_off_orig = 0.0
if sample_bbox_size < max(image_height, image_width):
if wid <= sample_bbox_size:
w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size,
xmin)
else:
w_off_orig = np.random.uniform(xmin,
xmin + wid - sample_bbox_size)
if hei <= sample_bbox_size:
h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size,
ymin)
else:
h_off_orig = np.random.uniform(ymin,
ymin + hei - sample_bbox_size)
else:
w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0)
h_off_orig = np.random.uniform(image_height - sample_bbox_size,
0.0)
w_off_orig = math.floor(w_off_orig)
h_off_orig = math.floor(h_off_orig)
# Figure out top left coordinates.
w_off = float(w_off_orig / image_width)
h_off = float(h_off_orig / image_height)
sampled_bbox = [
w_off, h_off, w_off + float(sample_bbox_size / image_width),
h_off + float(sample_bbox_size / image_height)
]
return sampled_bbox
else:
return 0
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox[0] >= object_bbox[2] or \
sample_bbox[2] <= object_bbox[0] or \
sample_bbox[1] >= object_bbox[3] or \
sample_bbox[3] <= object_bbox[1]:
return 0
intersect_xmin = max(sample_bbox[0], object_bbox[0])
intersect_ymin = max(sample_bbox[1], object_bbox[1])
intersect_xmax = min(sample_bbox[2], object_bbox[2])
intersect_ymax = min(sample_bbox[3], object_bbox[3])
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
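# Editor's note: jaccard_overlap computes plain IoU. E.g. boxes [0, 0, 2, 2] and
# [1, 1, 3, 3] intersect in a 1x1 square, so IoU = 1 / (4 + 4 - 1) = 1/7 ≈ 0.143.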
def intersect_bbox(bbox1, bbox2):
if bbox2[0] > bbox1[2] or bbox2[2] < bbox1[0] or \
bbox2[1] > bbox1[3] or bbox2[3] < bbox1[1]:
intersection_box = [0.0, 0.0, 0.0, 0.0]
else:
intersection_box = [
max(bbox1[0], bbox2[0]),
max(bbox1[1], bbox2[1]),
min(bbox1[2], bbox2[2]),
min(bbox1[3], bbox2[3])
]
return intersection_box
def bbox_coverage(bbox1, bbox2):
inter_box = intersect_bbox(bbox1, bbox2)
intersect_size = bbox_area(inter_box)
if intersect_size > 0:
bbox1_size = bbox_area(bbox1)
return intersect_size / bbox1_size
else:
return 0.
def satisfy_sample_constraint(sampler,
sample_bbox,
gt_bboxes,
satisfy_all=False):
if sampler[6] == 0 and sampler[7] == 0:
return True
satisfied = []
for i in range(len(gt_bboxes)):
object_bbox = [
gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
]
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler[6] != 0 and \
overlap < sampler[6]:
satisfied.append(False)
continue
if sampler[7] != 0 and \
overlap > sampler[7]:
satisfied.append(False)
continue
satisfied.append(True)
if not satisfy_all:
return True
if satisfy_all:
return np.all(satisfied)
else:
return False
def satisfy_sample_constraint_coverage(sampler, sample_bbox, gt_bboxes):
if sampler[6] == 0 and sampler[7] == 0:
has_jaccard_overlap = False
else:
has_jaccard_overlap = True
if sampler[8] == 0 and sampler[9] == 0:
has_object_coverage = False
else:
has_object_coverage = True
if not has_jaccard_overlap and not has_object_coverage:
return True
found = False
for i in range(len(gt_bboxes)):
object_bbox = [
gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
]
if has_jaccard_overlap:
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler[6] != 0 and \
overlap < sampler[6]:
continue
if sampler[7] != 0 and \
overlap > sampler[7]:
continue
found = True
if has_object_coverage:
object_coverage = bbox_coverage(object_bbox, sample_bbox)
if sampler[8] != 0 and \
object_coverage < sampler[8]:
continue
if sampler[9] != 0 and \
object_coverage > sampler[9]:
continue
found = True
if found:
return True
return found
def crop_image_sampling(img, sample_bbox, image_width, image_height,
target_size):
# no clipping here
xmin = int(sample_bbox[0] * image_width)
xmax = int(sample_bbox[2] * image_width)
ymin = int(sample_bbox[1] * image_height)
ymax = int(sample_bbox[3] * image_height)
w_off = xmin
h_off = ymin
width = xmax - xmin
height = ymax - ymin
cross_xmin = max(0.0, float(w_off))
cross_ymin = max(0.0, float(h_off))
cross_xmax = min(float(w_off + width - 1.0), float(image_width))
cross_ymax = min(float(h_off + height - 1.0), float(image_height))
cross_width = cross_xmax - cross_xmin
cross_height = cross_ymax - cross_ymin
roi_xmin = 0 if w_off >= 0 else abs(w_off)
roi_ymin = 0 if h_off >= 0 else abs(h_off)
roi_width = cross_width
roi_height = cross_height
roi_y1 = int(roi_ymin)
roi_y2 = int(roi_ymin + roi_height)
roi_x1 = int(roi_xmin)
roi_x2 = int(roi_xmin + roi_width)
cross_y1 = int(cross_ymin)
cross_y2 = int(cross_ymin + cross_height)
cross_x1 = int(cross_xmin)
cross_x2 = int(cross_xmin + cross_width)
sample_img = np.zeros((height, width, 3))
sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \
img[cross_y1: cross_y2, cross_x1: cross_x2]
sample_img = cv2.resize(
sample_img, (target_size, target_size), interpolation=cv2.INTER_AREA)
return sample_img
def box_horizontal_flip(bboxes, width):
oldx1 = bboxes[:, 0].copy()
oldx2 = bboxes[:, 2].copy()
bboxes[:, 0] = width - oldx2 - 1
bboxes[:, 2] = width - oldx1 - 1
    if bboxes.shape[0] != 0 and (bboxes[:, 2] < bboxes[:, 0]).any():
        raise ValueError(
            "RandomHorizontalFlip: invalid box, x2 should be greater than x1")
return bboxes
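# Worked example for box_horizontal_flip (exposition only): in a 100-pixel-
# wide image, a box with x1=10, x2=30 is mirrored to x1=100-30-1=69,
# x2=100-10-1=89, while the y coordinates are left untouched:
def _demo_box_horizontal_flip():
    boxes = np.array([[10., 5., 30., 25.]])
    flipped = box_horizontal_flip(boxes.copy(), 100)
    assert np.allclose(flipped[0], [69., 5., 89., 25.])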
def segms_horizontal_flip(segms, height, width):
def _flip_poly(poly, width):
flipped_poly = np.array(poly)
flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
return flipped_poly.tolist()
    def _flip_rle(rle, height, width):
        import pycocotools.mask as mask_util
        if 'counts' in rle and isinstance(rle['counts'], list):
rle = mask_util.frPyObjects([rle], height, width)
mask = mask_util.decode(rle)
mask = mask[:, ::-1, :]
rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
return rle
def is_poly(segm):
if not isinstance(segm, (list, dict)):
raise Exception("Invalid segm type: {}".format(type(segm)))
return isinstance(segm, list)
flipped_segms = []
for segm in segms:
if is_poly(segm):
# Polygon format
flipped_segms.append([_flip_poly(poly, width) for poly in segm])
        else:
            # RLE format
            flipped_segms.append(_flip_rle(segm, height, width))
return flipped_segms
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .ops import *
import random
import os.path as osp
import cv2
import numpy as np
from PIL import Image, ImageEnhance
class Compose:
"""根据数据预处理/增强算子对输入数据进行操作。
所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
Args:
transforms (list): 数据预处理/增强算子。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
def __init__(self, transforms):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
if len(transforms) < 1:
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
def __call__(self, im, label=None):
"""
Args:
im (str/np.ndarray): 图像路径/图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 根据网络所需字段所组成的tuple;
字段由transforms中的最后一个数据预处理操作决定。
"""
        im_file = im
        im = cv2.imread(im_file)
        if im is None:
            raise TypeError('Can\'t read the image file {}!'.format(im_file))
        im = im.astype('float32')
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
for op in self.transforms:
outputs = op(im, label)
im = outputs[0]
if len(outputs) == 2:
label = outputs[1]
return outputs
class RandomCrop:
"""对图像进行随机剪裁,模型训练时的数据增强操作。
1. 根据lower_scale、lower_ratio、upper_ratio计算随机剪裁的高、宽。
2. 根据随机剪裁的高、宽随机选取剪裁的起始点。
3. 剪裁图像。
4. 调整剪裁后的图像的大小到crop_size*crop_size。
Args:
crop_size (int): 随机裁剪后重新调整的目标边长。默认为224。
lower_scale (float): 裁剪面积相对原面积比例的最小限制。默认为0.88。
lower_ratio (float): 宽变换比例的最小限制。默认为3. / 4。
upper_ratio (float): 宽变换比例的最大限制。默认为4. / 3。
"""
def __init__(self,
crop_size=224,
lower_scale=0.88,
lower_ratio=3. / 4,
upper_ratio=4. / 3):
self.crop_size = crop_size
self.lower_scale = lower_scale
self.lower_ratio = lower_ratio
self.upper_ratio = upper_ratio
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
im = random_crop(im, self.crop_size, self.lower_scale,
self.lower_ratio, self.upper_ratio)
if label is None:
return (im, )
else:
return (im, label)
class RandomHorizontalFlip:
"""以一定的概率对图像进行随机水平翻转,模型训练时的数据增强操作。
Args:
prob (float): 随机水平翻转的概率。默认为0.5。
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
if random.random() < self.prob:
im = horizontal_flip(im)
if label is None:
return (im, )
else:
return (im, label)
class RandomVerticalFlip:
"""以一定的概率对图像进行随机垂直翻转,模型训练时的数据增强操作。
Args:
prob (float): 随机垂直翻转的概率。默认为0.5。
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
if random.random() < self.prob:
im = vertical_flip(im)
if label is None:
return (im, )
else:
return (im, label)
class Normalize:
"""对图像进行标准化。
1. 对图像进行归一化到区间[0.0, 1.0]。
2. 对图像进行减均值除以标准差操作。
Args:
mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
"""
def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
self.mean = mean
self.std = std
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im = normalize(im, mean, std)
if label is None:
return (im, )
else:
return (im, label)
class ResizeByShort:
"""根据图像短边对图像重新调整大小(resize)。
1. 获取图像的长边和短边长度。
2. 根据短边与short_size的比例,计算长边的目标长度,
此时高、宽的resize比例为short_size/原图短边长度。
3. 如果max_size>0,调整resize比例:
如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度;
4. 根据调整大小的比例对图像进行resize。
Args:
short_size (int): 调整大小后的图像目标短边长度。默认为256。
max_size (int): 长边目标长度的最大限制。默认为-1。
"""
def __init__(self, short_size=256, max_size=-1):
self.short_size = short_size
self.max_size = max_size
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
im_short_size = min(im.shape[0], im.shape[1])
im_long_size = max(im.shape[0], im.shape[1])
scale = float(self.short_size) / im_short_size
if self.max_size > 0 and np.round(
scale * im_long_size) > self.max_size:
scale = float(self.max_size) / float(im_long_size)
resized_width = int(round(im.shape[1] * scale))
resized_height = int(round(im.shape[0] * scale))
im = cv2.resize(
im, (resized_width, resized_height),
interpolation=cv2.INTER_LINEAR)
if label is None:
return (im, )
else:
return (im, label)
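# Worked example of the ResizeByShort scaling rule (exposition only): for a
# 300 x 800 image with short_size=256 and max_size=512, the initial scale
# 256/300 would push the long side to about 683 > 512, so the scale is
# recomputed as 512/800 = 0.64 and the output is 192 x 512:
def _demo_resize_by_short():
    im = np.zeros((300, 800, 3), dtype='float32')
    out = ResizeByShort(short_size=256, max_size=512)(im)[0]
    assert out.shape[:2] == (192, 512)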
class CenterCrop:
"""以图像中心点扩散裁剪长宽为`crop_size`的正方形
1. 计算剪裁的起始点。
2. 剪裁图像。
Args:
crop_size (int): 裁剪的目标边长。默认为224。
"""
def __init__(self, crop_size=224):
self.crop_size = crop_size
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
im = center_crop(im, self.crop_size)
if label is None:
return (im, )
else:
return (im, label)
class RandomRotate:
def __init__(self, rotate_range=30, prob=0.5):
"""以一定的概率对图像在[-rotate_range, rotaterange]角度范围内进行旋转,模型训练时的数据增强操作。
Args:
rotate_range (int): 旋转度数的范围。默认为30。
prob (float): 随机旋转的概率。默认为0.5。
"""
self.rotate_range = rotate_range
self.prob = prob
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
rotate_lower = -self.rotate_range
rotate_upper = self.rotate_range
im = im.astype('uint8')
im = Image.fromarray(im)
if np.random.uniform(0, 1) < self.prob:
im = rotate(im, rotate_lower, rotate_upper)
im = np.asarray(im).astype('float32')
if label is None:
return (im, )
else:
return (im, label)
class RandomDistort:
"""以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
1. 对变换的操作顺序进行随机化操作。
2. 按照1中的顺序以一定的概率对图像在范围[-range, range]内进行随机像素内容变换。
Args:
brightness_range (float): 明亮度因子的范围。默认为0.9。
brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
contrast_range (float): 对比度因子的范围。默认为0.9。
contrast_prob (float): 随机调整对比度的概率。默认为0.5。
saturation_range (float): 饱和度因子的范围。默认为0.9。
saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
hue_range (int): 色调因子的范围。默认为18。
hue_prob (float): 随机调整色调的概率。默认为0.5。
"""
def __init__(self,
brightness_range=0.9,
brightness_prob=0.5,
contrast_range=0.9,
contrast_prob=0.5,
saturation_range=0.9,
saturation_prob=0.5,
hue_range=18,
hue_prob=0.5):
self.brightness_range = brightness_range
self.brightness_prob = brightness_prob
self.contrast_range = contrast_range
self.contrast_prob = contrast_prob
self.saturation_range = saturation_range
self.saturation_prob = saturation_prob
self.hue_range = hue_range
self.hue_prob = hue_prob
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
"""
brightness_lower = 1 - self.brightness_range
brightness_upper = 1 + self.brightness_range
contrast_lower = 1 - self.contrast_range
contrast_upper = 1 + self.contrast_range
saturation_lower = 1 - self.saturation_range
saturation_upper = 1 + self.saturation_range
hue_lower = -self.hue_range
hue_upper = self.hue_range
ops = [brightness, contrast, saturation, hue]
random.shuffle(ops)
params_dict = {
'brightness': {
'brightness_lower': brightness_lower,
'brightness_upper': brightness_upper
},
'contrast': {
'contrast_lower': contrast_lower,
'contrast_upper': contrast_upper
},
'saturation': {
'saturation_lower': saturation_lower,
'saturation_upper': saturation_upper
},
'hue': {
'hue_lower': hue_lower,
'hue_upper': hue_upper
}
}
prob_dict = {
'brightness': self.brightness_prob,
'contrast': self.contrast_prob,
'saturation': self.saturation_prob,
'hue': self.hue_prob,
}
im = im.astype('uint8')
im = Image.fromarray(im)
        for op in ops:
            params = params_dict[op.__name__]
            prob = prob_dict[op.__name__]
            params['im'] = im
            if np.random.uniform(0, 1) < prob:
                im = op(**params)
im = np.asarray(im).astype('float32')
if label is None:
return (im, )
else:
return (im, label)
class ArrangeClassifier:
"""获取训练/验证/预测所需信息。注意:此操作不需用户自己显示调用
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
"""
def __init__(self, mode=None):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode must be in ['train', 'eval', 'test', 'quant']!")
self.mode = mode
def __call__(self, im, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
label (int): 每张图像所对应的类别序号。
Returns:
tuple: 当mode为'train'或'eval'时,返回(im, label),分别对应图像np.ndarray数据、
图像类别id;当mode为'test'或'quant'时,返回(im, ),对应图像np.ndarray数据。
"""
im = permute(im, False)
if self.mode == 'train' or self.mode == 'eval':
outputs = (im, label)
else:
outputs = (im, )
return outputs
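# A minimal usage sketch of the classification transforms above (added for
# exposition; 'test.jpg' and the label id 3 are hypothetical). Compose reads
# and decodes the image, each operator is applied in order, and
# ArrangeClassifier finally converts HWC to CHW for the network:
def _demo_cls_pipeline():
    transforms = Compose([
        RandomCrop(crop_size=224),
        RandomHorizontalFlip(prob=0.5),
        Normalize(),
        ArrangeClassifier(mode='train'),
    ])
    im, label = transforms('test.jpg', label=3)
    # im.shape == (3, 224, 224), label == 3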
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .ops import *
from .box_utils import *
import random
import math
import os.path as osp
import numpy as np
from PIL import Image, ImageEnhance
import cv2
class Compose:
"""根据数据预处理/增强列表对输入数据进行操作。
所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
Args:
transforms (list): 数据预处理/增强列表。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
def __init__(self, transforms):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
if len(transforms) < 1:
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (str/np.ndarray): 图像路径/图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息,dict中的字段如下:
- im_id (np.ndarray): 图像序列号,形状为(1,)。
- origin_shape (np.ndarray): 图像原始大小,形状为(2,),
origin_shape[0]为高,origin_shape[1]为宽。
- mixup (list): list为[im, im_info, label_info],分别对应
与当前图像进行mixup的图像np.ndarray数据、图像相关信息、标注框相关信息;
注意,当前epoch若无需进行mixup,则无该字段。
label_info (dict): 存储与标注框相关的信息,dict中的字段如下:
- gt_bbox (np.ndarray): 真实标注框坐标[x1, y1, x2, y2],形状为(n, 4),
其中n代表真实标注框的个数。
- gt_class (np.ndarray): 每个真实标注框对应的类别序号,形状为(n, 1),
其中n代表真实标注框的个数。
- gt_score (np.ndarray): 每个真实标注框对应的混合得分,形状为(n, 1),
其中n代表真实标注框的个数。
- gt_poly (list): 每个真实标注框内的多边形分割区域,每个分割区域由点的x、y坐标组成,
长度为n,其中n代表真实标注框的个数。
- is_crowd (np.ndarray): 每个真实标注框中是否是一组对象,形状为(n, 1),
其中n代表真实标注框的个数。
- difficult (np.ndarray): 每个真实标注框中的对象是否为难识别对象,形状为(n, 1),
其中n代表真实标注框的个数。
Returns:
tuple: 根据网络所需字段所组成的tuple;
字段由transforms中的最后一个数据预处理操作决定。
"""
def decode_image(im_file, im_info, label_info):
if im_info is None:
im_info = dict()
            im = cv2.imread(im_file)
            if im is None:
                raise TypeError(
                    'Can\'t read the image file {}!'.format(im_file))
            im = im.astype('float32')
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
# make default im_info with [h, w, 1]
im_info['im_resize_info'] = np.array(
[im.shape[0], im.shape[1], 1.], dtype=np.float32)
            # initialize augment_shape from the decoded image's shape
im_info['augment_shape'] = np.array([im.shape[0],
im.shape[1]]).astype('int32')
# decode mixup image
if 'mixup' in im_info:
im_info['mixup'] = \
decode_image(im_info['mixup'][0],
im_info['mixup'][1],
im_info['mixup'][2])
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
outputs = decode_image(im, im_info, label_info)
im = outputs[0]
im_info = outputs[1]
if len(outputs) == 3:
label_info = outputs[2]
for op in self.transforms:
if im is None:
return None
outputs = op(im, im_info, label_info)
im = outputs[0]
return outputs
class ResizeByShort:
"""根据图像的短边调整图像大小(resize)。
1. 获取图像的长边和短边长度。
2. 根据短边与short_size的比例,计算长边的目标长度,
此时高、宽的resize比例为short_size/原图短边长度。
3. 如果max_size>0,调整resize比例:
如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
4. 根据调整大小的比例对图像进行resize。
Args:
        short_size (int): 短边目标长度。默认为800。
max_size (int): 长边目标长度的最大限制。默认为1333。
Raises:
TypeError: 形参数据类型不满足需求。
"""
def __init__(self, short_size=800, max_size=1333):
self.max_size = int(max_size)
if not isinstance(short_size, int):
raise TypeError(
"Type of short_size is invalid. Must be Integer, now is {}".
format(type(short_size)))
self.short_size = short_size
if not (isinstance(self.max_size, int)):
raise TypeError("max_size: input type is invalid.")
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
            im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
其中,im_info更新字段为:
- im_resize_info (np.ndarray): resize后的图像高、resize后的图像宽、resize后的图像相对原始图的缩放比例
三者组成的np.ndarray,形状为(3,)。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
if im_info is None:
im_info = dict()
if not isinstance(im, np.ndarray):
raise TypeError("ResizeByShort: image type is not numpy.")
if len(im.shape) != 3:
raise ValueError('ResizeByShort: image is not 3-dimensional.')
im_short_size = min(im.shape[0], im.shape[1])
im_long_size = max(im.shape[0], im.shape[1])
scale = float(self.short_size) / im_short_size
if self.max_size > 0 and np.round(
scale * im_long_size) > self.max_size:
scale = float(self.max_size) / float(im_long_size)
resized_width = int(round(im.shape[1] * scale))
resized_height = int(round(im.shape[0] * scale))
im_resize_info = [resized_height, resized_width, scale]
im = cv2.resize(
im, (resized_width, resized_height),
interpolation=cv2.INTER_LINEAR)
im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32)
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class Padding:
"""将图像的长和宽padding至coarsest_stride的倍数。如输入图像为[300, 640],
    `coarsest_stride`为32,则由于300不为32的倍数,因此在图像最右和最下使用0值
进行padding,最终输出图像为[320, 640]。
1. 如果coarsest_stride为1则直接返回。
2. 获取图像的高H、宽W。
3. 计算填充后图像的高H_new、宽W_new。
4. 构建大小为(H_new, W_new, 3)像素值为0的np.ndarray,
并将原图的np.ndarray粘贴于左上角。
Args:
coarsest_stride (int): 填充后的图像长、宽为该参数的倍数,默认为1。
"""
def __init__(self, coarsest_stride=1):
self.coarsest_stride = coarsest_stride
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
            im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
if self.coarsest_stride == 1:
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
if im_info is None:
im_info = dict()
if not isinstance(im, np.ndarray):
raise TypeError("Padding: image type is not numpy.")
if len(im.shape) != 3:
raise ValueError('Padding: image is not 3-dimensional.')
im_h, im_w, im_c = im.shape[:]
if self.coarsest_stride > 1:
padding_im_h = int(
np.ceil(im_h / self.coarsest_stride) * self.coarsest_stride)
padding_im_w = int(
np.ceil(im_w / self.coarsest_stride) * self.coarsest_stride)
padding_im = np.zeros((padding_im_h, padding_im_w, im_c),
dtype=np.float32)
padding_im[:im_h, :im_w, :] = im
if label_info is None:
return (padding_im, im_info)
else:
return (padding_im, im_info, label_info)
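# Worked example of Padding (exposition only): with coarsest_stride=32, a
# 300 x 640 image is zero-padded at the bottom to 320 x 640, because
# ceil(300 / 32) * 32 = 320 while 640 is already a multiple of 32:
def _demo_padding():
    im = np.ones((300, 640, 3), dtype=np.float32)
    out = Padding(coarsest_stride=32)(im)[0]
    assert out.shape == (320, 640, 3) and out[310, 0, 0] == 0.0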
class Resize:
"""调整图像大小(resize)。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
将图像resize为target_size。
注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。
Args:
        target_size (int/list/tuple): 目标大小。默认为608。
interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为
['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC',
'AREA', 'LANCZOS4', 'RANDOM']中。
"""
# The interpolation mode
interp_dict = {
'NEAREST': cv2.INTER_NEAREST,
'LINEAR': cv2.INTER_LINEAR,
'CUBIC': cv2.INTER_CUBIC,
'AREA': cv2.INTER_AREA,
'LANCZOS4': cv2.INTER_LANCZOS4
}
def __init__(self, target_size=608, interp='LINEAR'):
self.interp = interp
if not (interp == "RANDOM" or interp in self.interp_dict):
raise ValueError("interp should be one of {}".format(
self.interp_dict.keys()))
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise TypeError(
'when target is list or tuple, it should include 2 elements, but it is {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
if im_info is None:
im_info = dict()
if not isinstance(im, np.ndarray):
raise TypeError("Resize: image type is not numpy.")
if len(im.shape) != 3:
raise ValueError('Resize: image is not 3-dimensional.')
if self.interp == "RANDOM":
interp = random.choice(list(self.interp_dict.keys()))
else:
interp = self.interp
im = resize(im, self.target_size, self.interp_dict[interp])
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class RandomHorizontalFlip:
"""随机翻转图像、标注框、分割信息,模型训练时的数据增强操作。
1. 随机采样一个0-1之间的小数,当小数小于水平翻转概率时,
执行2-4步操作,否则直接返回。
2. 水平翻转图像。
3. 计算翻转后的真实标注框的坐标,更新label_info中的gt_bbox信息。
4. 计算翻转后的真实分割区域的坐标,更新label_info中的gt_poly信息。
Args:
prob (float): 随机水平翻转的概率。默认为0.5。
Raises:
TypeError: 形参数据类型不满足需求。
"""
def __init__(self, prob=0.5):
self.prob = prob
if not isinstance(self.prob, float):
raise TypeError("RandomHorizontalFlip: input type is invalid.")
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
                   其中,label_info更新字段为:
- gt_bbox (np.ndarray): 水平翻转后的标注框坐标[x1, y1, x2, y2],形状为(n, 4),
其中n代表真实标注框的个数。
- gt_poly (list): 水平翻转后的多边形分割区域的x、y坐标,长度为n,
其中n代表真实标注框的个数。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
if not isinstance(im, np.ndarray):
raise TypeError(
"RandomHorizontalFlip: image is not a numpy array.")
if len(im.shape) != 3:
raise ValueError(
"RandomHorizontalFlip: image is not 3-dimensional.")
        if im_info is None or label_info is None:
            raise TypeError(
                'Cannot do RandomHorizontalFlip! ' +
                'Because the im_info and label_info cannot be None!')
        if 'augment_shape' not in im_info:
            raise TypeError('Cannot do RandomHorizontalFlip! ' + \
                'Because augment_shape is not in im_info!')
        if 'gt_bbox' not in label_info:
            raise TypeError('Cannot do RandomHorizontalFlip! ' + \
                'Because gt_bbox is not in label_info!')
augment_shape = im_info['augment_shape']
gt_bbox = label_info['gt_bbox']
height = augment_shape[0]
width = augment_shape[1]
if np.random.uniform(0, 1) < self.prob:
im = horizontal_flip(im)
if gt_bbox.shape[0] == 0:
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
label_info['gt_bbox'] = box_horizontal_flip(gt_bbox, width)
if 'gt_poly' in label_info and \
len(label_info['gt_poly']) != 0:
label_info['gt_poly'] = segms_horizontal_flip(
label_info['gt_poly'], height, width)
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class Normalize:
"""对图像进行标准化。
1. 归一化图像到到区间[0.0, 1.0]。
2. 对图像进行减均值除以标准差操作。
Args:
mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
Raises:
TypeError: 形参数据类型不满足需求。
"""
def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
self.mean = mean
self.std = std
if not (isinstance(self.mean, list) and isinstance(self.std, list)):
raise TypeError("NormalizeImage: input type is invalid.")
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise TypeError('NormalizeImage: std is invalid!')
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
            im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
"""
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
im = normalize(im, mean, std)
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class RandomDistort:
"""以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作
1. 对变换的操作顺序进行随机化操作。
2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
Args:
brightness_range (float): 明亮度因子的范围。默认为0.5。
brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
contrast_range (float): 对比度因子的范围。默认为0.5。
contrast_prob (float): 随机调整对比度的概率。默认为0.5。
saturation_range (float): 饱和度因子的范围。默认为0.5。
saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
hue_range (int): 色调因子的范围。默认为18。
hue_prob (float): 随机调整色调的概率。默认为0.5。
is_order (bool): 是否按照固定顺序
[变换明亮度、变换对比度、变换饱和度、变换色彩]
执行像素内容变换操作。默认为False。
"""
def __init__(self,
brightness_range=0.5,
brightness_prob=0.5,
contrast_range=0.5,
contrast_prob=0.5,
saturation_range=0.5,
saturation_prob=0.5,
hue_range=18,
hue_prob=0.5,
is_order=False):
self.brightness_range = brightness_range
self.brightness_prob = brightness_prob
self.contrast_range = contrast_range
self.contrast_prob = contrast_prob
self.saturation_range = saturation_range
self.saturation_prob = saturation_prob
self.hue_range = hue_range
self.hue_prob = hue_prob
self.is_order = is_order
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
"""
brightness_lower = 1 - self.brightness_range
brightness_upper = 1 + self.brightness_range
contrast_lower = 1 - self.contrast_range
contrast_upper = 1 + self.contrast_range
saturation_lower = 1 - self.saturation_range
saturation_upper = 1 + self.saturation_range
hue_lower = -self.hue_range
hue_upper = self.hue_range
ops = [brightness, contrast, saturation, hue]
if self.is_order:
prob = np.random.uniform(0, 1)
if prob < 0.5:
ops = [
brightness,
saturation,
hue,
contrast,
]
else:
random.shuffle(ops)
params_dict = {
'brightness': {
'brightness_lower': brightness_lower,
'brightness_upper': brightness_upper
},
'contrast': {
'contrast_lower': contrast_lower,
'contrast_upper': contrast_upper
},
'saturation': {
'saturation_lower': saturation_lower,
'saturation_upper': saturation_upper
},
'hue': {
'hue_lower': hue_lower,
'hue_upper': hue_upper
}
}
prob_dict = {
'brightness': self.brightness_prob,
'contrast': self.contrast_prob,
'saturation': self.saturation_prob,
'hue': self.hue_prob
}
im = im.astype('uint8')
im = Image.fromarray(im)
        for op in ops:
            params = params_dict[op.__name__]
            prob = prob_dict[op.__name__]
            params['im'] = im
            if np.random.uniform(0, 1) < prob:
                im = op(**params)
im = np.asarray(im).astype('float32')
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class MixupImage:
"""对图像进行mixup操作,模型训练时的数据增强操作,目前仅YOLOv3模型支持该transform。
当label_info中不存在mixup字段时,直接返回,否则进行下述操作:
1. 从随机beta分布中抽取出随机因子factor。
2.
- 当factor>=1.0时,去除label_info中的mixup字段,直接返回。
- 当factor<=0.0时,直接返回label_info中的mixup字段,并在label_info中去除该字段。
- 其余情况,执行下述操作:
(1)原图像乘以factor,mixup图像乘以(1-factor),叠加2个结果。
(2)拼接原图像标注框和mixup图像标注框。
(3)拼接原图像标注框类别和mixup图像标注框类别。
(4)原图像标注框混合得分乘以factor,mixup图像标注框混合得分乘以(1-factor),叠加2个结果。
3. 更新im_info中的augment_shape信息。
Args:
        alpha (float): beta分布的形状参数alpha。默认为1.5。
        beta (float): beta分布的形状参数beta。默认为1.5。
mixup_epoch (int): 在前mixup_epoch轮使用mixup增强操作;当该参数为-1时,该策略不会生效。
默认为-1。
Raises:
ValueError: 数据长度不匹配。
"""
def __init__(self, alpha=1.5, beta=1.5, mixup_epoch=-1):
self.alpha = alpha
self.beta = beta
        if self.alpha <= 0.0:
            raise ValueError("alpha should be positive in MixupImage")
        if self.beta <= 0.0:
            raise ValueError("beta should be positive in MixupImage")
self.mixup_epoch = mixup_epoch
def _mixup_img(self, img1, img2, factor):
h = max(img1.shape[0], img2.shape[0])
w = max(img1.shape[1], img2.shape[1])
img = np.zeros((h, w, img1.shape[2]), 'float32')
img[:img1.shape[0], :img1.shape[1], :] = \
img1.astype('float32') * factor
img[:img2.shape[0], :img2.shape[1], :] += \
img2.astype('float32') * (1.0 - factor)
return img.astype('uint8')
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
其中,im_info更新字段为:
- augment_shape (np.ndarray): mixup后的图像高、宽二者组成的np.ndarray,形状为(2,)。
im_info删除的字段:
- mixup (list): 与当前字段进行mixup的图像相关信息。
label_info更新字段为:
- gt_bbox (np.ndarray): mixup后真实标注框坐标,形状为(n, 4),
其中n代表真实标注框的个数。
- gt_class (np.ndarray): mixup后每个真实标注框对应的类别序号,形状为(n, 1),
其中n代表真实标注框的个数。
- gt_score (np.ndarray): mixup后每个真实标注框对应的混合得分,形状为(n, 1),
其中n代表真实标注框的个数。
Raises:
TypeError: 形参数据类型不满足需求。
"""
        if im_info is None:
            raise TypeError('Cannot do MixupImage! ' +
                            'Because the im_info cannot be None!')
if 'mixup' not in im_info:
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
factor = np.random.beta(self.alpha, self.beta)
factor = max(0.0, min(1.0, factor))
if im_info['epoch'] > self.mixup_epoch \
or factor >= 1.0:
im_info.pop('mixup')
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
if factor <= 0.0:
return im_info.pop('mixup')
im = self._mixup_img(im, im_info['mixup'][0], factor)
        if label_info is None:
            raise TypeError('Cannot do MixupImage! ' +
                            'Because the label_info cannot be None!')
        if 'gt_bbox' not in label_info or \
                'gt_class' not in label_info or \
                'gt_score' not in label_info:
            raise TypeError('Cannot do MixupImage! ' + \
                'Because gt_bbox/gt_class/gt_score is not in label_info!')
gt_bbox1 = label_info['gt_bbox']
gt_bbox2 = im_info['mixup'][2]['gt_bbox']
gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
gt_class1 = label_info['gt_class']
gt_class2 = im_info['mixup'][2]['gt_class']
gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
gt_score1 = label_info['gt_score']
gt_score2 = im_info['mixup'][2]['gt_score']
gt_score = np.concatenate(
(gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
label_info['gt_bbox'] = gt_bbox
label_info['gt_score'] = gt_score
label_info['gt_class'] = gt_class
im_info['augment_shape'] = np.array([im.shape[0],
im.shape[1]]).astype('int32')
im_info.pop('mixup')
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
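# The pixel rule implemented by _mixup_img above (exposition only):
# out = factor * img1 + (1 - factor) * img2, with both images pasted into a
# canvas of the element-wise maximum height and width; gt_score is mixed
# with the same factor:
def _demo_mixup_img():
    img1 = np.full((2, 2, 3), 200, dtype='float32')
    img2 = np.full((2, 2, 3), 100, dtype='float32')
    mixed = MixupImage()._mixup_img(img1, img2, 0.5)
    assert mixed[0, 0, 0] == 150  # 0.5 * 200 + 0.5 * 100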
class RandomExpand:
"""随机扩张图像,模型训练时的数据增强操作。
1. 随机选取扩张比例(扩张比例大于1时才进行扩张)。
2. 计算扩张后图像大小。
3. 初始化像素值为数据集均值的图像,并将原图像随机粘贴于该图像上。
4. 根据原图像粘贴位置换算出扩张后真实标注框的位置坐标。
Args:
max_ratio (float): 图像扩张的最大比例。默认为4.0。
prob (float): 随机扩张的概率。默认为0.5。
mean (list): 图像数据集的均值(0-255)。默认为[127.5, 127.5, 127.5]。
"""
def __init__(self, max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5]):
self.max_ratio = max_ratio
self.mean = mean
self.prob = prob
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
其中,im_info更新字段为:
- augment_shape (np.ndarray): 扩张后的图像高、宽二者组成的np.ndarray,形状为(2,)。
label_info更新字段为:
- gt_bbox (np.ndarray): 随机扩张后真实标注框坐标,形状为(n, 4),
其中n代表真实标注框的个数。
- gt_class (np.ndarray): 随机扩张后每个真实标注框对应的类别序号,形状为(n, 1),
其中n代表真实标注框的个数。
Raises:
TypeError: 形参数据类型不满足需求。
"""
        if im_info is None or label_info is None:
            raise TypeError(
                'Cannot do RandomExpand! ' +
                'Because the im_info and label_info cannot be None!')
        if 'augment_shape' not in im_info:
            raise TypeError('Cannot do RandomExpand! ' + \
                'Because augment_shape is not in im_info!')
        if 'gt_bbox' not in label_info or \
                'gt_class' not in label_info:
            raise TypeError('Cannot do RandomExpand! ' + \
                'Because gt_bbox/gt_class is not in label_info!')
prob = np.random.uniform(0, 1)
augment_shape = im_info['augment_shape']
im_width = augment_shape[1]
im_height = augment_shape[0]
gt_bbox = label_info['gt_bbox']
gt_class = label_info['gt_class']
if prob < self.prob:
if self.max_ratio - 1 >= 0.01:
expand_ratio = np.random.uniform(1, self.max_ratio)
height = int(im_height * expand_ratio)
width = int(im_width * expand_ratio)
h_off = math.floor(np.random.uniform(0, height - im_height))
w_off = math.floor(np.random.uniform(0, width - im_width))
expand_bbox = [
-w_off / im_width, -h_off / im_height,
(width - w_off) / im_width, (height - h_off) / im_height
]
expand_im = np.ones((height, width, 3))
expand_im = np.uint8(expand_im * np.squeeze(self.mean))
expand_im = Image.fromarray(expand_im)
im = im.astype('uint8')
im = Image.fromarray(im)
expand_im.paste(im, (int(w_off), int(h_off)))
expand_im = np.asarray(expand_im)
for i in range(gt_bbox.shape[0]):
gt_bbox[i][0] = gt_bbox[i][0] / im_width
gt_bbox[i][1] = gt_bbox[i][1] / im_height
gt_bbox[i][2] = gt_bbox[i][2] / im_width
gt_bbox[i][3] = gt_bbox[i][3] / im_height
gt_bbox, gt_class, _ = filter_and_process(
expand_bbox, gt_bbox, gt_class)
for i in range(gt_bbox.shape[0]):
gt_bbox[i][0] = gt_bbox[i][0] * width
gt_bbox[i][1] = gt_bbox[i][1] * height
gt_bbox[i][2] = gt_bbox[i][2] * width
gt_bbox[i][3] = gt_bbox[i][3] * height
im = expand_im.astype('float32')
label_info['gt_bbox'] = gt_bbox
label_info['gt_class'] = gt_class
im_info['augment_shape'] = np.array([height,
width]).astype('int32')
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class RandomCrop:
"""随机裁剪图像。
1. 根据batch_sampler计算获取裁剪候选区域的位置。
(1) 根据min scale、max scale、min aspect ratio、max aspect ratio计算随机剪裁的高、宽。
(2) 根据随机剪裁的高、宽随机选取剪裁的起始点。
(3) 筛选出裁剪候选区域:
- 当satisfy_all为True时,需所有真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域才可保留。
- 当satisfy_all为False时,当有一个真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域就可保留。
2. 遍历所有裁剪候选区域:
(1) 若真实标注框与候选裁剪区域不重叠,或其中心点不在候选裁剪区域,
则将该真实标注框去除。
(2) 计算相对于该候选裁剪区域,真实标注框的位置,并筛选出对应的类别、混合得分。
(3) 若avoid_no_bbox为False,返回当前裁剪后的信息即可;
反之,要找到一个裁剪区域中真实标注框个数不为0的区域,才返回裁剪后的信息。
Args:
batch_sampler (list): 随机裁剪参数的多种组合,每种组合包含8个值,如下:
- max sample (int):满足当前组合的裁剪区域的个数上限。
- max trial (int): 查找满足当前组合的次数。
- min scale (float): 裁剪面积相对原面积,每条边缩短比例的最小限制。
- max scale (float): 裁剪面积相对原面积,每条边缩短比例的最大限制。
- min aspect ratio (float): 裁剪后短边缩放比例的最小限制。
- max aspect ratio (float): 裁剪后短边缩放比例的最大限制。
- min overlap (float): 真实标注框与裁剪图像重叠面积的最小限制。
- max overlap (float): 真实标注框与裁剪图像重叠面积的最大限制。
默认值为None,当为None时采用如下设置:
            [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
satisfy_all (bool): 是否需要所有标注框满足条件,裁剪候选区域才保留。默认为False。
avoid_no_bbox (bool): 是否对裁剪图像不存在标注框的图像进行保留。默认为True。
"""
def __init__(self,
batch_sampler=None,
satisfy_all=False,
avoid_no_bbox=True):
if batch_sampler is None:
batch_sampler = [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
[1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
self.batch_sampler = batch_sampler
self.satisfy_all = satisfy_all
self.avoid_no_bbox = avoid_no_bbox
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
                   当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
                   存储与图像相关信息的字典、存储与标注框相关信息的字典。
其中,label_info更新字段为:
- gt_bbox (np.ndarray): 随机裁剪后真实标注框坐标,形状为(n, 4),
其中n代表真实标注框的个数。
- gt_class (np.ndarray): 随机裁剪后每个真实标注框对应的类别序号,形状为(n, 1),
其中n代表真实标注框的个数。
- gt_score (np.ndarray): 随机裁剪后每个真实标注框对应的混合得分,形状为(n, 1),
其中n代表真实标注框的个数。
Raises:
TypeError: 形参数据类型不满足需求。
"""
        if im_info is None or label_info is None:
            raise TypeError(
                'Cannot do RandomCrop! ' +
                'Because the im_info and label_info cannot be None!')
        if 'augment_shape' not in im_info:
            raise TypeError('Cannot do RandomCrop! ' + \
                'Because augment_shape is not in im_info!')
        if 'gt_bbox' not in label_info or \
                'gt_class' not in label_info:
            raise TypeError('Cannot do RandomCrop! ' + \
                'Because gt_bbox/gt_class is not in label_info!')
augment_shape = im_info['augment_shape']
im_width = augment_shape[1]
im_height = augment_shape[0]
gt_bbox = label_info['gt_bbox']
gt_bbox_tmp = gt_bbox.copy()
for i in range(gt_bbox_tmp.shape[0]):
gt_bbox_tmp[i][0] = gt_bbox[i][0] / im_width
gt_bbox_tmp[i][1] = gt_bbox[i][1] / im_height
gt_bbox_tmp[i][2] = gt_bbox[i][2] / im_width
gt_bbox_tmp[i][3] = gt_bbox[i][3] / im_height
gt_class = label_info['gt_class']
gt_score = None
if 'gt_score' in label_info:
gt_score = label_info['gt_score']
sampled_bbox = []
gt_bbox_tmp = gt_bbox_tmp.tolist()
for sampler in self.batch_sampler:
found = 0
for i in range(sampler[1]):
if found >= sampler[0]:
break
sample_bbox = generate_sample_bbox(sampler)
if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox_tmp,
self.satisfy_all):
sampled_bbox.append(sample_bbox)
found = found + 1
im = np.array(im)
while sampled_bbox:
idx = int(np.random.uniform(0, len(sampled_bbox)))
sample_bbox = sampled_bbox.pop(idx)
sample_bbox = clip_bbox(sample_bbox)
crop_bbox, crop_class, crop_score = \
filter_and_process(sample_bbox, gt_bbox_tmp, gt_class, gt_score)
if self.avoid_no_bbox:
if len(crop_bbox) < 1:
continue
xmin = int(sample_bbox[0] * im_width)
xmax = int(sample_bbox[2] * im_width)
ymin = int(sample_bbox[1] * im_height)
ymax = int(sample_bbox[3] * im_height)
im = im[ymin:ymax, xmin:xmax]
for i in range(crop_bbox.shape[0]):
crop_bbox[i][0] = crop_bbox[i][0] * (xmax - xmin)
crop_bbox[i][1] = crop_bbox[i][1] * (ymax - ymin)
crop_bbox[i][2] = crop_bbox[i][2] * (xmax - xmin)
crop_bbox[i][3] = crop_bbox[i][3] * (ymax - ymin)
label_info['gt_bbox'] = crop_bbox
label_info['gt_class'] = crop_class
label_info['gt_score'] = crop_score
im_info['augment_shape'] = np.array([ymax - ymin,
xmax - xmin]).astype('int32')
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
if label_info is None:
return (im, im_info)
else:
return (im, im_info, label_info)
class ArrangeFasterRCNN:
"""获取FasterRCNN模型训练/验证/预测所需信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
"""
def __init__(self, mode=None):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode must be in ['train', 'eval', 'test', 'quant']!")
self.mode = mode
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
            tuple: 当mode为'train'时,返回(im, im_resize_info, gt_bbox, gt_class, is_crowd),分别对应
                图像np.ndarray数据、图像相对于原图的resize信息、真实标注框、真实标注框对应的类别、真实标注框内是否是一组对象;
                当mode为'eval'时,返回(im, im_resize_info, im_id, im_shape, gt_bbox, gt_class, is_difficult),
                分别对应图像np.ndarray数据、图像相对于原图的resize信息、图像id、图像大小信息、真实标注框、真实标注框对应的类别、
                真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_resize_info, im_shape),分别对应图像np.ndarray数据、
                图像相对于原图的resize信息、图像大小信息。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
im = permute(im, False)
if self.mode == 'train':
            if im_info is None or label_info is None:
                raise TypeError(
                    'Cannot do ArrangeFasterRCNN! ' +
                    'Because the im_info and label_info cannot be None!')
if len(label_info['gt_bbox']) != len(label_info['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
im_resize_info = im_info['im_resize_info']
gt_bbox = label_info['gt_bbox']
gt_class = label_info['gt_class']
is_crowd = label_info['is_crowd']
outputs = (im, im_resize_info, gt_bbox, gt_class, is_crowd)
elif self.mode == 'eval':
            if im_info is None or label_info is None:
                raise TypeError(
                    'Cannot do ArrangeFasterRCNN! ' +
                    'Because the im_info and label_info cannot be None!')
im_resize_info = im_info['im_resize_info']
im_id = im_info['im_id']
im_shape = np.array(
(im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
dtype=np.float32)
gt_bbox = label_info['gt_bbox']
gt_class = label_info['gt_class']
is_difficult = label_info['difficult']
outputs = (im, im_resize_info, im_id, im_shape, gt_bbox, gt_class,
is_difficult)
else:
            if im_info is None:
                raise TypeError('Cannot do ArrangeFasterRCNN! ' +
                                'Because the im_info cannot be None!')
im_resize_info = im_info['im_resize_info']
im_shape = np.array(
(im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
dtype=np.float32)
outputs = (im, im_resize_info, im_shape)
return outputs
class ArrangeMaskRCNN:
"""获取MaskRCNN模型训练/验证/预测所需信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
"""
def __init__(self, mode=None):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode must be in ['train', 'eval', 'test', 'quant']!")
self.mode = mode
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
            tuple: 当mode为'train'时,返回(im, im_resize_info, gt_bbox, gt_class, is_crowd, gt_masks),分别对应
                图像np.ndarray数据、图像相对于原图的resize信息、真实标注框、真实标注框对应的类别、真实标注框内是否是一组对象、
                真实分割区域;当mode为'eval'时,返回(im, im_resize_info, im_id, im_shape),分别对应图像np.ndarray数据、
                图像相对于原图的resize信息、图像id、图像大小信息;当mode为'test'或'quant'时,返回(im, im_resize_info, im_shape),
                分别对应图像np.ndarray数据、图像相对于原图的resize信息、图像大小信息。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
im = permute(im, False)
if self.mode == 'train':
            if im_info is None or label_info is None:
                raise TypeError(
                    'Cannot do ArrangeMaskRCNN! ' +
                    'Because the im_info and label_info cannot be None!')
if len(label_info['gt_bbox']) != len(label_info['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
im_resize_info = im_info['im_resize_info']
gt_bbox = label_info['gt_bbox']
gt_class = label_info['gt_class']
is_crowd = label_info['is_crowd']
assert 'gt_poly' in label_info
segms = label_info['gt_poly']
if len(segms) != 0:
assert len(segms) == is_crowd.shape[0]
gt_masks = []
valid = True
for i in range(len(segms)):
segm = segms[i]
gt_segm = []
if is_crowd[i]:
gt_segm.append([[0, 0]])
else:
for poly in segm:
if len(poly) == 0:
valid = False
break
gt_segm.append(np.array(poly).reshape(-1, 2))
if (not valid) or len(gt_segm) == 0:
break
gt_masks.append(gt_segm)
outputs = (im, im_resize_info, gt_bbox, gt_class, is_crowd,
gt_masks)
else:
            if im_info is None:
                raise TypeError('Cannot do ArrangeMaskRCNN! ' +
                                'Because the im_info cannot be None!')
im_resize_info = im_info['im_resize_info']
im_shape = np.array(
(im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
dtype=np.float32)
if self.mode == 'eval':
im_id = im_info['im_id']
outputs = (im, im_resize_info, im_id, im_shape)
else:
outputs = (im, im_resize_info, im_shape)
return outputs
class ArrangeYOLOv3:
"""获取YOLOv3模型训练/验证/预测所需信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
"""
def __init__(self, mode=None):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode must be in ['train', 'eval', 'test', 'quant']!")
self.mode = mode
def __call__(self, im, im_info=None, label_info=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict, 可选): 存储与图像相关的信息。
label_info (dict, 可选): 存储与标注框相关的信息。
Returns:
tuple: 当mode为'train'时,返回(im, gt_bbox, gt_class, gt_score, im_shape),分别对应
图像np.ndarray数据、真实标注框、真实标注框对应的类别、真实标注框混合得分、图像大小信息;
当mode为'eval'时,返回(im, im_shape, im_id, gt_bbox, gt_class, difficult),
分别对应图像np.ndarray数据、图像大小信息、图像id、真实标注框、真实标注框对应的类别、
真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_shape),
分别对应图像np.ndarray数据、图像大小信息。
Raises:
TypeError: 形参数据类型不满足需求。
ValueError: 数据长度不匹配。
"""
im = permute(im, False)
if self.mode == 'train':
            if im_info is None or label_info is None:
                raise TypeError(
                    'Cannot do ArrangeYOLOv3! ' +
                    'Because the im_info and label_info cannot be None!')
im_shape = im_info['augment_shape']
if len(label_info['gt_bbox']) != len(label_info['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
if len(label_info['gt_bbox']) != len(label_info['gt_score']):
raise ValueError("gt num mismatch: bbox and score.")
gt_bbox = np.zeros((50, 4), dtype=im.dtype)
gt_class = np.zeros((50, ), dtype=np.int32)
gt_score = np.zeros((50, ), dtype=im.dtype)
gt_num = min(50, len(label_info['gt_bbox']))
if gt_num > 0:
label_info['gt_class'][:gt_num, 0] = label_info[
'gt_class'][:gt_num, 0] - 1
gt_bbox[:gt_num, :] = label_info['gt_bbox'][:gt_num, :]
gt_class[:gt_num] = label_info['gt_class'][:gt_num, 0]
gt_score[:gt_num] = label_info['gt_score'][:gt_num, 0]
# parse [x1, y1, x2, y2] to [x, y, w, h]
gt_bbox[:, 2:4] = gt_bbox[:, 2:4] - gt_bbox[:, :2]
gt_bbox[:, :2] = gt_bbox[:, :2] + gt_bbox[:, 2:4] / 2.
outputs = (im, gt_bbox, gt_class, gt_score, im_shape)
elif self.mode == 'eval':
            if im_info is None or label_info is None:
                raise TypeError(
                    'Cannot do ArrangeYOLOv3! ' +
                    'Because the im_info and label_info cannot be None!')
im_shape = im_info['augment_shape']
if len(label_info['gt_bbox']) != len(label_info['gt_class']):
raise ValueError("gt num mismatch: bbox and class.")
im_id = im_info['im_id']
gt_bbox = np.zeros((50, 4), dtype=im.dtype)
gt_class = np.zeros((50, ), dtype=np.int32)
difficult = np.zeros((50, ), dtype=np.int32)
gt_num = min(50, len(label_info['gt_bbox']))
if gt_num > 0:
label_info['gt_class'][:gt_num, 0] = label_info[
'gt_class'][:gt_num, 0] - 1
gt_bbox[:gt_num, :] = label_info['gt_bbox'][:gt_num, :]
gt_class[:gt_num] = label_info['gt_class'][:gt_num, 0]
difficult[:gt_num] = label_info['difficult'][:gt_num, 0]
outputs = (im, im_shape, im_id, gt_bbox, gt_class, difficult)
else:
            if im_info is None:
                raise TypeError('Cannot do ArrangeYOLOv3! ' +
                                'Because the im_info cannot be None!')
im_shape = im_info['augment_shape']
outputs = (im, im_shape)
return outputs
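# The [x1, y1, x2, y2] -> [cx, cy, w, h] conversion used in the 'train'
# branch above (exposition only): width/height are computed first, then the
# center is the top-left corner plus half the size:
def _demo_xyxy_to_cxcywh():
    box = np.array([[10., 20., 50., 80.]])
    box[:, 2:4] = box[:, 2:4] - box[:, :2]      # w = 40, h = 60
    box[:, :2] = box[:, :2] + box[:, 2:4] / 2.  # cx = 30, cy = 50
    assert np.allclose(box[0], [30., 50., 40., 60.])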
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import math
import numpy as np
from PIL import Image, ImageEnhance
def normalize(im, mean, std):
im = im / 255.0
im -= mean
im /= std
return im
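# normalize maps uint8-range pixels to roughly zero-mean floats:
# (im / 255 - mean) / std, applied per channel (exposition only):
def _demo_normalize():
    im = np.full((1, 1, 3), 255.0)
    mean = np.array([0.485, 0.456, 0.406])[np.newaxis, np.newaxis, :]
    std = np.array([0.229, 0.224, 0.225])[np.newaxis, np.newaxis, :]
    out = normalize(im, mean, std)
    assert np.allclose(out[0, 0], (1.0 - mean[0, 0]) / std[0, 0])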
def permute(im, to_bgr=False):
im = np.swapaxes(im, 1, 2)
im = np.swapaxes(im, 1, 0)
if to_bgr:
im = im[[2, 1, 0], :, :]
return im
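# permute converts HWC to CHW with two axis swaps: (H, W, C) -> (H, C, W)
# -> (C, H, W), equivalent to np.transpose(im, (2, 0, 1)) (exposition only):
def _demo_permute():
    im = np.arange(24, dtype='float32').reshape(2, 3, 4)  # H=2, W=3, C=4
    assert permute(im).shape == (4, 2, 3)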
def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR):
value = max(im.shape[0], im.shape[1])
scale = float(long_size) / float(value)
resized_width = int(round(im.shape[1] * scale))
resized_height = int(round(im.shape[0] * scale))
im = cv2.resize(
im, (resized_width, resized_height), interpolation=interpolation)
return im
def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
if isinstance(target_size, list) or isinstance(target_size, tuple):
w = target_size[0]
h = target_size[1]
else:
w = target_size
h = target_size
im = cv2.resize(im, (w, h), interpolation=interp)
return im
def random_crop(im,
crop_size=224,
lower_scale=0.08,
lower_ratio=3. / 4,
upper_ratio=4. / 3):
scale = [lower_scale, 1.0]
ratio = [lower_ratio, upper_ratio]
aspect_ratio = math.sqrt(np.random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
bound = min((float(im.shape[0]) / im.shape[1]) / (h**2),
(float(im.shape[1]) / im.shape[0]) / (w**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = im.shape[0] * im.shape[1] * np.random.uniform(
scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = np.random.randint(0, im.shape[0] - h + 1)
j = np.random.randint(0, im.shape[1] - w + 1)
im = im[i:i + h, j:j + w, :]
im = cv2.resize(im, (crop_size, crop_size))
return im
def center_crop(im, crop_size=224):
height, width = im.shape[:2]
w_start = (width - crop_size) // 2
h_start = (height - crop_size) // 2
w_end = w_start + crop_size
h_end = h_start + crop_size
im = im[h_start:h_end, w_start:w_end, :]
return im
def horizontal_flip(im):
if len(im.shape) == 3:
im = im[:, ::-1, :]
elif len(im.shape) == 2:
im = im[:, ::-1]
return im
def vertical_flip(im):
if len(im.shape) == 3:
im = im[::-1, :, :]
elif len(im.shape) == 2:
im = im[::-1, :]
return im
def bgr2rgb(im):
return im[:, :, ::-1]
def brightness(im, brightness_lower, brightness_upper):
brightness_delta = np.random.uniform(brightness_lower, brightness_upper)
im = ImageEnhance.Brightness(im).enhance(brightness_delta)
return im
def contrast(im, contrast_lower, contrast_upper):
contrast_delta = np.random.uniform(contrast_lower, contrast_upper)
im = ImageEnhance.Contrast(im).enhance(contrast_delta)
return im
def saturation(im, saturation_lower, saturation_upper):
saturation_delta = np.random.uniform(saturation_lower, saturation_upper)
im = ImageEnhance.Color(im).enhance(saturation_delta)
return im
def hue(im, hue_lower, hue_upper):
hue_delta = np.random.uniform(hue_lower, hue_upper)
im = np.array(im.convert('HSV'))
im[:, :, 0] = im[:, :, 0] + hue_delta
im = Image.fromarray(im, mode='HSV').convert('RGB')
return im
def rotate(im, rotate_lower, rotate_upper):
rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
im = im.rotate(int(rotate_delta))
return im
def resize_padding(im, max_side_len=2400):
'''
resize image to a size multiple of 32 which is required by the network
    :param im: the image to resize
:param max_side_len: limit of max image size to avoid out of memory in gpu
:return: the resized image and the resize ratio
'''
h, w, _ = im.shape
resize_w = w
resize_h = h
# limit the max side
if max(resize_h, resize_w) > max_side_len:
ratio = float(
max_side_len) / resize_h if resize_h > resize_w else float(
max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
resize_h = resize_h if resize_h % 32 == 0 else (resize_h // 32 - 1) * 32
resize_w = resize_w if resize_w % 32 == 0 else (resize_w // 32 - 1) * 32
resize_h = max(32, resize_h)
resize_w = max(32, resize_w)
im = cv2.resize(im, (int(resize_w), int(resize_h)))
ratio_h = resize_h / float(h)
ratio_w = resize_w / float(w)
_ratio = np.array([ratio_h, ratio_w]).reshape(-1, 2)
return im, _ratio
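# Worked example for resize_padding (exposition only): a 500 x 700 image is
# already within max_side_len, so the ratio stays 1.0; since neither side is
# a multiple of 32, both are rounded down past the next multiple:
# (500 // 32 - 1) * 32 = 448 and (700 // 32 - 1) * 32 = 640:
def _demo_resize_padding():
    im = np.zeros((500, 700, 3), dtype='float32')
    out, ratio = resize_padding(im)
    assert out.shape[:2] == (448, 640)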
# coding: utf8
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .ops import *
import random
import os.path as osp
import numpy as np
from PIL import Image
import cv2
from collections import OrderedDict
class Compose:
"""根据数据预处理/增强算子对输入数据进行操作。
所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
Args:
transforms (list): 数据预处理/增强算子。
Raises:
TypeError: transforms不是list对象
ValueError: transforms元素个数小于1。
"""
def __init__(self, transforms):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
if len(transforms) < 1:
raise ValueError('The length of transforms ' + \
'must be equal or larger than 1!')
self.transforms = transforms
self.to_rgb = False
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (str/np.ndarray): 图像路径/图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息,dict中的字段如下:
- shape_before_resize (tuple): 图像resize之前的大小(h, w)。
- shape_before_padding (tuple): 图像padding之前的大小(h, w)。
label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。
Returns:
tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。
"""
if im_info is None:
im_info = dict()
        im_file = im
        im = cv2.imread(im_file)
        if im is None:
            raise ValueError('Can\'t read the image file {}!'.format(im_file))
        im = im.astype('float32')
if self.to_rgb:
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
if label is not None:
label = np.asarray(Image.open(label))
for op in self.transforms:
outputs = op(im, im_info, label)
im = outputs[0]
if len(outputs) >= 2:
im_info = outputs[1]
if len(outputs) == 3:
label = outputs[2]
return outputs
class RandomHorizontalFlip:
"""以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机水平翻转的概率。默认值为0.5。
"""
def __init__(self, prob=0.5):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = horizontal_flip(im)
if label is not None:
label = horizontal_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomVerticalFlip:
"""以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。
Args:
prob (float): 随机垂直翻转的概率。默认值为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if random.random() < self.prob:
im = vertical_flip(im)
if label is not None:
label = vertical_flip(label)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
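# A minimal usage sketch of the segmentation transforms above (added for
# exposition; 'img.jpg' and 'mask.png' are hypothetical paths). The label
# mask is passed alongside the image so geometric augmentations stay in
# sync between the two:
def _demo_seg_pipeline():
    transforms = Compose([
        RandomHorizontalFlip(prob=0.5),
        RandomVerticalFlip(prob=0.1),
    ])
    im, im_info, label = transforms('img.jpg', label='mask.png')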
class Resize:
"""调整图像大小(resize),当存在标注图像时,则同步进行处理。
- 当目标大小(target_size)类型为int时,根据插值方式,
将图像resize为[target_size, target_size]。
- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
      将图像resize为target_size,此时target_size应为[w, h]或(w, h)。
Args:
target_size (int|list|tuple): 目标大小。
interp (str): resize的插值方式,与opencv的插值方式对应,
可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
Raises:
TypeError: target_size不是int/list/tuple。
ValueError: target_size为list/tuple时元素个数不等于2。
AssertionError: interp的取值不在['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4']之内。
"""
# The interpolation mode
interp_dict = {
'NEAREST': cv2.INTER_NEAREST,
'LINEAR': cv2.INTER_LINEAR,
'CUBIC': cv2.INTER_CUBIC,
'AREA': cv2.INTER_AREA,
'LANCZOS4': cv2.INTER_LANCZOS4
}
    def __init__(self, target_size, interp='LINEAR'):
        self.interp = interp
        assert interp in self.interp_dict, "interp should be one of {}".format(
            self.interp_dict.keys())
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise ValueError(
'when target is list or tuple, it should include 2 elements, but it is {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info跟新字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
Raises:
ZeroDivisionError: im的短边为0。
TypeError: im不是np.ndarray数据。
ValueError: im不是3维nd.ndarray。
"""
        if im_info is None:
            im_info = OrderedDict()
        if not isinstance(im, np.ndarray):
            raise TypeError("ResizeImage: image type is not np.ndarray.")
        if len(im.shape) != 3:
            raise ValueError('ResizeImage: image is not 3-dimensional.')
        im_info['shape_before_resize'] = im.shape[:2]
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
if float(im_size_min) == 0:
raise ZeroDivisionError('ResizeImage: min size of image is 0')
if isinstance(self.target_size, int):
resize_w = self.target_size
resize_h = self.target_size
else:
resize_w = self.target_size[0]
resize_h = self.target_size[1]
im_scale_x = float(resize_w) / float(im_shape[1])
im_scale_y = float(resize_h) / float(im_shape[0])
im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp_dict[self.interp])
if label is not None:
label = cv2.resize(
label,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp_dict['NEAREST'])
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeByLong:
"""对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
long_size (int): resize后图像的长边大小。
"""
def __init__(self, long_size):
self.long_size = long_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_resize'] = im.shape[:2]
im = resize_long(im, self.long_size)
if label is not None:
label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeRangeScaling:
"""对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
Args:
min_value (int): 图像长边resize后的最小值。默认值400。
max_value (int): 图像长边resize后的最大值。默认值600。
Raises:
ValueError: min_value大于max_value
"""
def __init__(self, min_value=400, max_value=600):
if min_value > max_value:
raise ValueError('min_value must be less than max_value, '
'but they are {} and {}.'.format(
min_value, max_value))
self.min_value = min_value
self.max_value = max_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_value == self.max_value:
random_size = self.max_value
else:
random_size = int(
np.random.uniform(self.min_value, self.max_value) + 0.5)
im = resize_long(im, random_size, cv2.INTER_LINEAR)
if label is not None:
label = resize_long(label, random_size, cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ResizeStepScaling:
"""对图像按照某一个比例resize,这个比例以scale_step_size为步长
在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。
Args:
min_scale_factor(float), resize最小尺度。默认值0.75。
max_scale_factor (float), resize最大尺度。默认值1.25。
scale_step_size (float), resize尺度范围间隔。默认值0.25。
Raises:
ValueError: min_scale_factor大于max_scale_factor
"""
def __init__(self,
min_scale_factor=0.75,
max_scale_factor=1.25,
scale_step_size=0.25):
if min_scale_factor > max_scale_factor:
raise ValueError(
'min_scale_factor must be less than max_scale_factor, '
'but they are {} and {}.'.format(min_scale_factor,
max_scale_factor))
self.min_scale_factor = min_scale_factor
self.max_scale_factor = max_scale_factor
self.scale_step_size = scale_step_size
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale_factor == self.max_scale_factor:
scale_factor = self.min_scale_factor
elif self.scale_step_size == 0:
scale_factor = np.random.uniform(self.min_scale_factor,
self.max_scale_factor)
else:
num_steps = int((self.max_scale_factor - self.min_scale_factor) /
self.scale_step_size + 1)
scale_factors = np.linspace(self.min_scale_factor,
self.max_scale_factor,
num_steps).tolist()
np.random.shuffle(scale_factors)
scale_factor = scale_factors[0]
im = cv2.resize(
im, (0, 0),
fx=scale_factor,
fy=scale_factor,
interpolation=cv2.INTER_LINEAR)
if label is not None:
label = cv2.resize(
label, (0, 0),
fx=scale_factor,
fy=scale_factor,
interpolation=cv2.INTER_NEAREST)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Normalize:
"""对图像进行标准化。
1.尺度缩放到 [0,1]。
2.对图像进行减均值除以标准差操作。
Args:
mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
Raises:
ValueError: mean或std不是list对象。std包含0。
"""
def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]):
self.mean = mean
self.std = std
if not (isinstance(self.mean, list) and isinstance(self.std, list)):
raise ValueError("{}: input type is invalid.".format(self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
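        # As described in the class docstring, normalize() (from .ops) first
        # scales pixel values to [0, 1] and then applies (im - mean) / std
        # per channel.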
im = normalize(im, mean, std)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class Padding:
"""对图像或标注图像进行padding,padding方向为右和下。
根据提供的值对图像或标注图像进行padding操作。
Args:
target_size (int|list|tuple): padding后图像的大小。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: target_size不是int|list|tuple。
ValueError: target_size为list|tuple时元素个数不等于2。
"""
def __init__(self,
target_size,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
if isinstance(target_size, list) or isinstance(target_size, tuple):
if len(target_size) != 2:
raise ValueError(
'when target is list or tuple, it should include 2 elements, but it is {}'
.format(target_size))
elif not isinstance(target_size, int):
raise TypeError(
"Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(target_size)))
self.target_size = target_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
其中,im_info新增字段为:
-shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。
Raises:
ValueError: 输入图像im或label的形状大于目标值
"""
if im_info is None:
im_info = OrderedDict()
im_info['shape_before_padding'] = im.shape[:2]
im_height, im_width = im.shape[0], im.shape[1]
if isinstance(self.target_size, int):
target_height = self.target_size
target_width = self.target_size
else:
target_height = self.target_size[1]
target_width = self.target_size[0]
pad_height = target_height - im_height
pad_width = target_width - im_width
        if pad_height < 0 or pad_width < 0:
            raise ValueError(
                'the size of the image should be less than target_size, but the image size ({}, {}) is larger than target_size ({}, {})'
                .format(im_width, im_height, target_width, target_height))
else:
im = cv2.copyMakeBorder(
im,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.im_padding_value)
if label is not None:
label = cv2.copyMakeBorder(
label,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.label_padding_value)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomPaddingCrop:
"""对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。
Args:
crop_size (int|list|tuple): 裁剪图像大小。默认为512。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认值为255。
Raises:
TypeError: crop_size不是int/list/tuple。
ValueError: target_size为list/tuple时元素个数不等于2。
"""
def __init__(self,
crop_size=512,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
if isinstance(crop_size, list) or isinstance(crop_size, tuple):
if len(crop_size) != 2:
raise ValueError(
'when crop_size is list or tuple, it should include 2 elements, but it is {}'
.format(crop_size))
elif not isinstance(crop_size, int):
raise TypeError(
"Type of crop_size is invalid. Must be Integer or List or tuple, now is {}"
.format(type(crop_size)))
self.crop_size = crop_size
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if isinstance(self.crop_size, int):
crop_width = self.crop_size
crop_height = self.crop_size
else:
crop_width = self.crop_size[0]
crop_height = self.crop_size[1]
img_height = im.shape[0]
img_width = im.shape[1]
if img_height == crop_height and img_width == crop_width:
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
else:
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
im = cv2.copyMakeBorder(
im,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.im_padding_value)
if label is not None:
label = cv2.copyMakeBorder(
label,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=self.label_padding_value)
img_height = im.shape[0]
img_width = im.shape[1]
if crop_height > 0 and crop_width > 0:
h_off = np.random.randint(img_height - crop_height + 1)
w_off = np.random.randint(img_width - crop_width + 1)
im = im[h_off:(crop_height + h_off), w_off:(
w_off + crop_width), :]
if label is not None:
label = label[h_off:(crop_height + h_off), w_off:(
w_off + crop_width)]
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomBlur:
"""以一定的概率对图像进行高斯模糊。
Args:
prob (float): 图像模糊概率。默认为0.1。
"""
def __init__(self, prob=0.1):
self.prob = prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.prob <= 0:
n = 0
elif self.prob >= 1:
n = 1
else:
n = int(1.0 / self.prob)
if n > 0:
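            # Blur fires on average once every n calls (n is about 1 / prob);
            # the Gaussian kernel size is a random odd value clipped to [3, 9].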
if np.random.randint(0, n) == 0:
radius = np.random.randint(3, 10)
if radius % 2 != 1:
radius = radius + 1
if radius > 9:
radius = 9
im = cv2.GaussianBlur(im, (radius, radius), 0, 0)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomRotate:
"""对图像进行随机旋转, 模型训练时的数据增强操作。
在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,
并对旋转后的图像和标注图像进行相应的padding。
Args:
rotate_range (float): 最大旋转角度。默认为15度。
im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
label_padding_value (int): 标注图像padding的值。默认为255。
"""
def __init__(self,
rotate_range=15,
im_padding_value=[127.5, 127.5, 127.5],
label_padding_value=255):
self.rotate_range = rotate_range
self.im_padding_value = im_padding_value
self.label_padding_value = label_padding_value
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.rotate_range > 0:
(h, w) = im.shape[:2]
do_rotation = np.random.uniform(-self.rotate_range,
self.rotate_range)
pc = (w // 2, h // 2)
r = cv2.getRotationMatrix2D(pc, do_rotation, 1.0)
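            # Expand the output canvas so the rotated image is not cropped:
            # (nw, nh) bounds the rotated rectangle, and the translation part
            # of r is shifted so the image stays centered in the new canvas.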
cos = np.abs(r[0, 0])
sin = np.abs(r[0, 1])
nw = int((h * sin) + (w * cos))
nh = int((h * cos) + (w * sin))
(cx, cy) = pc
r[0, 2] += (nw / 2) - cx
r[1, 2] += (nh / 2) - cy
dsize = (nw, nh)
im = cv2.warpAffine(
im,
r,
dsize=dsize,
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT,
borderValue=self.im_padding_value)
            if label is not None:
                label = cv2.warpAffine(
                    label,
                    r,
                    dsize=dsize,
                    flags=cv2.INTER_NEAREST,
                    borderMode=cv2.BORDER_CONSTANT,
                    borderValue=self.label_padding_value)
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomScaleAspect:
"""裁剪并resize回原始尺寸的图像和标注图像。
按照一定的面积比和宽高比对图像进行裁剪,并reszie回原始图像的图像,当存在标注图时,同步进行。
Args:
min_scale (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
aspect_ratio (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
"""
def __init__(self, min_scale=0.5, aspect_ratio=0.33):
self.min_scale = min_scale
self.aspect_ratio = aspect_ratio
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
if self.min_scale != 0 and self.aspect_ratio != 0:
img_height = im.shape[0]
img_width = im.shape[1]
for i in range(0, 10):
area = img_height * img_width
target_area = area * np.random.uniform(self.min_scale, 1.0)
aspectRatio = np.random.uniform(self.aspect_ratio,
1.0 / self.aspect_ratio)
dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
if (np.random.randint(10) < 5):
tmp = dw
dw = dh
dh = tmp
if (dh < img_height and dw < img_width):
h1 = np.random.randint(0, img_height - dh)
w1 = np.random.randint(0, img_width - dw)
                    im = im[h1:(h1 + dh), w1:(w1 + dw), :]
                    im = cv2.resize(
                        im, (img_width, img_height),
                        interpolation=cv2.INTER_LINEAR)
                    if label is not None:
                        label = label[h1:(h1 + dh), w1:(w1 + dw)]
                        label = cv2.resize(
                            label, (img_width, img_height),
                            interpolation=cv2.INTER_NEAREST)
                    break
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class RandomDistort:
"""对图像进行随机失真。
1. 对变换的操作顺序进行随机化操作。
2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
Args:
brightness_range (float): 明亮度因子的范围。默认为0.5。
brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
contrast_range (float): 对比度因子的范围。默认为0.5。
contrast_prob (float): 随机调整对比度的概率。默认为0.5。
saturation_range (float): 饱和度因子的范围。默认为0.5。
saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
hue_range (int): 色调因子的范围。默认为18。
hue_prob (float): 随机调整色调的概率。默认为0.5。
"""
def __init__(self,
brightness_range=0.5,
brightness_prob=0.5,
contrast_range=0.5,
contrast_prob=0.5,
saturation_range=0.5,
saturation_prob=0.5,
hue_range=18,
hue_prob=0.5):
self.brightness_range = brightness_range
self.brightness_prob = brightness_prob
self.contrast_range = contrast_range
self.contrast_prob = contrast_prob
self.saturation_range = saturation_range
self.saturation_prob = saturation_prob
self.hue_range = hue_range
self.hue_prob = hue_prob
def __call__(self, im, im_info=None, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
存储与图像相关信息的字典和标注图像np.ndarray数据。
"""
brightness_lower = 1 - self.brightness_range
brightness_upper = 1 + self.brightness_range
contrast_lower = 1 - self.contrast_range
contrast_upper = 1 + self.contrast_range
saturation_lower = 1 - self.saturation_range
saturation_upper = 1 + self.saturation_range
hue_lower = -self.hue_range
hue_upper = self.hue_range
ops = [brightness, contrast, saturation, hue]
random.shuffle(ops)
params_dict = {
'brightness': {
'brightness_lower': brightness_lower,
'brightness_upper': brightness_upper
},
'contrast': {
'contrast_lower': contrast_lower,
'contrast_upper': contrast_upper
},
'saturation': {
'saturation_lower': saturation_lower,
'saturation_upper': saturation_upper
},
'hue': {
'hue_lower': hue_lower,
'hue_upper': hue_upper
}
}
prob_dict = {
'brightness': self.brightness_prob,
'contrast': self.contrast_prob,
'saturation': self.saturation_prob,
'hue': self.hue_prob
}
im = im.astype('uint8')
im = Image.fromarray(im)
        for op in ops:
            params = params_dict[op.__name__]
            prob = prob_dict[op.__name__]
            params['im'] = im
            if np.random.uniform(0, 1) < prob:
                im = op(**params)
im = np.asarray(im).astype('float32')
if label is None:
return (im, im_info)
else:
return (im, im_info, label)
class ArrangeSegmenter:
"""获取训练/验证/预测所需的信息。
Args:
mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
Raises:
ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内
"""
def __init__(self, mode):
if mode not in ['train', 'eval', 'test', 'quant']:
raise ValueError(
"mode should be defined as one of ['train', 'eval', 'test', 'quant']!"
)
self.mode = mode
def __call__(self, im, im_info, label=None):
"""
Args:
im (np.ndarray): 图像np.ndarray数据。
im_info (dict): 存储与图像相关的信息。
label (np.ndarray): 标注图像np.ndarray数据。
Returns:
tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为
'quant'时,返回的tuple为(im,),为图像np.ndarray数据。
"""
im = permute(im, False)
if self.mode == 'train' or self.mode == 'eval':
label = label[np.newaxis, :, :]
return (im, label)
elif self.mode == 'test':
return (im, im_info)
else:
return (im, )
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import cv
FasterRCNN = cv.models.FasterRCNN
YOLOv3 = cv.models.YOLOv3
MaskRCNN = cv.models.MaskRCNN
transforms = cv.transforms.det_transforms
visualize = cv.models.utils.visualize.visualize_detection
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from . import cv
UNet = cv.models.UNet
DeepLabv3p = cv.models.DeepLabv3p
transforms = cv.transforms.seg_transforms
visualize = cv.models.utils.visualize.visualize_segmentation
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from .cv.models.slim import prune
from .cv.models.slim import visualize
cal_params_sensitivities = prune.cal_params_sensitivities
visualize = visualize.visualize
def export_quant_model(model,
test_dataset,
batch_size=1,
batch_num=10,
save_dir='./quant_model',
cache_dir='./temp'):
    model.export_quant_model(
        dataset=test_dataset,
        batch_size=batch_size,
        batch_num=batch_num,
        save_dir=save_dir,
        cache_dir=cache_dir)
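# A hypothetical usage sketch (model and test_dataset are assumed to come from
# the caller, e.g. model = paddlex.load_model(...)):
#
#     export_quant_model(model, test_dataset, batch_size=1, batch_num=10,
#                        save_dir='./quant_model')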
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from . import logging
from . import utils
from . import save
from .utils import seconds_to_hms
from .download import download
from .download import decompress
from .download import download_and_decompress
import os
import os.path as osp
import shutil
import requests
import tqdm
import time
import hashlib
import tarfile
import zipfile
from .utils import logging
DOWNLOAD_RETRY_LIMIT = 3
def md5check(fullname, md5sum=None):
if md5sum is None:
return True
logger.info("File {} md5 checking...".format(fullname))
md5 = hashlib.md5()
with open(fullname, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b""):
md5.update(chunk)
calc_md5sum = md5.hexdigest()
if calc_md5sum != md5sum:
logger.info("File {} md5 check failed, {}(calc) != "
"{}(base)".format(fullname, calc_md5sum, md5sum))
return False
return True
def move_and_merge_tree(src, dst):
"""
Move src directory to dst, if dst is already exists,
merge src to dst
"""
if not osp.exists(dst):
shutil.move(src, dst)
else:
for fp in os.listdir(src):
src_fp = osp.join(src, fp)
dst_fp = osp.join(dst, fp)
if osp.isdir(src_fp):
if osp.isdir(dst_fp):
move_and_merge_tree(src_fp, dst_fp)
else:
shutil.move(src_fp, dst_fp)
elif osp.isfile(src_fp) and \
not osp.isfile(dst_fp):
shutil.move(src_fp, dst_fp)
def download(url, path, md5sum=None):
"""
Download from url, save to path.
url (str): download url
path (str): download to given path
"""
if not osp.exists(path):
os.makedirs(path)
fname = osp.split(url)[-1]
fullname = osp.join(path, fname)
retry_cnt = 0
while not (osp.exists(fullname) and md5check(fullname, md5sum)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
logging.debug("{} download failed.".format(fname))
raise RuntimeError("Download from {} failed. "
"Retry limit reached".format(url))
logging.info("Downloading {} from {}".format(fname, url))
req = requests.get(url, stream=True)
if req.status_code != 200:
raise RuntimeError("Downloading from {} failed with code "
"{}!".format(url, req.status_code))
        # To protect against interrupted downloads, write to
        # tmp_fullname first, then move tmp_fullname to fullname
        # once the download finishes
tmp_fullname = fullname + "_tmp"
total_size = req.headers.get('content-length')
with open(tmp_fullname, 'wb') as f:
if total_size:
download_size = 0
current_time = time.time()
for chunk in tqdm.tqdm(
req.iter_content(chunk_size=1024),
total=(int(total_size) + 1023) // 1024,
unit='KB'):
f.write(chunk)
download_size += 1024
if download_size % 524288 == 0:
total_size_m = round(
int(total_size) / 1024.0 / 1024.0, 2)
download_size_m = round(
download_size / 1024.0 / 1024.0, 2)
speed = int(
524288 / (time.time() - current_time + 0.01) /
1024.0)
current_time = time.time()
logging.debug(
"Downloading: TotalSize={}M, DownloadSize={}M, Speed={}KB/s"
.format(total_size_m, download_size_m, speed))
else:
for chunk in req.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
shutil.move(tmp_fullname, fullname)
logging.debug("{} download completed.".format(fname))
return fullname
def decompress(fname):
"""
Decompress for zip and tar file
"""
logging.info("Decompressing {}...".format(fname))
    # To protect against interrupted decompression, extract into the
    # fpath_tmp directory first; on success, move the extracted files
    # to fpath and delete fpath_tmp.
fpath = osp.split(fname)[0]
fpath_tmp = osp.join(fpath, 'tmp')
if osp.isdir(fpath_tmp):
shutil.rmtree(fpath_tmp)
os.makedirs(fpath_tmp)
if fname.find('tar') >= 0 or fname.find('tgz') >= 0:
with tarfile.open(fname) as tf:
tf.extractall(path=fpath_tmp)
elif fname.find('zip') >= 0:
with zipfile.ZipFile(fname) as zf:
zf.extractall(path=fpath_tmp)
else:
raise TypeError("Unsupport compress file type {}".format(fname))
for f in os.listdir(fpath_tmp):
src_dir = osp.join(fpath_tmp, f)
dst_dir = osp.join(fpath, f)
move_and_merge_tree(src_dir, dst_dir)
shutil.rmtree(fpath_tmp)
logging.debug("{} decompressed.".format(fname))
def download_and_decompress(url, path='.'):
download(url, path)
fname = osp.split(url)[-1]
decompress(osp.join(path, fname))
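# A hypothetical usage sketch (the URL is a placeholder):
#
#     download_and_decompress('https://example.com/dataset.tar.gz', path='./data')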
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import sys
import colorama
from colorama import init
import paddlex
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
def log(level=2, message="", use_color=False):
current_time = time.time()
time_array = time.localtime(current_time)
current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
if paddlex.log_level >= level:
if use_color:
init(autoreset=True)
print("\033[1;31;40m{} [{}]\t{}\033[0m".format(
current_time, levels[level],
message).encode("utf-8").decode("latin1"))
else:
print(
"{} [{}]\t{}".format(current_time, levels[level],
message).encode("utf-8").decode("latin1"))
sys.stdout.flush()
def debug(message="", use_color=False):
log(level=3, message=message, use_color=use_color)
def info(message="", use_color=False):
log(level=2, message=message, use_color=use_color)
def warning(message="", use_color=False):
log(level=1, message=message, use_color=use_color)
def error(message="", use_color=False):
log(level=0, message=message, use_color=use_color)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import six
import os
import errno
import warnings
import numpy as np
import paddle
from paddle.fluid import layers
from paddle.fluid import core
from paddle.fluid import unique_name
from paddle.fluid.executor import global_scope
from paddle.fluid.compiler import CompiledProgram
from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable, \
program_guard
__all__ = ["save_mask_inference_model"]
def _get_valid_program(main_program):
if main_program is None:
main_program = default_main_program()
elif isinstance(main_program, CompiledProgram):
main_program = main_program._program
if main_program is None:
raise TypeError("program should be as Program type or None")
warnings.warn(
"The input is a CompiledProgram, this is not recommended.")
if not isinstance(main_program, Program):
raise TypeError("program should be as Program type or None")
return main_program
def prepend_feed_ops(inference_program,
feed_target_names,
feed_holder_name='feed'):
if len(feed_target_names) == 0:
return
global_block = inference_program.global_block()
feed_var = global_block.create_var(
name=feed_holder_name,
type=core.VarDesc.VarType.FEED_MINIBATCH,
persistable=True)
for i, name in enumerate(feed_target_names):
out = global_block.var(name)
global_block._prepend_op(
type='feed',
inputs={'X': [feed_var]},
outputs={'Out': [out]},
attrs={'col': i})
def append_fetch_ops(inference_program,
fetch_target_names,
fetch_holder_name='fetch'):
global_block = inference_program.global_block()
fetch_var = global_block.create_var(
name=fetch_holder_name,
type=core.VarDesc.VarType.FETCH_LIST,
persistable=True)
for i, name in enumerate(fetch_target_names):
global_block.append_op(
type='fetch',
inputs={'X': [name]},
outputs={'Out': [fetch_var]},
attrs={'col': i})
def _clone_var_in_block_(block, var):
assert isinstance(var, Variable)
if var.desc.type() == core.VarDesc.VarType.LOD_TENSOR:
return block.create_var(
name=var.name,
shape=var.shape,
dtype=var.dtype,
type=var.type,
lod_level=var.lod_level,
persistable=True)
else:
return block.create_var(
name=var.name,
shape=var.shape,
dtype=var.dtype,
type=var.type,
persistable=True)
def save_vars(executor,
dirname,
main_program=None,
vars=None,
predicate=None,
filename=None):
"""
This API saves specific variables in the `Program` to files.
There are two ways to specify the variables to be saved: set variables in
a list and assign it to the `vars`, or use the `predicate` function to select
variables that make `predicate(variable) == True`. The first way has a higher priority.
The `dirname` is used to specify the folder where to save variables.
    If you prefer to save variables in separate files in the `dirname` folder,
do not set `filename`. If you prefer to save all variables in a single file,
use `filename` to specify it.
Args:
executor(Executor): The executor to run for saving variables.
dirname(str, optional): The folder where to save variables.
When you need to save the parameter to the memory, set it to None.
main_program(Program, optional): The program whose variables will be saved.
If it is None, the default main program will
be used automatically.
Default: None
vars(list[Variable], optional): The list contains all variables to be saved.
Default: None
predicate(function, optional): The function selects the variables that make
`predicate(variable) == True`.
Default: None
filename(str, optional): If you prefer to save all variables in a single file,
use `filename` to specify it. Otherwise, let `filename` be None.
Default: None
Returns:
str: When saving parameters to a file, returns None.
When saving parameters to memory, returns a binary string containing parameters.
Raises:
TypeError: If `main_program` is not an instance of Program nor None.
Examples:
.. code-block:: python
import paddle.fluid as fluid
main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
data = fluid.layers.data(name="img", shape=[64, 784], append_batch_size=False)
w = fluid.layers.create_parameter(shape=[784, 200], dtype='float32', name='fc_w')
b = fluid.layers.create_parameter(shape=[200], dtype='float32', name='fc_b')
hidden_w = fluid.layers.matmul(x=data, y=w)
hidden_b = fluid.layers.elementwise_add(hidden_w, b)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
# The first usage: use `vars` to set the saved variables.
var_list = [w, b]
path = "./my_paddle_vars"
fluid.io.save_vars(executor=exe, dirname=path, vars=var_list,
filename="vars_file")
            # w and b will be saved in a file named "vars_file".
# The second usage: use `predicate` to select the saved variable.
def name_has_fc(var):
res = "fc" in var.name
return res
param_path = "./my_paddle_model"
fluid.io.save_vars(executor=exe, dirname=param_path, main_program=main_prog, vars=None, predicate = name_has_fc)
# all variables whose names contain "fc " are saved.
"""
save_to_memory = False
if dirname is None and filename is None:
save_to_memory = True
main_program = _get_valid_program(main_program)
if vars is None:
return save_vars(
executor,
main_program=main_program,
dirname=dirname,
vars=list(filter(predicate, main_program.list_vars())),
filename=filename)
else:
params_var_name = unique_name.generate("saved_params")
# give warning when there is no var in model
if len(list(vars)) == 0:
            warnings.warn(
                "there is no variable in your model; please make sure the model contains variables to save"
            )
return None
save_program = Program()
save_block = save_program.global_block()
save_var_map = {}
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == core.VarDesc.VarType.RAW:
continue
new_var = _clone_var_in_block_(save_block, each_var)
if filename is None and save_to_memory is False:
save_file_path = os.path.join(
os.path.normpath(dirname), new_var.name)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': os.path.normpath(save_file_path)})
else:
save_var_map[new_var.name] = new_var
if filename is not None or save_to_memory:
save_var_list = []
for name in sorted(save_var_map.keys()):
save_var_list.append(save_var_map[name])
save_path = str()
if save_to_memory is False:
save_path = os.path.join(os.path.normpath(dirname), filename)
saved_params = save_block.create_var(
type=core.VarDesc.VarType.RAW, name=params_var_name)
saved_params.desc.set_persistable(True)
save_block.append_op(
type='save_combine',
inputs={'X': save_var_list},
outputs={'Y': saved_params},
attrs={
'file_path': save_path,
'save_to_memory': save_to_memory
})
#NOTE(zhiqiu): save op will add variable kLookupTablePath in save_program.desc,
# which leads to diff on save_program and its desc. Call _sync_with_cpp
# to keep consistency.
save_program._sync_with_cpp()
executor.run(save_program)
if save_to_memory:
return global_scope().find_var(params_var_name).get_bytes()
def _save_distributed_persistables(executor, dirname, main_program):
"""
save_persistables for distributed training.
the method will do things listed below:
1.save part of persistable variables on trainer.
2.receive "remote prefetch variables" from parameter servers and merge them.
3.save "distributed lookup table" on parameter servers.
4.receive "optimizer variables" from parameter servers and merge them.
Args:
executor(Executor): The executor to run for saving parameters.
dirname(str): The saving directory path.
main_program(Program): The program whose parameters will be
saved. the main_program must be the trainer_program
get after transpiler.
Returns:
None
Examples:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
param_path = "./my_paddle_model"
t = distribute_transpiler.DistributeTranspiler()
t.transpile(...)
train_program = t.get_trainer_program()
_save_distributed_persistables(executor=exe, dirname=param_path, main_program=train_program)
"""
def __save_remote_params(executor, dirname, remote_params_map):
"""
        receive params from pserver through RPC.
        If the params have been sliced, concat them into one and then save it.
"""
if not remote_params_map:
return
prog = Program()
block = prog.global_block()
# recv optimize vars from pserver
for name, remote_params in remote_params_map.items():
origin = remote_params[0].origin
is_slice = remote_params[0].is_slice
slices = [None] * len(remote_params)
slice_varnames = [None] * len(remote_params)
remote_varnames = [None] * len(remote_params)
endpoints = [None] * len(remote_params)
for idx, optimizer in enumerate(remote_params):
block_id = optimizer.block_id
slice = optimizer.slice
endpoint = optimizer.endpoint
index = block_id if is_slice else idx
slices[index] = slice
slice_varnames[index] = "{}.slice.{}".format(slice.name, idx)
remote_varnames[index] = slice.name
endpoints[index] = endpoint
slice_shapes = []
for slice in slices:
tmp = [str(dim) for dim in slice.shape]
slice_shapes.append(",".join(tmp))
block.append_op(
type='recv_save',
attrs={
"trainer_id": 0,
"shape": origin.shape,
"slice_shapes": slice_shapes,
"slice_varnames": slice_varnames,
"remote_varnames": remote_varnames,
"endpoints": endpoints,
"file_path": os.path.join(dirname, origin.name)
})
executor.run(prog)
def is_persistable(var):
"""
Check whether the given variable is persistable.
Args:
var(Variable): The variable to be checked.
Returns:
bool: True if the given `var` is persistable
False if not.
Examples:
.. code-block:: python
import paddle.fluid as fluid
param = fluid.default_main_program().global_block().var('fc.b')
res = fluid.io.is_persistable(param)
"""
if var.desc.type() == core.VarDesc.VarType.FEED_MINIBATCH or \
var.desc.type() == core.VarDesc.VarType.FETCH_LIST or \
var.desc.type() == core.VarDesc.VarType.READER:
return False
return var.persistable
def save_persistables(executor, dirname, main_program=None, filename=None):
"""
    This operator saves all persistable variables from :code:`main_program` to
    the folder :code:`dirname` or the file :code:`filename`. You can refer to
    :ref:`api_guide_model_save_reader_en` for more details.
    The :code:`dirname` is used to specify the folder where persistable variables
    are going to be saved. If you would like to save variables in separate
    files, set :code:`filename` to None; if you would like to save all variables in a
    single file, use :code:`filename` to specify the file name.
Args:
executor(Executor): The executor to run for saving persistable variables.
You can refer to :ref:`api_guide_executor_en` for
more details.
dirname(str, optional): The saving directory path.
When you need to save the parameter to the memory, set it to None.
        main_program(Program, optional): The program whose persistable variables will
be saved. You can refer to
:ref:`api_guide_Program_en` for more details.
If it is None, the default main program will
be used.
Default: None.
filename(str, optional): The file to save all variables. If you prefer to
save variables in different files, set it to None.
Default: None.
Returns:
str: When saving parameters to a file, returns None.
When saving parameters to memory, returns a binary string containing parameters.
Examples:
.. code-block:: python
import paddle.fluid as fluid
dir_path = "./my_paddle_model"
file_name = "persistables"
image = fluid.data(name='img', shape=[None, 28, 28], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
predict = fluid.layers.fc(input=image, size=10, act='softmax')
loss = fluid.layers.cross_entropy(input=predict, label=label)
avg_loss = fluid.layers.mean(loss)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
fluid.io.save_persistables(executor=exe, dirname=dir_path, filename=file_name)
            # The persistable variables (weights and bias) in the fc layer of the network
# are going to be saved in the same file named "persistables" in the path
# "./my_paddle_model"
"""
if main_program and main_program._is_distributed:
return _save_distributed_persistables(
executor, dirname=dirname, main_program=main_program)
else:
return save_vars(
executor,
dirname=dirname,
main_program=main_program,
vars=None,
predicate=is_persistable,
filename=filename)
def save_mask_inference_model(dirname,
feeded_var_names,
target_vars,
executor,
main_program=None,
model_filename=None,
params_filename=None,
export_for_deployment=True,
program_only=False):
"""
Prune the given `main_program` to build a new program especially for inference,
and then save it and all related parameters to given `dirname` .
If you just want to save parameters of your trained model, please use the
:ref:`api_fluid_io_save_params` . You can refer to :ref:`api_guide_model_save_reader_en`
for more details.
Note:
The :code:`dirname` is used to specify the folder where inference model
        structure and parameters are going to be saved. If you would like to save the params of the
        Program in separate files, set `params_filename` to None; if you would like to save all
        params of the Program in a single file, use `params_filename` to specify the file name.
Args:
dirname(str): The directory path to save the inference model.
feeded_var_names(list[str]): list of string. Names of variables that need to be feeded
data during inference.
target_vars(list[Variable]): list of Variable. Variables from which we can get
inference results.
executor(Executor): The executor that saves the inference model. You can refer
to :ref:`api_guide_executor_en` for more details.
        main_program(Program, optional): The original program, which will be pruned to
                                         build the inference model. If set to None,
                                         the global default :code:`_main_program_` will be used.
                                         Default: None.
        model_filename(str, optional): The name of the file to save the inference program
                                       itself. If set to None, a default filename
                                       :code:`__model__` will be used.
        params_filename(str, optional): The name of the file to save all related parameters.
                                        If set to None, parameters will be saved
                                        in separate files.
export_for_deployment(bool): If True, programs are modified to only support
direct inference deployment. Otherwise,
more information will be stored for flexible
optimization and re-training. Currently, only
True is supported.
Default: True.
        program_only(bool, optional): If True, only the inference program will be saved,
                                      without the params of the Program.
                                      Default: False.
Returns:
The fetch variables' name list
Return Type:
list
Raises:
        ValueError: If `feeded_var_names` is not a list of basestring, an exception is thrown.
ValueError: If `target_vars` is not a list of Variable, an exception is thrown.
Examples:
.. code-block:: python
import paddle.fluid as fluid
path = "./infer_model"
            # User-defined network; here a softmax regression example
image = fluid.data(name='img', shape=[None, 28, 28], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
predict = fluid.layers.fc(input=image, size=10, act='softmax')
loss = fluid.layers.cross_entropy(input=predict, label=label)
avg_loss = fluid.layers.mean(loss)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# Feed data and train process
# Save inference model. Note we don't save label and loss in this example
fluid.io.save_inference_model(dirname=path,
feeded_var_names=['img'],
target_vars=[predict],
executor=exe)
            # In this example, save_inference_model will prune the default
            # main program according to the network's input node (img) and output node (predict).
# The pruned inference program is going to be saved in the "./infer_model/__model__"
# and parameters are going to be saved in separate files under folder
# "./infer_model".
"""
if isinstance(feeded_var_names, six.string_types):
feeded_var_names = [feeded_var_names]
elif export_for_deployment:
if len(feeded_var_names) > 0:
# TODO(paddle-dev): polish these code blocks
if not (bool(feeded_var_names) and all(
isinstance(name, six.string_types)
for name in feeded_var_names)):
raise ValueError("'feed_var_names' should be a list of str.")
if isinstance(target_vars, Variable):
target_vars = [target_vars]
elif export_for_deployment:
if not (bool(target_vars)
and all(isinstance(var, Variable) for var in target_vars)):
raise ValueError("'target_vars' should be a list of Variable.")
main_program = _get_valid_program(main_program)
# remind user to set auc_states to zeros if the program contains auc op
all_ops = main_program.global_block().ops
for op in all_ops:
if op.type == 'auc':
warnings.warn(
"please ensure that you have set the auc states to zeros before saving inference model"
)
break
# fix the bug that the activation op's output as target will be pruned.
# will affect the inference performance.
# TODO(Superjomn) add an IR pass to remove 1-scale op.
with program_guard(main_program):
uniq_target_vars = []
for i, var in enumerate(target_vars):
if isinstance(var, Variable):
var = layers.scale(
var, 1., name="save_infer_model/scale_{}".format(i))
uniq_target_vars.append(var)
target_vars = uniq_target_vars
target_var_name_list = [var.name for var in target_vars]
    # when a pserver and a trainer are running on the same machine, mkdir may conflict
save_dirname = dirname
try:
save_dirname = os.path.normpath(dirname)
os.makedirs(save_dirname)
except OSError as e:
if e.errno != errno.EEXIST:
raise
if model_filename is not None:
model_basename = os.path.basename(model_filename)
else:
model_basename = "__model__"
model_basename = os.path.join(save_dirname, model_basename)
# When export_for_deployment is true, we modify the program online so that
# it can only be loaded for inference directly. If it's false, the whole
# original program and related meta are saved so that future usage can be
# more flexible.
origin_program = main_program.clone()
if export_for_deployment:
main_program = main_program.clone()
global_block = main_program.global_block()
need_to_remove_op_index = []
for i, op in enumerate(global_block.ops):
op.desc.set_is_target(False)
if op.type == "feed" or op.type == "fetch":
need_to_remove_op_index.append(i)
for index in need_to_remove_op_index[::-1]:
global_block._remove_op(index)
main_program.desc.flush()
main_program = main_program._prune_with_input(
feeded_var_names=feeded_var_names, targets=target_vars)
main_program = main_program._inference_optimize(prune_read_op=True)
fetch_var_names = [v.name for v in target_vars]
prepend_feed_ops(main_program, feeded_var_names)
append_fetch_ops(main_program, fetch_var_names)
main_program.desc._set_version()
paddle.fluid.core.save_op_compatible_info(main_program.desc)
with open(model_basename, "wb") as f:
f.write(main_program.desc.serialize_to_string())
else:
# TODO(panyx0718): Save more information so that it can also be used
# for training and more flexible post-processing.
with open(model_basename + ".main_program", "wb") as f:
f.write(main_program.desc.serialize_to_string())
if program_only:
        warnings.warn(
            "save_inference_model was called with `program_only` set to True; the params of the Program will not be saved."
        )
return target_var_name_list
main_program._copy_dist_param_info_from(origin_program)
if params_filename is not None:
params_filename = os.path.basename(params_filename)
save_persistables(executor, save_dirname, main_program, params_filename)
return target_var_name_list
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import os
import os.path as osp
import numpy as np
import six
import yaml
import math
from . import logging
def seconds_to_hms(seconds):
h = math.floor(seconds / 3600)
m = math.floor((seconds - h * 3600) / 60)
s = int(seconds - h * 3600 - m * 60)
hms_str = "{}:{}:{}".format(h, m, s)
return hms_str
def setting_environ_flags():
if 'FLAGS_eager_delete_tensor_gb' not in os.environ:
os.environ['FLAGS_eager_delete_tensor_gb'] = '0.0'
if 'FLAGS_allocator_strategy' not in os.environ:
os.environ['FLAGS_allocator_strategy'] = 'auto_growth'
if "CUDA_VISIBLE_DEVICES" in os.environ:
if os.environ["CUDA_VISIBLE_DEVICES"].count("-1") > 0:
os.environ["CUDA_VISIBLE_DEVICES"] = ""
def get_environ_info():
setting_environ_flags()
import paddle.fluid as fluid
info = dict()
info['place'] = 'cpu'
info['num'] = int(os.environ.get('CPU_NUM', 1))
if os.environ.get('CUDA_VISIBLE_DEVICES', None) != "":
if hasattr(fluid.core, 'get_cuda_device_count'):
gpu_num = 0
try:
gpu_num = fluid.core.get_cuda_device_count()
except:
os.environ['CUDA_VISIBLE_DEVICES'] = ''
pass
if gpu_num > 0:
info['place'] = 'cuda'
info['num'] = fluid.core.get_cuda_device_count()
return info
def parse_param_file(param_file, return_shape=True):
from paddle.fluid.proto.framework_pb2 import VarType
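    # Layout of a fluid parameter file, as consumed below: an int32 version,
    # an int64 LoD level followed by that many length-prefixed LoD segments,
    # another int32 version, a length-prefixed serialized VarType.TensorDesc,
    # and finally the raw float32 tensor data.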
f = open(param_file, 'rb')
version = np.fromstring(f.read(4), dtype='int32')
lod_level = np.fromstring(f.read(8), dtype='int64')
for i in range(int(lod_level)):
_size = np.fromstring(f.read(8), dtype='int64')
_ = f.read(_size)
version = np.fromstring(f.read(4), dtype='int32')
tensor_desc = VarType.TensorDesc()
tensor_desc_size = np.fromstring(f.read(4), dtype='int32')
tensor_desc.ParseFromString(f.read(int(tensor_desc_size)))
tensor_shape = tuple(tensor_desc.dims)
if return_shape:
f.close()
return tuple(tensor_desc.dims)
    if tensor_desc.data_type != 5:
        raise Exception(
            "Unexpected data type while parsing {}".format(param_file))
data_size = 4
for i in range(len(tensor_shape)):
data_size *= tensor_shape[i]
weight = np.fromstring(f.read(data_size), dtype='float32')
f.close()
return np.reshape(weight, tensor_shape)
def fuse_bn_weights(exe, main_prog, weights_dir):
import paddle.fluid as fluid
logging.info("Try to fuse weights of batch_norm...")
bn_vars = list()
for block in main_prog.blocks:
ops = list(block.ops)
for op in ops:
if op.type == 'affine_channel':
scale_name = op.input('Scale')[0]
bias_name = op.input('Bias')[0]
prefix = scale_name[:-5]
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
if not osp.exists(osp.join(
weights_dir, mean_name)) or not osp.exists(
osp.join(weights_dir, variance_name)):
logging.info(
"There's no batch_norm weight found to fuse, skip fuse_bn."
)
return
bias = block.var(bias_name)
pretrained_shape = parse_param_file(
osp.join(weights_dir, bias_name))
actual_shape = tuple(bias.shape)
if pretrained_shape != actual_shape:
continue
bn_vars.append(
[scale_name, bias_name, mean_name, variance_name])
eps = 1e-5
for names in bn_vars:
scale_name, bias_name, mean_name, variance_name = names
scale = parse_param_file(
osp.join(weights_dir, scale_name), return_shape=False)
bias = parse_param_file(
osp.join(weights_dir, bias_name), return_shape=False)
mean = parse_param_file(
osp.join(weights_dir, mean_name), return_shape=False)
variance = parse_param_file(
osp.join(weights_dir, variance_name), return_shape=False)
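        # Fold the affine_channel (batch-norm) transform
        #   y = scale * (x - mean) / sqrt(variance + eps) + bias
        # into an equivalent affine pair:
        #   new_scale = scale / sqrt(variance + eps)
        #   new_bias  = bias - mean * new_scale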
bn_std = np.sqrt(np.add(variance, eps))
new_scale = np.float32(np.divide(scale, bn_std))
new_bias = bias - mean * new_scale
scale_tensor = fluid.global_scope().find_var(scale_name).get_tensor()
bias_tensor = fluid.global_scope().find_var(bias_name).get_tensor()
scale_tensor.set(new_scale, exe.place)
bias_tensor.set(new_bias, exe.place)
if len(bn_vars) == 0:
logging.info(
"There's no batch_norm weight found to fuse, skip fuse_bn.")
else:
logging.info("There's {} batch_norm ops been fused.".format(
len(bn_vars)))
def load_pdparams(exe, main_prog, model_dir):
import paddle.fluid as fluid
from paddle.fluid.proto.framework_pb2 import VarType
from paddle.fluid.framework import Program
vars_to_load = list()
import pickle
with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
params_dict = pickle.load(f) if six.PY2 else pickle.load(
f, encoding='latin1')
unused_vars = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if var.name not in params_dict:
raise Exception("{} is not in saved paddlex model".format(
var.name))
if var.shape != params_dict[var.name].shape:
unused_vars.append(var.name)
            logging.warning(
                "[SKIP] Shape of pretrained weight {} doesn't match. (Pretrained: {}, Actual: {})"
                .format(var.name, params_dict[var.name].shape, var.shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
for var_name in unused_vars:
del params_dict[var_name]
fluid.io.set_program_state(main_prog, params_dict)
if len(vars_to_load) == 0:
        logging.warning(
            "No pretrained weights were loaded; please check your pretrained model!"
        )
else:
logging.info("There are {} varaibles in {} are loaded.".format(
len(vars_to_load), model_dir))
def load_pretrain_weights(exe, main_prog, weights_dir, fuse_bn=False):
    if not osp.exists(weights_dir):
        raise Exception("Path {} does not exist.".format(weights_dir))
if osp.exists(osp.join(weights_dir, "model.pdparams")):
return load_pdparams(exe, main_prog, weights_dir)
import paddle.fluid as fluid
vars_to_load = list()
for var in main_prog.list_vars():
if not isinstance(var, fluid.framework.Parameter):
continue
if not osp.exists(osp.join(weights_dir, var.name)):
logging.debug(
"[SKIP] Pretrained weight {}/{} doesn't exist".format(
weights_dir, var.name))
continue
pretrained_shape = parse_param_file(osp.join(weights_dir, var.name))
actual_shape = tuple(var.shape)
if pretrained_shape != actual_shape:
        logging.warning(
            "[SKIP] Shape of pretrained weight {}/{} doesn't match. (Pretrained: {}, Actual: {})"
            .format(weights_dir, var.name, pretrained_shape, actual_shape))
continue
vars_to_load.append(var)
logging.debug("Weight {} will be load".format(var.name))
fluid.io.load_vars(
executor=exe,
dirname=weights_dir,
main_program=main_prog,
vars=vars_to_load)
if len(vars_to_load) == 0:
        logging.warning(
            "No pretrained weights were loaded; please check your pretrained model!"
        )
else:
logging.info("There are {} varaibles in {} are loaded.".format(
len(vars_to_load), weights_dir))
if fuse_bn:
fuse_bn_weights(exe, main_prog, weights_dir)
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import setuptools
import sys
long_description = "PaddleX. A end-to-end deeplearning model development toolkit base on PaddlePaddle\n\n"
setuptools.setup(
name="paddlex",
version='0.1.0',
author="paddlex",
author_email="paddlex@baidu.com",
description=long_description,
long_description=long_description,
long_description_content_type="text/plain",
url="https://github.com/PaddlePaddle/PaddleX",
packages=setuptools.find_packages(),
setup_requires=['cython', 'numpy', 'sklearn'],
install_requires=[
'pycocotools', 'pyyaml', 'colorama', 'tqdm', 'visualdl==1.3.0',
'paddleslim==1.0.1', 'paddlehub>=1.6.2'
],
classifiers=[
"Programming Language :: Python :: 3",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
],
license='Apache 2.0',
entry_points={'console_scripts': [
'paddlex=paddlex.command:main',
]})
# Tutorial: Model Compression
This directory contains example code for pruned-model training with PaddleX. Every script downloads its data automatically and trains on a single GPU card.
PaddleX provides two ways to run pruned training:
1. Compute the pruning configuration yourself (recommended). The overall flow is:
> 1. Train the original model on your data;
> 2. With the model trained in step 1, compute each model parameter's sensitivity on the validation set and save the sensitivity information to a local file;
> 3. Train the original model on your data again, passing the parameter-sensitivity file from step 2 when calling the `train` interface;
> 4. During training, the model structure is pruned according to the sensitivity file, and training then continues on the pruned model.
>
2. Use the parameter-sensitivity file precomputed by PaddleX. The overall flow is:
> 1. When calling the `train` interface, set the `sensitivities_file` parameter to the string `DEFAULT`;
> 2. During training, PaddleX automatically downloads its precomputed parameter-sensitivity information, prunes the model structure accordingly and continues training.
Of these two approaches, Method 2 skips two steps relative to Method 1 (training the original model and computing the parameter sensitivities yourself). In practice, however, Method 1 achieves higher accuracy and better pruned models, so it is the recommended choice whenever time and compute budget allow.
## Running pruned training
1. Method 1: compute the pruning configuration yourself
```
# Train the model
python classification/mobilenet.py
# Compute parameter sensitivities
python classification/cal_sensitivities_file.py --model_dir=output/mobilenetv2/epoch_10 --save_file=./sensitivities.data
# Pruned training
python classification/mobilenet.py --model_dir=output/mobilenetv2/epoch_10 --sensitivities_file=./sensitivities.data --eval_metric_loss=0.05
```
2. Method 2: use the parameter sensitivities file precomputed by PaddleX
```
# Automatically downloads the sensitivities file PaddleX precomputed on ImageNet
python classification/mobilenet.py --sensitivities_file=DEFAULT --eval_metric_loss=0.05
```
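Both methods feed into the same `train` call: the script simply forwards `sensitivities_file` and `eval_metric_loss` to `model.train`, which prunes the model before continuing to train. A minimal sketch of that call, condensed from `classification/mobilenet.py` further below (dataset and transform setup omitted):
```
import paddlex as pdx

# train_dataset and eval_dataset are built exactly as in classification/mobilenet.py
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    learning_rate=0.025,
    # weights of the previously trained, unpruned model
    pretrain_weights='output/mobilenetv2/epoch_10',
    save_dir='./output/mobilenet_prune',
    # a path produced by cal_sensitivities_file.py, or 'DEFAULT'
    # to use the sensitivities precomputed by PaddleX
    sensitivities_file='./sensitivities.data',
    # maximum tolerated drop of the evaluation metric due to pruning
    eval_metric_loss=0.05)
```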
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
def cal_sensitivities_file(model_dir, dataset, save_file):
# Load the model
model = pdx.load_model(model_dir)
# Define the evaluation dataset
eval_dataset = pdx.datasets.ImageNet(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(
model, save_file, eval_dataset, batch_size=8)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir",
default="./output/mobilenet/best_model",
type=str,
help="The model path.")
parser.add_argument(
"--dataset",
default="./vegetables_cls",
type=str,
help="The dataset path.")
parser.add_argument(
"--save_file",
default="./sensitivities.data",
type=str,
help="The sensitivities file path.")
args = parser.parse_args()
cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.cls import transforms
import paddlex as pdx
def train(model_dir=None, sensitivities_file=None, eval_metric_loss=0.05):
# Download and extract the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomCrop(crop_size=224),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
transforms.CenterCrop(crop_size=224),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV2(num_classes=num_classes)
if model_dir is None:
# Use model weights pretrained on ImageNet
pretrain_weights = "IMAGENET"
else:
# Use the given model_dir as the pretrained weights
assert os.path.isdir(model_dir), "Path {} is not a directory".format(
model_dir)
pretrain_weights = model_dir
save_dir = './output/mobilenet'
if sensitivities_file is not None:
# DEFAULT means pruning uses the parameter sensitivities shipped with PaddleX
if sensitivities_file != "DEFAULT":
assert os.path.exists(
sensitivities_file), "Path {} not exist".format(
sensitivities_file)
save_dir = './output/mobilenet_prune'
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
lr_decay_epochs=[4, 6, 8],
learning_rate=0.025,
pretrain_weights=pretrain_weights,
save_dir=save_dir,
use_vdl=True,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir", default=None, type=str, help="The model path.")
parser.add_argument(
"--sensitivities_file",
default=None,
type=str,
help="The sensitivities file path.")
parser.add_argument(
"--eval_metric_loss",
default=0.05,
type=float,
help="The loss threshold.")
args = parser.parse_args()
train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
def cal_sensitivities_file(model_dir, dataset, save_file):
# Load the model
model = pdx.load_model(model_dir)
# Define the evaluation dataset
eval_dataset = pdx.datasets.VOCDetection(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(
model, save_file, eval_dataset, batch_size=8)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir",
default="./output/yolov3_mobilenet/best_model",
type=str,
help="The model path.")
parser.add_argument(
"--dataset", default="./insect_det", type=str, help="The model path.")
parser.add_argument(
"--save_file",
default="./sensitivities.data",
type=str,
help="The sensitivities file path.")
args = parser.parse_args()
cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
def train(model_dir, sensitivities_file, eval_metric_loss):
# Download and extract the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.MixupImage(mixup_epoch=250),
transforms.RandomDistort(),
transforms.RandomExpand(),
transforms.RandomCrop(),
transforms.Resize(target_size=608, interp='RANDOM'),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.Resize(target_size=608, interp='CUBIC'),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
if model_dir is None:
# Use weights pretrained on ImageNet
pretrain_weights = "IMAGENET"
else:
assert os.path.isdir(model_dir), "Path {} is not a directory".format(
model_dir)
pretrain_weights = model_dir
save_dir = "output/yolov3_mobile"
if sensitivities_file is not None:
if sensitivities_file != 'DEFAULT':
assert os.path.exists(
sensitivities_file), "Path {} not exist".format(
sensitivities_file)
save_dir = "output/yolov3_mobile_prune"
num_classes = len(train_dataset.labels)
model = pdx.det.YOLOv3(num_classes=num_classes)
model.train(
num_epochs=270,
train_dataset=train_dataset,
train_batch_size=8,
eval_dataset=eval_dataset,
learning_rate=0.000125,
lr_decay_epochs=[210, 240],
pretrain_weights=pretrain_weights,
save_dir=save_dir,
use_vdl=True,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir", default=None, type=str, help="The model path.")
parser.add_argument(
"--sensitivities_file",
default=None,
type=str,
help="The sensitivities file path.")
parser.add_argument(
"--eval_metric_loss",
default=0.05,
type=float,
help="The loss threshold.")
args = parser.parse_args()
train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
def cal_sensitivities_file(model_dir, dataset, save_file):
# Load the model
model = pdx.load_model(model_dir)
# Define the evaluation dataset
eval_dataset = pdx.datasets.SegDataset(
data_dir=dataset,
file_list=os.path.join(dataset, 'val_list.txt'),
label_list=os.path.join(dataset, 'labels.txt'),
transforms=model.eval_transforms)
pdx.slim.cal_params_sensitivities(
model, save_file, eval_dataset, batch_size=4)
pdx.slim.visualize(model, save_file, save_dir='./')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir",
default="./output/unet/best_model",
type=str,
help="The model path.")
parser.add_argument(
"--dataset",
default="./optic_disc_seg",
type=str,
help="The dataset path.")
parser.add_argument(
"--save_file",
default="./sensitivities.data",
type=str,
help="The sensitivities file path.")
args = parser.parse_args()
cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import argparse
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.seg import transforms
import paddlex as pdx
def train(model_dir, sensitivities_file, eval_metric_loss):
# Download and extract the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.ResizeRangeScaling(),
transforms.RandomPaddingCrop(crop_size=512),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByLong(long_size=512),
transforms.Padding(target_size=512),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
if model_dir is None:
# Use weights pretrained on COCO
pretrain_weights = "COCO"
else:
assert os.path.isdir(model_dir), "Path {} is not a directory".format(
model_dir)
pretrain_weights = model_dir
save_dir = "output/unet"
if sensitivities_file is not None:
if sensitivities_file != 'DEFAULT':
assert os.path.exists(
sensitivities_file), "Path {} not exist".format(
sensitivities_file)
save_dir = "output/unet_prune"
num_classes = len(train_dataset.labels)
model = pdx.seg.UNet(num_classes=num_classes)
model.train(
num_epochs=20,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
pretrain_weights=pretrain_weights,
save_dir=save_dir,
use_vdl=True,
sensitivities_file=sensitivities_file,
eval_metric_loss=eval_metric_loss)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_dir", default=None, type=str, help="The model path.")
parser.add_argument(
"--sensitivities_file",
default=None,
type=str,
help="The sensitivities file path.")
parser.add_argument(
"--eval_metric_loss",
default=0.05,
type=float,
help="The loss threshold.")
args = parser.parse_args()
train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
# Tutorial: Model Training
This directory collects example code for training models with PaddleX. Every script downloads its sample data automatically and trains on a single GPU card.
|Code | Model / Task | Data |
|------|--------|---------|
|classification/mobilenetv2.py | Image classification, MobileNetV2 | Vegetable classification |
|classification/resnet50.py | Image classification, ResNet50 | Vegetable classification |
|detection/faster_rcnn_r50_fpn.py | Object detection, FasterRCNN | Insect detection |
|detection/mask_rcnn_r50_fpn.py | Instance segmentation, MaskRCNN | Garbage sorting |
|segmentation/deeplabv3p.py | Semantic segmentation, DeepLabV3p | Optic disc segmentation |
|segmentation/unet.py | Semantic segmentation, UNet | Optic disc segmentation |
## Getting started
After installing PaddleX, start training with a command such as
```
python classification/mobilenetv2.py
```
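Each script writes its checkpoints under `save_dir`; the best one can be reloaded with `pdx.load_model`, as the pruning scripts above do. A minimal sketch (the checkpoint and image paths are placeholders, and `predict` is an assumption about the PaddleX model API rather than something shown in these scripts):
```
import paddlex as pdx

# Reload the best checkpoint written during training (placeholder path)
model = pdx.load_model('output/mobilenetv2/best_model')
# Run inference on one image (placeholder path); the transforms saved
# with the model are assumed to be applied by default
result = model.predict('vegetables_cls/example.jpg')
print(result)
```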
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.cls import transforms
import paddlex as pdx
# Download and extract the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomCrop(crop_size=224),
transforms.RandomHorizontalFlip(),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
transforms.CenterCrop(crop_size=224),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
lr_decay_epochs=[4, 6, 8],
learning_rate=0.025,
save_dir='output/mobilenetv2',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddle.fluid as fluid
from paddlex.cls import transforms
import paddlex as pdx
# Download and extract the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose(
[transforms.RandomCrop(crop_size=224),
transforms.Normalize()])
eval_transforms = transforms.Compose([
transforms.ResizeByShort(short_size=256),
transforms.CenterCrop(crop_size=224),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/train_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
data_dir='vegetables_cls',
file_list='vegetables_cls/val_list.txt',
label_list='vegetables_cls/labels.txt',
transforms=eval_transforms)
# PaddleX supports custom-built optimizers
step_each_epoch = train_dataset.num_samples // 32
learning_rate = fluid.layers.cosine_decay(
learning_rate=0.025, step_each_epoch=step_each_epoch, epochs=10)
optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(4e-5))
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/resnet50/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
model.train(
num_epochs=10,
train_dataset=train_dataset,
train_batch_size=32,
eval_dataset=eval_dataset,
optimizer=optimizer,
save_dir='output/resnet50',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and extract the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32),
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
# num_classes must include the background class, i.e. number of object classes + 1
num_classes = len(train_dataset.labels) + 1
model = pdx.det.FasterRCNN(num_classes=num_classes)
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=2,
eval_dataset=eval_dataset,
learning_rate=0.0025,
lr_decay_epochs=[8, 11],
save_dir='output/faster_rcnn_r50_fpn',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and extract the garbage sorting dataset
garbage_dataset = 'https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz'
pdx.utils.download_and_decompress(garbage_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
eval_transforms = transforms.Compose([
transforms.Normalize(),
transforms.ResizeByShort(short_size=800, max_size=1333),
transforms.Padding(coarsest_stride=32)
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.CocoDetection(
data_dir='garbage_ins_det/JPEGImages',
ann_file='garbage_ins_det/train.json',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.CocoDetection(
data_dir='garbage_ins_det/JPEGImages',
ann_file='garbage_ins_det/val.json',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
# num_classes must include the background class, i.e. number of object classes + 1
num_classes = len(train_dataset.labels) + 1
model = pdx.det.MaskRCNN(num_classes=num_classes)
model.train(
num_epochs=12,
train_dataset=train_dataset,
train_batch_size=1,
eval_dataset=eval_dataset,
learning_rate=0.00125,
lr_decay_epochs=[8, 11],
save_dir='output/mask_rcnn_r50_fpn',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from paddlex.det import transforms
import paddlex as pdx
# Download and extract the insect detection dataset
insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
pdx.utils.download_and_decompress(insect_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.MixupImage(mixup_epoch=250),
transforms.RandomDistort(),
transforms.RandomExpand(),
transforms.RandomCrop(),
transforms.Resize(target_size=608, interp='RANDOM'),
transforms.RandomHorizontalFlip(),
transforms.Normalize(),
])
eval_transforms = transforms.Compose([
transforms.Resize(target_size=608, interp='CUBIC'),
transforms.Normalize(),
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/train_list.txt',
label_list='insect_det/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='insect_det',
file_list='insect_det/val_list.txt',
label_list='insect_det/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/yolov3_mobilenetv1/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
num_classes = len(train_dataset.labels)
model = pdx.det.YOLOv3(num_classes=num_classes)
model.train(
num_epochs=270,
train_dataset=train_dataset,
train_batch_size=8,
eval_dataset=eval_dataset,
learning_rate=0.000125,
lr_decay_epochs=[210, 240],
save_dir='output/yolov3_mobilenetv1',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and extract the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.Resize(target_size=512),
transforms.RandomPaddingCrop(crop_size=500),
transforms.Normalize()
])
eval_transforms = transforms.Compose(
[transforms.Resize(512), transforms.Normalize()])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/deeplab/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
num_classes = len(train_dataset.labels)
model = pdx.seg.DeepLabv3p(num_classes=num_classes)
model.train(
num_epochs=40,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/deeplab',
use_vdl=True)
import os
# Use GPU card 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx
from paddlex.seg import transforms
# Download and extract the optic disc segmentation dataset
optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
pdx.utils.download_and_decompress(optic_dataset, path='./')
# Define the transforms for training and evaluation
train_transforms = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.ResizeRangeScaling(),
transforms.RandomPaddingCrop(crop_size=512),
transforms.Normalize()
])
eval_transforms = transforms.Compose([
transforms.ResizeByLong(long_size=512),
transforms.Padding(target_size=512),
transforms.Normalize()
])
# Define the datasets for training and evaluation
train_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/train_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.SegDataset(
data_dir='optic_disc_seg',
file_list='optic_disc_seg/val_list.txt',
label_list='optic_disc_seg/labels.txt',
transforms=eval_transforms)
# Initialize the model and start training
# Training metrics can be monitored with VisualDL
# Launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
# Then open http://0.0.0.0:8001 in a browser
# (0.0.0.0 works for local access; for a remote server, use that machine's IP)
num_classes = len(train_dataset.labels)
model = pdx.seg.UNet(num_classes=num_classes)
model.train(
num_epochs=20,
train_dataset=train_dataset,
train_batch_size=4,
eval_dataset=eval_dataset,
learning_rate=0.01,
save_dir='output/unet',
use_vdl=True)