diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..21bab1f7baea919e7548df5adbf4f312c7dacc75
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,27 @@
+- repo: local
+  hooks:
+  - id: yapf
+    name: yapf
+    entry: yapf
+    language: system
+    args: [-i, --style=.style.yapf]
+    files: \.py$
+
+- repo: https://github.com/pre-commit/pre-commit-hooks
+  sha: a11d9314b22d8f8c7556443875b731ef05965464
+  hooks:
+  - id: check-merge-conflict
+  - id: check-symlinks
+  - id: end-of-file-fixer
+  - id: trailing-whitespace
+  - id: detect-private-key
+  - id: check-added-large-files
+- repo: local
+  hooks:
+  - id: copyright_checker
+    name: copyright_checker
+    entry: python ./.copyright.hook
+    language: system
+    files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
+    exclude: (?!.*third_party)^.*$
diff --git "a/DataAnnotation/AnnotationNote/1_[\345\233\276\345\203\217\345\210\206\347\261\273]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md" "b/DataAnnotation/AnnotationNote/1_[\345\233\276\345\203\217\345\210\206\347\261\273]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md"
deleted file mode 100644
index c647ef64ca37bb4a358cae7ee6ca0fcaa55c6cdc..0000000000000000000000000000000000000000
--- "a/DataAnnotation/AnnotationNote/1_[\345\233\276\345\203\217\345\210\206\347\261\273]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md"
+++ /dev/null
@@ -1,16 +0,0 @@
-## 1. 准备「图像分类」任务数据
-
-### 图像分类的数据结构
-
-```
-data/mydataset/
-|-- class 1
- |-- 0001.jpg
- |-- 0002.jpg
- |-- ...
-|-- class 2
- |-- 0001.jpg
- |-- 0002.jpg
- |-- ...
-```
-class 1 及 class 2 文件夹需要命名为需要分类的类名,输入限定为英文字符,不可包含空格、中文或特殊字符。
diff --git "a/DataAnnotation/AnnotationNote/2_[\347\233\256\346\240\207\346\243\200\346\265\213]\345\217\212[\345\256\236\344\276\213\345\210\206\345\211\262]\346\225\260\346\215\256\346\240\207\346\263\250.md" "b/DataAnnotation/AnnotationNote/2_[\347\233\256\346\240\207\346\243\200\346\265\213]\345\217\212[\345\256\236\344\276\213\345\210\206\345\211\262]\346\225\260\346\215\256\346\240\207\346\263\250.md"
deleted file mode 100644
index 66951ea5083ddba94dd39530d8b6e89a08b7ff41..0000000000000000000000000000000000000000
--- "a/DataAnnotation/AnnotationNote/2_[\347\233\256\346\240\207\346\243\200\346\265\213]\345\217\212[\345\256\236\344\276\213\345\210\206\345\211\262]\346\225\260\346\215\256\346\240\207\346\263\250.md"
+++ /dev/null
@@ -1,62 +0,0 @@
-## 2. 使用LabelMe标注「目标检测」及「实例分割」任务数据
-
-### 2.1 准备工作
-
-* **2.1.1** 创建与图像文件夹相对应的文件夹,用于存储标注的json文件。
-* **2.1.2**点击”Open Dir“按钮,选择需要标注的图像所在的文件夹打开,则”File List“对话框中会显示所有图像所对应的绝对路径。
-
-### 2.2 目标检测标注
-
-* **2.2.1** 打开矩形框标注工具,具体如下图所示
-
-

-
-* **2.2.2** 使用拖拉的方式对目标物体进行标识,并在弹出的对话框中写明对应label(当label已存在时点击即可),具体如下图所示:
-
-
-
-当框标注错误时,可点击右侧的“Edit Polygons”再点击标注框,通过拖拉进行修改,也可再点击“Delete Polygon”进行删除。
-
-* **2.2.3** 点击右侧”Save“,将标注结果保存到***2.1.1***中创建的文件夹中
- 【注意】当所使用的模型是类似Mask R-CNN这类模型时,虽是目标检测模型,但却需要实例分割信息,具体参见***2.3***。
-
-### 2.3 实例分割标注
-
-* **2.3.1** 点击右侧的“Create Polygons”以打点的方式圈出目标的轮廓,并在弹出的对话框中写明对应label(当label已存在时点击即可),具体如下提所示:
-
-
-
-当框标注错误时,可点击右侧的“Edit Polygons”再点击标注框,通过拖拉进行修改,也可再点击“Delete Polygon”进行删除。
-
-* **2.3.2** 点击右侧”Save“,将标注结果保存到***2.1.1***中创建的文件夹中。
-
-【注意】***2.2.2***和***2.3.1***中在在定义label名字时,都用英文命名,同时在名字后加上“_0”或“_1”分别代表目标是一个对象(iscrowd=0)还是一组对象(iscrowd=1)。一个对象表示能用一个矩形框(或大于等于一个多边形框)就能将一个独立对象表示出来,当需要使用多个多边形来表示一个对象时,在“_0”后加上同一个数字代表同一个对象;一组对象表示同一类型对象联系太紧密只能用一个矩形框(或一个多边形框)将一组对象圈出来。例如下图在进行目标检测标注时,水果单堆分割成一个个水果较不容易,所以将其定义为一组水果对象,label定为“fruit_1” ;在进行实例分割标注时,装饰品无法用一个多边形框表示出来,所以使用3个label为“decoration_00”的多边形表示。
-
-
-
-## 2.4 目标检测任务数据目录结构
-```
-data/mydataset/
-|-- JPEGImages
- |-- 1.jpg
- |-- 2.jpg
-|-- Annotations
- |-- 1.xml
- |-- 2.xml
-```
-其中,Annotations文件夹中存放标注文件,JPEGImages文件夹中存放图像文件。
-
-
-
-## 2.5 实例分割任务数据目录结构
-
-```
-data/mydataset/
-|-- JPEGImages
- |-- 1.jpg
- |-- 2.jpg
-|-- annotations.json
-```
-
-其中,`annotations.json`为标注文件,JPEGImages文件夹中存放图像文件。
-
diff --git "a/DataAnnotation/AnnotationNote/3_[\350\257\255\344\271\211\345\210\206\345\211\262]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md" "b/DataAnnotation/AnnotationNote/3_[\350\257\255\344\271\211\345\210\206\345\211\262]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md"
deleted file mode 100644
index d042d8ba24af8fd1a7010b43c69374566267b2f1..0000000000000000000000000000000000000000
--- "a/DataAnnotation/AnnotationNote/3_[\350\257\255\344\271\211\345\210\206\345\211\262]\344\273\273\345\212\241\346\225\260\346\215\256\346\240\207\346\263\250.md"
+++ /dev/null
@@ -1,35 +0,0 @@
-## 3 使用LabelMe标注「语义分割」任务数据
-语义分割中常用的数据集是CityScape和COCO,此小节主要讲述CityScape数据集在LabelMe上标注的使用,有关COCO部分请参考 2.3 小节中有关Mask RCNN部分。
-
-### 3.1 准备工作
-
-* **3.1.1** 创建与图像文件夹相对应的文件夹,用于存储标注的json文件
-
-* **3.1.2** 点击”Open Dir“按钮,选择需要标注的图像所在的文件夹打开,则”File List“对话框中会显示所有图像所对应的绝对路径
-
-### 3.2 标注
-
-* **3.2.1** 点击右侧的“Create Polygons”以打点的方式圈出目标的轮廓,并在弹出的对话框中写明对应label(当label已存在时点击即可),具体如下提所示:
-
-
-
-当框标注错误时,可点击右侧的“Edit Polygons”再点击标注框,通过拖拉进行修改,也可再点击“Delete Polygon”进行删除。
-
-* **3.2.2** 点击右侧”Save“,将标注结果保存到***3.1.1***中创建的文件夹中
-
-
-
-## 3.3 语义分割任务数据目录结构:
-```
-data/mydataset/
-|-- JPEGImages
- |-- 1.jpg
- |-- 2.jpg
- |-- 3.jpg
-|-- Annotations
- |-- 1.png
- |-- 2.png
- |-- 3.png
-(可选)|-- label.txt
-```
-其中JPEGImages为图片文件夹,Annotations为标签文件夹。您可以提供一份命名为“label.txt”的包含所有标注名的清单,用于直接呈现类别名称,将标注序号“1”、“2”、“3” 等显示为对应的“空调”、“桌子”、“花瓶”。
\ No newline at end of file
diff --git a/DataAnnotation/AnnotationNote/pics/detection1.png b/DataAnnotation/AnnotationNote/pics/detection1.png
deleted file mode 100644
index f0700909ec36718c7b2edc9a49223d5049463144..0000000000000000000000000000000000000000
Binary files a/DataAnnotation/AnnotationNote/pics/detection1.png and /dev/null differ
diff --git a/DataAnnotation/AnnotationNote/pics/detection2.png b/DataAnnotation/AnnotationNote/pics/detection2.png
deleted file mode 100644
index 86b690f7b95240ccd7e5cf0d572e8311aea0f201..0000000000000000000000000000000000000000
Binary files a/DataAnnotation/AnnotationNote/pics/detection2.png and /dev/null differ
diff --git a/DataAnnotation/AnnotationNote/pics/detection3.png b/DataAnnotation/AnnotationNote/pics/detection3.png
deleted file mode 100644
index 0e97b61861a1ca0d14c7b3965315447e012a20ae..0000000000000000000000000000000000000000
Binary files a/DataAnnotation/AnnotationNote/pics/detection3.png and /dev/null differ
diff --git a/DataAnnotation/AnnotationNote/pics/detection4.png b/DataAnnotation/AnnotationNote/pics/detection4.png
deleted file mode 100644
index 0056f45e0ffef4ebb137e0b953ef676513203a88..0000000000000000000000000000000000000000
Binary files a/DataAnnotation/AnnotationNote/pics/detection4.png and /dev/null differ
diff --git a/DataAnnotation/AnnotationNote/pics/detection5.png b/DataAnnotation/AnnotationNote/pics/detection5.png
deleted file mode 100644
index d89cda7cabb6edae1999995d76c1d29bdc65c8c4..0000000000000000000000000000000000000000
Binary files a/DataAnnotation/AnnotationNote/pics/detection5.png and /dev/null differ
diff --git a/DataAnnotation/README.md b/DataAnnotation/README.md
deleted file mode 100644
index 011c06d2e88d1a1c53f685fd935b7075f1d71b31..0000000000000000000000000000000000000000
--- a/DataAnnotation/README.md
+++ /dev/null
@@ -1,16 +0,0 @@
-### LabelMe
-LabelMe是目前广泛使用的数据标注工具,您也可以在保证标注文件格式与PaddleX所支持格式进行匹配的基础,上选用其他标注工具。
-LabelMe GitHub地址:https://github.com/wkentaro/labelme
-
-#### LabelMe的安装
-
-***注:为了保证环境的统一,本文介绍了在Anaconda环境下安装及使用LabelMe的方法,您也可以根据您的实际情况及需求,采用其他方案***
-
-Windows: 参考文档[[标注工具安装和使用/1_Windows/1_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/1_Windows/1_3_LabelMe安装.md)
-Ubuntu: 参考文档[[标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/2_Ubuntu/2_3_LabelMe安装.md)
-MacOS: 参考文档[[标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md]](../DataAnnotation/标注工具安装和使用/3_MacOS/3_3_LabelMe安装.md)
-
-#### 使用LabelMe标注你的数据集
-
-参考文档[[AnnotationNote]](../DataAnnotation/AnnotationNote)
-
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_1_Anaconda\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_1_Anaconda\345\256\211\350\243\205.md"
deleted file mode 100644
index ccee15fe3723131085aea7701c53513382470810..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_1_Anaconda\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,22 +0,0 @@
-## 2.1.1.1 下载Anaconda
-在Anaconda官网[(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/)选择“Windows”,并选择与所需python相对应的Anaconda版本进行下载(PaddlePaddle要求安装的Anaconda版本为64-bit)
-
-## 2.1.1.2 安装Anaconda
-打开下载的安装包(以.exe为后缀),根据引导完成安装,在安装过程中可以修改安装路径,具体如下图所示:
-
-【注意】默认安装在Windows当前用户主目录下
-
-## 2.1.1.3 使用Anaconda
-
-- 点击Windows系统左下角的Windows图标,打开:所有程序->Anaconda3/2(64-bit)->Anaconda Prompt
-- 在命令行中执行下述命令
-```cmd
-# 创建一个名为mypaddle的环境,指定python版本是3.5
-conda create -n mypaddle python=3.5
-# 创建好后,使用activate进入环境
-conda activate mypaddle
-python --version
-# 若上述命令行出现Anaconda字样,则表示安装成功
-# 退出环境
-conda deactivate
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_2_PaddlePaddle\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_2_PaddlePaddle\345\256\211\350\243\205.md"
deleted file mode 100644
index 31ac61f9406aa7136957419fe8c0764bd1a7be64..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_2_PaddlePaddle\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,18 +0,0 @@
-## 1. 安装PaddlePaddle
-PaddlePaddle可以在64-bit的Windows7、Windows8、Windows10企业版、Windows10专业版上运行,同时支持python2(>=2.7.15)和python3(>= 3.5.1),但pip版本必须高于9.0.1。Windows版本同时支持CPU版和GPU版的PaddlePaddle,若使用GPU版,对于CUDA和CUDNN的安装,可参考NVIDIA官方文档[(https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)和[(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)了解。目前,Windows环境暂不支持NCCL,分布式等相关功能。
-- 在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-conda activate mypaddle
-# (选择1)安装CPU版本PaddlePaddle
-pip install -U paddlepaddle
-# (选择2)安装GPU版本PaddlePaddle
-pip install -U paddlepaddle-gpu
-```
-【注意】默认提供的安装包需要计算机支持AVX指令集和MKL,若环境不支持,可以在PaddlePaddle官网[(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)下载openblas版本的安装包
-- 安装成功后,打开python命令行,使用以下代码进行测试:
-```python
-import paddle.fluid as fluid
-fluid.install_check.run_check()
-# 若出现Your Paddle Fluid is installed successfully!字样则表示安装成功
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_3_LabelMe\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_3_LabelMe\345\256\211\350\243\205.md"
deleted file mode 100644
index c7b4b7e16f8b79c1c36e274660ce742d7263ebcc..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/1_3_LabelMe\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,23 +0,0 @@
-## 2.1.3.1 安装LabelMe
-在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-conda activate mypaddle
-# (选择一)python版本为2.x
-conda install pyqt
-# (选择二)python版本为3.x
-pip install pyqt5
-# 安装LabelMe
-pip install labelme
-```
-## 2.1.3.2 使用LabelMe
-在命令行中执行下述命令,则会出现标注工具
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# 开启LabelMe
-```
-LabelMe标注工具界面主要如下图所示:
-
-LabelMe默认标注多边形,可在图像中右键选择标注其他类型的框,如下图所示:
-
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/anaconda1.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/anaconda1.png"
deleted file mode 100644
index fe1c62ff6134f1d3cba928d91940f404ae9ac11d..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/anaconda1.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme1.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme1.png"
deleted file mode 100644
index 48c5967d4c3291801c046714679bcbd4baa08789..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme1.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme2.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme2.png"
deleted file mode 100644
index f76d24086bda62211adb821220c51a107db97ef2..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/pics/labelme2.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_1_Anaconda\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_1_Anaconda\345\256\211\350\243\205.md"
deleted file mode 100644
index 440c2a2f46c498a19c9482cf6dff0aa309f272ca..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_1_Anaconda\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,39 +0,0 @@
-## 2.2.1.1 下载Anaconda
-Ubuntu图形界面下:在Anaconda官网[(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/)选择“Linux”,并选择与所需python相对应的Anaconda版本进行下载
-Ubuntu命令行界面下:使用”wget“进行下载
-```cmd
-# Anaconda2
-wget https://repo.anaconda.com/archive/Anaconda2-2019.07-Linux-x86_64.sh --no-check-certificate
-# Anaconda3
-wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh --no-check-certificate
-```
-## 2.2.1.2 安装Anaconda
-
-***步骤一:安装***
-在Anaconda安装包所在路径执行下述命令行
-```cmd
-# 运行所下载的Anaconda,例如:
-bash ./Anaconda3-2019.07-Linux-x86_64.sh
-```
-【注意】安装过程中一直回车即可,直至出现设置路径时可对安装路径进行修改,否则默认安装在Ubuntu当前用户主目录下
-***步骤二:设置环境变量***
-在命令行中执行下述命令
-```cmd
-# 将anaconda的bin目录加入PATH
-# 根据安装路径的不同,修改”~/anaconda3/bin“
-echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
-# 更新bashrc以立即生效
-source ~/.bashrc
-```
-## 2.2.1.3 使用Anaconda
-在命令行中执行下述命令
-```cmd
-# 创建一个名为mypaddle的环境,指定python版本是3.5
-conda create -n mypaddle python=3.5
-# 创建好后,使用activate进入环境
-source activate mypaddle
-python --version
-# 若上述命令行出现Anaconda字样,则表示安装成功
-# 退出环境
-source deactivate
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_2_PaddlePaddle\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_2_PaddlePaddle\345\256\211\350\243\205.md"
deleted file mode 100644
index 502f286870fb475e4672acf9ed126616b9cf81b2..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_2_PaddlePaddle\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,24 +0,0 @@
-## 2.2.2.1 安装PaddlePaddle
-PaddlePaddle可以在64-bit的Ubuntu14.04(支持CUDA8、CUDA10)、Ubuntu16.04(支持CUDA8、CUDA9、CUDA10)、Ubuntu18.04(支持CUDA10)上运行,同时支持python2(>=2.7.15)和python3(>= 3.5.1),但pip版本必须高于9.0.1。Windows版本同时支持CPU版和GPU版的PaddlePaddle,若使用GPU版,对于CUDA和CUDNN的安装,可参考NVIDIA官方文档[(https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/)和[(https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/Tables.html/#ciwhls-release)了解。
-
-
-- 在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# (选择1)安装CPU版本PaddlePaddle
-pip install -U paddlepaddle
-# (选择2)安装GPU版本PaddlePaddle
-pip install -U paddlepaddle-gpu
-# (选择3)安装指定版本PaddlePaddle
-pip install -U paddlepaddle-gpu==[版本号]
-pip install -U paddlepaddle==[版本号]
-```
-【注意】版本号可参考PyPi官网[(https://pypi.org/project/paddlepaddle-gpu/#history)](https://pypi.org/project/paddlepaddle-gpu/#history)
-
-- 安装成功后,打开python命令行,使用以下代码进行测试:
-```python
-import paddle.fluid as fluid
-fluid.install_check.run_check()
-# 若出现Your Paddle Fluid is installed successfully!字样则表示安装成功
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_3_LabelMe\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_3_LabelMe\345\256\211\350\243\205.md"
deleted file mode 100644
index 8c4f3d50d5d096a4e5a717b09574e8942e51f221..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/2_3_LabelMe\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,24 +0,0 @@
-## 2.2.3.1 安装LabelMe
-在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# (选择一)python版本为2.x
-conda install pyqt
-# (选择二)python版本为3.x
-pip install pyqt5
-# 安装LabelMe
-pip install labelme
-```
-## 2.2.3.2 使用LabelMe
-在命令行中执行下述命令,则会出现标注工具
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# 开启LabelMe
-labelme
-```
-LabelMe标注工具界面主要如下图所示:
-
-LabelMe默认标注多边形,可在图像中右键选择标注其他类型的框,如下图所示:
-
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme1.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme1.png"
deleted file mode 100644
index 48c5967d4c3291801c046714679bcbd4baa08789..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme1.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme2.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme2.png"
deleted file mode 100644
index f76d24086bda62211adb821220c51a107db97ef2..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/pics/labelme2.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_1_Anaconda\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_1_Anaconda\345\256\211\350\243\205.md"
deleted file mode 100644
index c987e49d8524ab26bd60e2687bf41c4330ca3ce4..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_1_Anaconda\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,31 +0,0 @@
-## 2.3.1.1 下载Anaconda
-在Anaconda官网[(https://www.anaconda.com/distribution/)](https://www.anaconda.com/distribution/)选择“MacOS”,并选择与所需python相对应的Anaconda版本进行下载
-
-## 2.3.1.2 安装Anaconda
-***步骤一:安装***
-打开下载的安装包(以.pkl为后缀),根据引导完成安装,在安装过程中可以修改安装路径,具体如下图所示:
-
-【注意】默认安装在MacOS当前用户主目录下
-
-***步骤二:设置环境变量***
-在命令行中执行下述命令
-
-```cmd
-# 将anaconda的bin目录加入PATH
-# 根据安装路径的不同,修改”/Users/anaconda3/bin“
-echo 'export PATH="/Users/anaconda3/bin:$PATH"' >> ~/.bash_profile
-# 更新bash_profile以立即生效
-source ~/.bash_profile
-```
-## 2.3.1.3 使用Anaconda
-在命令行中执行下述命令
-```cmd
-# 创建一个名为mypaddle的环境,指定python版本是3.5
-conda create -n mypaddle python=3.5
-# 创建好后,使用activate进入环境
-source activate mypaddle
-python --version
-# 若上述命令行出现Anaconda字样,则表示安装成功
-# 退出环境
-source deactivate
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_2_PaddlePaddle\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_2_PaddlePaddle\345\256\211\350\243\205.md"
deleted file mode 100644
index 2e637b30aa40026fdcc06f1b572e3796cf34bde5..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_2_PaddlePaddle\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,19 +0,0 @@
-## 2.3.2.1 安装PaddlePaddle
-PaddlePaddle可以在64-bit的MacOS10.11、MacOS10.12、MacOS10.13、MacOS10.14上运行,同时支持python2(>=2.7.15)和python3(>= 3.5.1),但pip版本必须高于9.0.1。目前,MacOS环境仅支持CPU版PaddlePaddle。
-
-- 在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# (选择1)安装CPU版本PaddlePaddle
-pip install -U paddlepaddle
-# (选择2)安装指定版本PaddlePaddle
-pip install -U paddlepaddle==[版本号]
-```
-
-- 安装成功后,打开python命令行,使用以下代码进行测试:
-```python
-import paddle.fluid as fluid
-fluid.install_check.run_check()
-# 若出现Your Paddle Fluid is installed successfully!字样则表示安装成功
-```
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_3_LabelMe\345\256\211\350\243\205.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_3_LabelMe\345\256\211\350\243\205.md"
deleted file mode 100644
index 99c0b2dfe5019c29f00633e11c2d24ab30625ead..0000000000000000000000000000000000000000
--- "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/3_3_LabelMe\345\256\211\350\243\205.md"
+++ /dev/null
@@ -1,24 +0,0 @@
-## 2.3.3.1 安装LabelMe
-在命令行中执行下述命令
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# (选择一)python版本为2.x
-conda install pyqt
-# (选择二)python版本为3.x
-pip install pyqt5
-# 安装LabelMe
-pip install labelme
-```
-## 2.3.3.2 使用LabelMe
-在命令行中执行下述命令,则会出现标注工具
-```cmd
-# 进入创建好的Anaconda环境
-source activate mypaddle
-# 开启LabelMe
-labelme
-```
-LabelMe标注工具界面主要如下图所示:
-
-LabelMe默认标注多边形,可在图像中右键选择标注其他类型的框,如下图所示:
-
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/README.md" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/README.md"
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/anaconda1.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/anaconda1.png"
deleted file mode 100644
index a595eea0ba250dbe85fc6dcf83a1aa7d6fdd07bf..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/anaconda1.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme1.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme1.png"
deleted file mode 100644
index 48c5967d4c3291801c046714679bcbd4baa08789..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme1.png" and /dev/null differ
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme2.png" "b/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme2.png"
deleted file mode 100644
index f76d24086bda62211adb821220c51a107db97ef2..0000000000000000000000000000000000000000
Binary files "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/3_MacOS/pics/labelme2.png" and /dev/null differ
diff --git a/QQGroup.jpeg b/QQGroup.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..de6fa4fd70aee1631cc99e6fd1414287723ccdb2
Binary files /dev/null and b/QQGroup.jpeg differ
diff --git a/README.md b/README.md
index d3d7f26a1cbb7c884ddabbccd0ed41cf4322f6db..5c36ec122a4335d4918ce1f8893b1c802f31d246 100644
--- a/README.md
+++ b/README.md
@@ -1,212 +1,31 @@
+
+PaddleX is a full-pipeline deep learning model development tool built on the PaddlePaddle ecosystem. It is easy to integrate, easy to use, and covers the entire workflow. Besides the open-source core code, which can be used or integrated flexibly, PaddleX also provides a companion visual client so that models can be developed through a graphical interface. Visit the [PaddleX website](https://www.paddlepaddle.org.cn/paddlex/download) for more details.
+## Installation
+See the [PaddleX installation guide](docs/install.md).
-
-

+## Documentation
+We recommend the PaddleX online documentation for quick access to the tutorials and the API reference.
-
+Frequently used documents:
+- [Train a PaddleX model in 10 minutes](docs/quick_start.md)
+- [PaddleX tutorials](docs/tutorials)
+- [PaddleX model zoo](docs/model_zoo.md)
-
+## Feedback
-
-飞桨全流程开发工具,集飞桨核心框架、模型库、工具及组件等深度学习开发所需全部能力于一身,打通深度学习开发全流程,并提供简明易懂的Python API,方便用户根据实际生产需求进行直接调用或二次开发,为开发者提供飞桨全流程开发的最佳实践。
-
-为了帮助开发者更好的了解飞桨的开发步骤以及所涉及的模块组件,进一步提升项目开发效率,我们还为开发者提供了基于PaddleX实现的图形化开发界即可面示例,用户可以基于该界面示例进行改造,开发符合自己习惯的操作界面。开发者可以根据实际业务需求,直接调用、改造PaddleX后端技术内核来开发项目,也可使用图形化开发界面快速体验飞桨模型开发全流程。
-
- 我们诚挚地邀请您前往 [官网](https://www.paddlepaddle.org.cn/paddle/paddlex)下载试用PaddleX可视化前端,并提出您的宝贵意见或贡献项目。PaddleX代码将在5月初随正式版本发布时,全部开源。
-
-
-
-
-
-## 目录
-
-* **产品特性**
-* **PaddleX 可视化前端**
- 1. 下载可视化前端
- 2. 准备数据
- 3. 导入我的数据集
- 4. 创建项目
- 5. 项目开发
-* **PaddleX 后端技术内核**
-* **FAQ**
-
-
-
-## 产品特性
-
-1. **全流程打通**
-
-将深度学习开发从数据接入、模型训练、参数调优、模型评估、预测部署全流程打通,并提供可视化的使用界面,省去了对各环节间串连的代码开发与脚本调用,极大地提升了开发效率。
-
-2. **开源技术内核**
-
-集成PaddleCV领先的视觉算法和面向任务的开发套件、预训练模型应用工具PaddleHub、训练可视化工具VisualDL、模型压缩工具库PaddleSlim等技术能力于一身,并提供简明易懂的Python API,完全开源开放,易于集成和二次开发,为您的业务实践全程助力。
-
-3. **本地一键安装**
-
-高度兼容Windows、Mac、Linux系统,同时支持NVIDIA GPU加速深度学习训练。本地开发、保证数据安全,高度符合产业应用的实际需求。
-
-4. **教程与服务**
-
-从数据集准备到上线部署,为您提供业务开发全流程的文档说明及技术服务。开发者可以通过QQ群、微信群、GitHub社区等多种形式与飞桨团队及同业合作伙伴交流沟通。
-
-
-
-## PaddleX 可视化前端
-
-**第一步:下载可视化前端**
-
-您需要前往 [官网](https://www.paddlepaddle.org.cn/paddle/paddlex)填写基本信息后下载试用PaddleX可视化前端
-
-
-
-**第二步:准备数据**
-
-在开始模型训练前,您需要根据不同的任务类型,将数据标注为相应的格式。目前PaddleX支持【图像分类】、【目标检测】、【语义分割】、【实例分割】四种任务类型。不同类型任务的数据处理方式可查看[数据标注方式](https://github.com/PaddlePaddle/PaddleX/tree/master/DataAnnotation/AnnotationNote)。
-
-
-
-**第三步:导入我的数据集**
-
-①数据标注完成后,您需要根据不同的任务,将数据和标注文件,按照客户端提示更名并保存到正确的文件中。
-
-②在客户端新建数据集,选择与数据集匹配的任务类型,并选择数据集对应的路径,将数据集导入。
-
-
-
-③选定导入数据集后,客户端会自动校验数据及标注文件是否合规,校验成功后,您可根据实际需求,将数据集按比例划分为训练集、验证集、测试集。
-
-④您可在「数据分析」模块按规则预览您标注的数据集,双击单张图片可放大查看。
-
-
-
-
-
-**第四步:创建项目**
-
-① 在完成数据导入后,您可以点击「新建项目」创建一个项目。
-
-② 您可根据实际任务需求选择项目的任务类型,需要注意项目所采用的数据集也带有任务类型属性,两者需要进行匹配。
-
-
-
-
-
-**第五步:项目开发**
-
-① **数据选择**:项目创建完成后,您需要选择已载入客户端并校验后的数据集,并点击下一步,进入参数配置页面。
-
-
-
-② **参数配置**:主要分为**模型参数**、**训练参数**、**优化策略**三部分。您可根据实际需求选择模型结构及对应的训练参数、优化策略,使得任务效果最佳。
-
-
-
-参数配置完成后,点击启动训练,模型开始训练并进行效果评估。
-
-③ **训练可视化**:
-
-在训练过程中,您可通过VisualDL查看模型训练过程时的参数变化、日志详情,及当前最优的训练集和验证集训练指标。模型在训练过程中通过点击"终止训练"随时终止训练过程。
-
-
-
-
-
-模型训练结束后,点击”下一步“,进入模型评估页面。
-
-
-
-④ **模型评估**
-
-在模型评估页面,您可将训练后的模型应用在切分时留出的「验证数据集」以测试模型在验证集上的效果。评估方法包括混淆矩阵、精度、召回率等。在这个页面,您也可以直接查看模型在测试数据集上的预测效果。
-
-根据评估结果,您可决定进入模型发布页面,或返回先前步骤调整参数配置重新进行训练。
-
-
-
-⑤**模型发布**
-
-当模型效果满意后,您可根据实际的生产环境需求,选择将模型发布为需要的版本。
-
-
-
-
-
-## PaddleX 技术内核
-
-将于2020年5月下旬全面开源,敬请期待
-
-
-
-## FAQ
-
-1. **为什么我的数据集没办法切分?**
-
- 如果您的数据集已经被一个或多个项目引用,数据集将无法切分,您可以额外新建一个数据集,并引用同一批数据,再选择不同的切分比例。
-
-
-
-2. **任务和项目的区别是什么?**
-
- 一个项目可以包含多条任务,一个项目拥有唯一的数据集,但采用不同的参数配置启动训练会创建多条任务,方便您对比采用不同参数配置的训练效果,并管理多个任务。
-
-
-
-3. **为什么训练速度这么慢?**
-
- PaddleX完全采用您本地的硬件进行计算,深度学习任务确实对算力的要求比较高,为了使您能快速体验应用PaddleX进行开发,我们适配了CPU硬件,但强烈建议您使用GPU以提升训练速度和开发体验。
-
-
-
-4. **我可以在服务器或云平台上部署PaddleX么?**
-
- 当前PaddleX 可视化前端是一个适配本地单机安装的Client,无法在服务器上直接进行部署,您可以直接使用PaddleX Core后端技术内核,或采用飞桨核心框架进行服务器上的部署。如果您希望使用公有算力,强烈建议您尝试飞桨产品系列中的 [EasyDL](https://ai.baidu.com/easydl/) 或 [AI Studio](https://aistudio.baidu.com/aistudio/index)进行开发。
-
-
-
-5. **为什么我的安装总是报错?**
-
- PaddleX的安装包中打包了PaddlePaddle全流程开发所需的所有依赖,理论上不需要您额外安装CUDA等ToolKit (如使用NVIDIA GPU), 但对操作系统版本、处理器架构、驱动版本等有一定要求,如安装发生错误,建议您先检查开发环境是否与PaddleX推荐环境匹配。
-
- **这里我们推荐您在以下环境中安装使用PaddleX:**
-
- * **操作系统:**
- * Windows 7/8/10(推荐Windows 10);
- * Mac OS 10.13+ ;
- * Ubuntu 18.04+;
-
- ***注:处理器需为x86_64架构,支持 MKL***
-
- * **训练硬件:**
-
- * **GPU**(Windows及Linux系统):
-
- 推荐使用支持CUDA的NVIDIA显卡,例如:GTX 1070+以上性能的显卡;
-
- Windows系统X86_64驱动版本>=411.31;
-
- Linux系统X86_64驱动版本>=410.48;
-
- 显存8G以上;
- * **CPU**:
-
- PaddleX 当前支持您用本地CPU进行训练,但推荐使用GPU以获得更好的开发体验。
-
- * **内存** :建议8G以上
-
- * **硬盘空间** :建议SSD剩余空间1T以上(非必须)
-
- ***注:PaddleX 在 Windows及Mac系统只支持单卡模式。Windows暂时不支持NCCL。***
-
-
-
-**如果您有更多问题或建议,欢迎以issue的形式,或加入PaddleX官方QQ群(1045148026)直接反馈您的意见及建议**
-
-
-

-
-
+- Project website: https://www.paddlepaddle.org.cn/paddlex
+- PaddleX user QQ group: 1045148026 (scan the QR code below with mobile QQ to join)
+
+## PaddlePaddle Ecosystem
+- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
+- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
+- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
+- [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
+- [PaddleHub](https://github.com/PaddlePaddle/PaddleHub)
+- [PaddleLite](https://github.com/PaddlePaddle/Paddle-Lite)
+- [VisualDL](https://github.com/PaddlePaddle/VisualDL)
diff --git a/commit-prepare.sh b/commit-prepare.sh
new file mode 100644
index 0000000000000000000000000000000000000000..faa217e8f4352029f18fe22566eb3884f2da4d9f
--- /dev/null
+++ b/commit-prepare.sh
@@ -0,0 +1,6 @@
+path=$(cd `dirname $0`; pwd)
+cd $path
+
+pip install pre-commit
+pip install yapf
+pre-commit install
diff --git a/docs/FAQ.md b/docs/FAQ.md
new file mode 100644
index 0000000000000000000000000000000000000000..d96e35c886cf582ee57b699aa2baf549e0d56a09
--- /dev/null
+++ b/docs/FAQ.md
@@ -0,0 +1,59 @@
+# FAQ
+
+## 1. Training fails with an out-of-memory error on the GPU
+
+> Run `nvidia-smi` in a terminal to check whether the GPU is occupied by other tasks, and clear them if possible;
+> lower the `batch_size` used for training to reduce the memory requirement, remembering to scale `learning_rate` and related parameters down proportionally;
+> or choose a smaller model or backbone. A minimal sketch of the batch-size/learning-rate adjustment is given below.
+
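+> For reference, a minimal sketch of the proportional adjustment, assuming a classifier whose defaults are `train_batch_size=64` and `learning_rate=0.025` (as for `paddlex.cls.ResNet50`) and data readers already built as described in the API docs:
+>
+> ```python
+> import paddlex as pdx
+>
+> # train_dataset / eval_dataset are assumed to have been built with paddlex.datasets
+> model = pdx.cls.ResNet50(num_classes=4)      # number of classes in your own dataset
+> # batch size halved from the default 64 to save GPU memory,
+> # so the learning rate is halved from the default 0.025 as well
+> model.train(num_epochs=10,
+>             train_dataset=train_dataset,
+>             train_batch_size=32,
+>             eval_dataset=eval_dataset,
+>             learning_rate=0.0125,
+>             save_dir='output')
+> ```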
+## 2. Is there a smaller model suitable for lower-spec devices?
+> Yes, model pruning can be used; see the [model pruning tutorial](slim/prune.md). The size of the pruned model is controlled through the pruning parameters. In our experiments on VOC detection data with yolov3-mobilenet, the original model of XX M was pruned to XX M with essentially unchanged accuracy.
+
+## 3. How to configure the number of GPUs used for training
+> Export an environment variable in the terminal, or set it from Python code as sketched below; see [CPU / multi-GPU training](gpu_configure.md) for details.
+
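+> For reference, a minimal sketch using the standard `CUDA_VISIBLE_DEVICES` environment variable (the linked document above is the authoritative reference):
+>
+> ```python
+> import os
+> # expose only GPU cards 0 and 1 to the training process;
+> # this must be set before paddle/paddlex are imported
+> os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
+>
+> import paddlex as pdx  # training started after this point uses the two visible cards
+> ```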
+## 4. How to continue training from previously trained model weights
+> When calling the `train` interface, set `pretrain_weights` to the save path of the previous model, as sketched below.
+
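+> A minimal sketch, assuming an earlier run saved a model under the hypothetical path `output/epoch_10` and that the data readers are already built:
+>
+> ```python
+> import paddlex as pdx
+>
+> model = pdx.cls.ResNet50(num_classes=4)
+> model.train(num_epochs=20,
+>             train_dataset=train_dataset,
+>             eval_dataset=eval_dataset,
+>             pretrain_weights='output/epoch_10',  # path of the previously saved model
+>             save_dir='output_continue')
+> ```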
+
+## 5. PaddleX saves models from normal training, pruning training, export for deployment, and quantization. What are the differences and how to tell them apart?
+
+**Functional differences between the model types**
+
+>1. Saved during normal training
+>
+>>Model directories saved every n epochs during normal training. The model can be used as pretrained weights, loaded with PaddleX for prediction, or exported as a deployment model.
+
+>2. Saved during pruning training
+>
+>>Model directories saved every n epochs during pruning training. The model cannot be used as pretrained weights, but can be loaded with PaddleX for prediction or exported as a deployment model.
+
+>3. Exported deployment model
+>
+>>Model directory exported for server-side deployment. It cannot be used as pretrained weights, but can be loaded with PaddleX for prediction.
+
+>4. Quantized model
+>
+>>Model directory whose parameters have been quantized to speed up prediction. It cannot be used as pretrained weights, but can be loaded with PaddleX for prediction.
+
+**How to tell them apart**
+>> Use the `status` field in the model.yml file inside the model directory: 'Normal', 'Prune', 'Infer' and 'Quant' indicate a normally trained model, a pruning-trained model, an exported deployment model and a quantized model respectively (see the sketch below).
+
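+>> For reference, a sketch of reading this field programmatically, assuming model.yml is standard YAML with a top-level `status` key as described above (the model directory below is hypothetical):
+>>
+>> ```python
+>> import yaml
+>>
+>> def model_save_type(model_dir):
+>>     """Return the save type of a PaddleX model directory based on model.yml."""
+>>     with open('{}/model.yml'.format(model_dir)) as f:
+>>         info = yaml.safe_load(f)
+>>     names = {'Normal': 'saved during normal training',
+>>              'Prune': 'saved during pruning training',
+>>              'Infer': 'exported deployment model',
+>>              'Quant': 'quantized model'}
+>>     return names.get(info['status'], 'unknown status: {}'.format(info['status']))
+>>
+>> print(model_save_type('./output/epoch_10'))
+>> ```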
+
+## 6. Training takes too long or is too slow; how to speed it up
+> 1. Training speed depends on the size of the chosen model and on the `batch_size`; model sizes are listed in the [model zoo](model_zoo.md). In general, the larger the model, the slower the training;
+
+> 2. beyond per-step speed, the total training time also depends on the `num_epochs` you set. You can watch the validation metrics to decide whether to stop training early (set `save_interval_epochs` during training, so that metrics are computed on the validation set and the model is saved every `save_interval_epochs` epochs).
+
+## 7. How to choose the number of training epochs
+> If you are not sure how many epochs to train for, set the number relatively high and also set `save_interval_epochs`, so that the model is evaluated on the validation set and saved at that interval. From the validation metrics at different epochs you can judge whether the model has converged, and stop the training process yourself once it has.
+>
+## 8. How to speed up training with only a CPU and no GPU
+> Without a GPU, you can decide, based on your CPU configuration, whether to train with multiple CPU cores; see [multi-CPU/GPU training](gpu_configure.md) for the configuration details.
+>
+## 9. The machine has no internet access and training fails because the pretrained model cannot be downloaded
+> Prepare the pretrained model in advance by other means, then point `pretrain_weights` to it when training; see [training without internet access](how_to_offline_run.md).
+
+## 10. Every new training run re-downloads the pretrained model; how to download it only once
+> 1. Use the approach from question 9; or
+> 2. set `paddlex.pretrain_dir` before every training run, e.g. `paddlex.pretrain_dir='/usrname/paddlex'`. Downloaded pretrained models are then stored under `/usrname/paddlex`, and models already present in that directory will not be downloaded again.
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..d4bb2cbb9eddb1bb1b4f366623044af8e4830919
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c297a4d32a4e331abd650f7a390c7adee5b242d6
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,11 @@
+# PaddleX Documentation
+
+All PaddleX user documentation lives under this directory. The docs are organized for Read the Docs; you can also browse the [online documentation](https://www.baidu.com) directly.
+
+## Building the docs
+Build the documentation from this directory as follows:
+
+- Install the dependencies: `pip install -r requirements.txt`
+- Build: `make html`
+
+The output is placed in the `_build` directory; open `_build/html/index.html` to view it.
diff --git a/docs/apis/datasets.md b/docs/apis/datasets.md
new file mode 100644
index 0000000000000000000000000000000000000000..34569a9c9089d12e5ca88e8e0b947ed3904579ae
--- /dev/null
+++ b/docs/apis/datasets.md
@@ -0,0 +1,82 @@
+# Datasets - datasets
+
+## ImageNet class
+```
+paddlex.datasets.ImageNet(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
+```
+Reads a classification dataset in ImageNet format and applies the corresponding processing to each sample. The ImageNet dataset format is described in [Dataset format notes](../datasets.md). A minimal construction sketch is given after the parameter list below.
+
+Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/classification/mobilenetv2.py#L25)
+
+### Parameters
+
+> * **data_dir** (str): Directory where the dataset is located.
+> * **file_list** (str): Path of the file listing the dataset images and their class ids (each path inside the file is relative to `data_dir`).
+> * **label_list** (str): Path of the file describing the classes contained in the dataset.
+> * **transforms** (paddlex.cls.transforms): Preprocessing/augmentation operators applied to each sample; see [paddlex.cls.transforms](./transforms/cls_transforms.md).
+> * **num_workers** (int|str): Number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the number of CPU cores is greater than 8, `num_workers` is set to 8; otherwise it is set to half the number of CPU cores.
+> * **buffer_size** (int): Length, in samples, of the preprocessing queue buffer. Defaults to 100.
+> * **parallel_method** (str): Parallelization method used for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac, 'thread' is always used and this parameter has no effect).
+> * **shuffle** (bool): Whether to shuffle the samples in the dataset. Defaults to False.
+
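+For reference, a minimal construction sketch. The directory layout, the list-file names and the specific transform operators used here (`transforms.Compose`, `transforms.RandomCrop`, `transforms.Normalize`) are assumptions; see [paddlex.cls.transforms](./transforms/cls_transforms.md) for the actual operators:
+
+```python
+import paddlex as pdx
+from paddlex.cls import transforms
+
+# assumed transform operators; see cls_transforms.md for the real list
+train_transforms = transforms.Compose([
+    transforms.RandomCrop(crop_size=224),
+    transforms.Normalize()
+])
+
+train_dataset = pdx.datasets.ImageNet(
+    data_dir='./mydataset',                      # hypothetical dataset directory
+    file_list='./mydataset/train_list.txt',      # hypothetical list/label file names
+    label_list='./mydataset/labels.txt',
+    transforms=train_transforms,
+    shuffle=True)
+```
+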
+## VOCDetection class
+
+```
+paddlex.datasets.VOCDetection(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
+```
+
+Reads a detection dataset in PascalVOC format and applies the corresponding processing to each sample. The PascalVOC dataset format is described in [Dataset format notes](../datasets.md).
+
+Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py#L29)
+
+### Parameters
+
+> * **data_dir** (str): Directory where the dataset is located.
+> * **file_list** (str): Path of the file listing the dataset images and their corresponding annotation files (each path inside the file is relative to `data_dir`).
+> * **label_list** (str): Path of the file describing the classes contained in the dataset.
+> * **transforms** (paddlex.det.transforms): Preprocessing/augmentation operators applied to each sample; see [paddlex.det.transforms](./transforms/det_transforms.md).
+> * **num_workers** (int|str): Number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the number of CPU cores is greater than 8, `num_workers` is set to 8; otherwise it is set to half the number of CPU cores.
+> * **buffer_size** (int): Length, in samples, of the preprocessing queue buffer. Defaults to 100.
+> * **parallel_method** (str): Parallelization method used for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac, 'thread' is always used and this parameter has no effect).
+> * **shuffle** (bool): Whether to shuffle the samples in the dataset. Defaults to False.
+
+## COCODetection class
+
+```
+paddlex.datasets.COCODetection(data_dir, ann_file, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
+```
+
+Reads a detection dataset in MSCOCO format and applies the corresponding processing to each sample; datasets in this format can also be used to train instance segmentation models. The MSCOCO dataset format is described in [Dataset format notes](../datasets.md).
+
+Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/mask_rcnn_r50_fpn.py#L27)
+
+### Parameters
+
+> * **data_dir** (str): Directory where the dataset is located.
+> * **ann_file** (str): Annotation file of the dataset, a single standalone JSON file.
+> * **transforms** (paddlex.det.transforms): Preprocessing/augmentation operators applied to each sample; see [paddlex.det.transforms](./transforms/det_transforms.md).
+> * **num_workers** (int|str): Number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the number of CPU cores is greater than 8, `num_workers` is set to 8; otherwise it is set to half the number of CPU cores.
+> * **buffer_size** (int): Length, in samples, of the preprocessing queue buffer. Defaults to 100.
+> * **parallel_method** (str): Parallelization method used for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac, 'thread' is always used and this parameter has no effect).
+> * **shuffle** (bool): Whether to shuffle the samples in the dataset. Defaults to False.
+
+## SegDataset class
+
+```
+paddlex.datasets.SegDataset(data_dir, file_list, label_list, transforms=None, num_workers='auto', buffer_size=100, parallel_method='thread', shuffle=False)
+```
+
+Reads a semantic segmentation dataset and applies the corresponding processing to each sample. The dataset format for semantic segmentation is described in [Dataset format notes](../datasets.md).
+
+Example: [code file](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py#L27)
+
+### Parameters
+
+> * **data_dir** (str): Directory where the dataset is located.
+> * **file_list** (str): Path of the file listing the dataset images and their corresponding annotation files (each path inside the file is relative to `data_dir`).
+> * **label_list** (str): Path of the file describing the classes contained in the dataset.
+> * **transforms** (paddlex.seg.transforms): Preprocessing/augmentation operators applied to each sample; see [paddlex.seg.transforms](./transforms/seg_transforms.md).
+> * **num_workers** (int|str): Number of threads or processes used for sample preprocessing. Defaults to 'auto': if half the number of CPU cores is greater than 8, `num_workers` is set to 8; otherwise it is set to half the number of CPU cores.
+> * **buffer_size** (int): Length, in samples, of the preprocessing queue buffer. Defaults to 100.
+> * **parallel_method** (str): Parallelization method used for preprocessing, either 'thread' or 'process'. Defaults to 'thread' (on Windows and Mac, 'thread' is always used and this parameter has no effect).
+> * **shuffle** (bool): Whether to shuffle the samples in the dataset. Defaults to False.
diff --git a/docs/apis/deploy.md b/docs/apis/deploy.md
new file mode 100644
index 0000000000000000000000000000000000000000..ccdcd86dca6b91354564ccf40a47d3b6d47c4959
--- /dev/null
+++ b/docs/apis/deploy.md
@@ -0,0 +1,50 @@
+# paddlex.deploy
+
+Prediction deployment based on AnalysisPredictor.
+
+## create_predictor
+
+```
+paddlex.deploy.create_predictor(model_dir, use_gpu=False)
+```
+
+#### Args
+
+* **model_dir**: Path of a model saved during training
+* **use_gpu**: Whether to use the GPU for prediction
+
+#### Returns
+
+* **Predictor**: paddlex.deploy.predictor.Predictor
+
+### Example
+
+```
+import paddlex
+# create a predictor from a model directory saved during training
+Predictor = paddlex.deploy.create_predictor(model_dir, use_gpu=True)
+```
+
+## ClassifyPredictor
+Inherits from paddlex.deploy.predictor.Predictor; this predictor is used when the model described by model_dir/model.yml is a classification model.
+
+```
+paddlex.deploy.create_predictor(model_dir, use_gpu=False)
+```
+
+#### Args
+
+* **model_dir**: Path of a model saved during training
+* **use_gpu**: Whether to use the GPU for prediction
+
+#### Returns
+
+* **Predictor**: paddlex.deploy.predictor.Predictor
+
+### Example
+
+```
+import paddlex
+# create a predictor from a model directory saved during training
+Predictor = paddlex.deploy.create_predictor(model_dir, use_gpu=True)
+```
diff --git a/docs/apis/index.rst b/docs/apis/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..d281cc0dccb5e8407ad4e479cf098a160a330400
--- /dev/null
+++ b/docs/apis/index.rst
@@ -0,0 +1,12 @@
+API Reference
+============================
+
+.. toctree::
+ :maxdepth: 2
+
+ transforms/index.rst
+ datasets.md
+ models.md
+ slim.md
+ load_model.md
+ visualize.md
diff --git a/docs/apis/load_model.md b/docs/apis/load_model.md
new file mode 100644
index 0000000000000000000000000000000000000000..7d29729fd29da5a187eea801c35556478065a27c
--- /dev/null
+++ b/docs/apis/load_model.md
@@ -0,0 +1,40 @@
+# Model loading - load_model
+
+PaddleX provides a unified model loading interface that can load models saved by PaddleX, evaluate them on a validation set, or run prediction on test images.
+
+## Interface
+
+```
+paddlex.load_model(model_dir)
+```
+
+### Parameters
+
+* **model_dir**: Path of a model saved during training
+
+### Return value
+* **paddlex.cv.models**: the model class.
+
+### Example
+> 1. [Download](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz) the MaskRCNN model trained by PaddleX on the garbage-sorting data
+> 2. [Download](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz) the garbage-sorting dataset
+
+```
+import paddlex as pdx
+
+model_dir = './garbage_epoch_12'
+data_dir = './garbage_ins_det/JPEGImages'
+ann_file = './garbage_ins_det/val.json'
+
+# load the garbage-sorting model
+model = pdx.load_model(model_dir)
+
+# run prediction
+pred_result = model.predict('./garbage_ins_det/JPEGImages/000114.bmp')
+
+# evaluate on the validation set
+eval_reader = pdx.cv.datasets.CocoDetection(data_dir=data_dir,
+                                            ann_file=ann_file,
+                                            transforms=model.eval_transforms)
+eval_result = model.evaluate(eval_reader, batch_size=1)
+```
diff --git a/docs/apis/models.md b/docs/apis/models.md
new file mode 100644
index 0000000000000000000000000000000000000000..ad098ca71d5588439e753f42b1ada1f3756e8f22
--- /dev/null
+++ b/docs/apis/models.md
@@ -0,0 +1,483 @@
+# Models - models
+
+## Classification models
+
+### ResNet50 class
+
+```python
+paddlex.cls.ResNet50(num_classes=1000)
+```
+
+Builds a ResNet50 classifier and provides its training, evaluation and prediction. A minimal end-to-end training sketch is given after the training parameter list below.
+
+#### **Parameters:**
+
+> - **num_classes** (int): Number of classes. Defaults to 1000.
+
+#### Classifier training interface
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=64, eval_dataset=None, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.025, lr_decay_epochs=[30, 60, 90], lr_decay_gamma=0.1, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
+> ```
+>
+> **Parameters:**
+>
+> > - **num_epochs** (int): Number of training epochs.
+> > - **train_dataset** (paddlex.datasets): Training data reader.
+> > - **train_batch_size** (int): Batch size of the training data; also used as the validation batch size. Defaults to 64.
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **save_interval_epochs** (int): Interval, in epochs, at which the model is saved. Defaults to 1.
+> > - **log_interval_steps** (int): Interval, in training steps, at which training logs are printed. Defaults to 2.
+> > - **save_dir** (str): Directory where models are saved.
+> > - **pretrain_weights** (str): If a path, the pretrained model under that path is loaded; if 'IMAGENET', weights pretrained on ImageNet are downloaded automatically; if None, no pretrained model is used. Defaults to 'IMAGENET'.
+> > - **optimizer** (paddle.fluid.optimizer): Optimizer. If None, the default optimizer is used: fluid.optimizer.Momentum with a fluid.layers.piecewise_decay learning-rate schedule.
+> > - **learning_rate** (float): Initial learning rate of the default optimizer. Defaults to 0.025.
+> > - **lr_decay_epochs** (list): Epochs at which the default optimizer decays the learning rate. Defaults to [30, 60, 90].
+> > - **lr_decay_gamma** (float): Learning-rate decay factor of the default optimizer. Defaults to 0.1.
+> > - **use_vdl** (bool): Whether to use VisualDL for visualization. Defaults to False.
+> > - **sensitivities_file** (str): If a path, the sensitivity information under that path is loaded for pruning; if 'DEFAULT', sensitivity information obtained on ImageNet is downloaded automatically; if None, no pruning is applied. Defaults to None.
+> > - **eval_metric_loss** (float): Tolerable accuracy loss. Defaults to 0.05.
+
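+For reference, a minimal end-to-end sketch combining the constructor and `train`. The dataset readers are `paddlex.datasets.ImageNet` readers built as in the sketch in [datasets](./datasets.md); the number of classes, epochs and save directory below are placeholders:
+
+```python
+import paddlex as pdx
+
+# train_dataset / eval_dataset: paddlex.datasets.ImageNet readers,
+# built with paddlex.cls.transforms as sketched in datasets.md
+model = pdx.cls.ResNet50(num_classes=4)          # number of classes in your dataset
+model.train(num_epochs=10,
+            train_dataset=train_dataset,
+            train_batch_size=32,
+            eval_dataset=eval_dataset,
+            lr_decay_epochs=[4, 6, 8],
+            save_dir='output/resnet50',
+            use_vdl=True)
+```
+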
+#### Classifier evaluation interface
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **batch_size** (int): Batch size for evaluation. Defaults to 1.
+> > - **epoch_id** (int): Training epoch of the model being evaluated.
+> > - **return_details** (bool): Whether to return detailed information. Defaults to False.
+>
+> **Return value:**
+>
+> > - **dict**: When return_details is False, returns a dict with keys 'acc1' and 'acc5', the top-1 and top-5 accuracy respectively.
+> > - **tuple** (metrics, eval_details): When `return_details` is True, an additional dict is returned with keys 'true_labels' and 'pred_scores', the ground-truth class ids and the predicted score of each class respectively.
+
+#### Classifier prediction interface
+
+> ```python
+> predict(self, img_file, transforms=None, topk=5)
+> ```
+>
+> **Parameters:**
+>
+> > - **img_file** (str): Path of the image to predict.
+> > - **transforms** (paddlex.cls.transforms): Data preprocessing operators.
+> > - **topk** (int): Number of top predictions to return.
+
+> **Return value:**
+>
+> > - **list**: A list of dicts with keys 'category_id', 'category' and 'score',
+> > the predicted class id, class label and score respectively (a usage sketch follows below).
+
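+A short prediction sketch using a saved model, loaded with `paddlex.load_model` as documented in [load_model](./load_model.md) (the model directory and image path are hypothetical):
+
+```python
+import paddlex as pdx
+
+model = pdx.load_model('output/resnet50/epoch_10')   # hypothetical save directory
+result = model.predict('test.jpg', topk=3)
+for r in result:
+    # each element carries 'category_id', 'category' and 'score'
+    print(r['category'], r['score'])
+```
+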
+### Other classifier classes
+
+Besides `ResNet50`, `paddlex.cls` also provides `ResNet18`, `ResNet34`, `ResNet101`, `ResNet50_vd`, `ResNet101_vd`, `DarkNet53`, `MobileNetV1`, `MobileNetV2`, `MobileNetV3_small`, `MobileNetV3_large`, `Xception41`, `Xception65`, `Xception71` and `ShuffleNetV2`. Their usage (interfaces and parameters) is identical to `ResNet50`; see the [model zoo](../model_zoo.md) for the accuracy of each model.
+
+
+
+## Detection models
+
+### YOLOv3 class
+
+```python
+paddlex.det.YOLOv3(num_classes=80, backbone='MobileNetV1', anchors=None, anchor_masks=None, ignore_threshold=0.7, nms_score_threshold=0.01, nms_topk=1000, nms_keep_topk=100, nms_iou_threshold=0.45, label_smooth=False, train_random_shapes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608])
+```
+
+Builds a YOLOv3 detector and provides its training, evaluation and prediction. A minimal training sketch is given after the training parameter list below.
+
+**Parameters:**
+
+> - **num_classes** (int): Number of classes. Defaults to 80.
+> - **backbone** (str): Backbone network of YOLOv3, one of ['DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large']. Defaults to 'MobileNetV1'.
+> - **anchors** (list|tuple): Widths and heights of the anchor boxes; None means the default
+>   [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+>   [59, 119], [116, 90], [156, 198], [373, 326]].
+> - **anchor_masks** (list|tuple): Anchor mask indices used when computing the YOLOv3 loss; None means the default
+>   [[6, 7, 8], [3, 4, 5], [0, 1, 2]].
+> - **ignore_threshold** (float): When computing the YOLOv3 loss, the confidence of predicted boxes whose IoU is greater than `ignore_threshold` is ignored. Defaults to 0.7.
+> - **nms_score_threshold** (float): Confidence threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
+> - **nms_topk** (int): Maximum number of detection boxes kept by confidence before NMS. Defaults to 1000.
+> - **nms_keep_topk** (int): Total number of detection boxes kept per image after NMS. Defaults to 100.
+> - **nms_iou_threshold** (float): IoU threshold used to suppress detection boxes during NMS. Defaults to 0.45.
+> - **label_smooth** (bool): Whether to use label smoothing. Defaults to False.
+> - **train_random_shapes** (list|tuple): Image sizes randomly chosen from this list during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].
+
+#### YOLOv3 training interface
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=8, eval_dataset=None, save_interval_epochs=20, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/8000, warmup_steps=1000, warmup_start_lr=0.0, lr_decay_epochs=[213, 240], lr_decay_gamma=0.1, metric=None, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05)
+> ```
+>
+> **Parameters:**
+>
+> > - **num_epochs** (int): Number of training epochs.
+> > - **train_dataset** (paddlex.datasets): Training data reader.
+> > - **train_batch_size** (int): Batch size of the training data. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 8.
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **save_interval_epochs** (int): Interval, in epochs, at which the model is saved. Defaults to 20.
+> > - **log_interval_steps** (int): Interval, in training steps, at which training logs are printed. Defaults to 2.
+> > - **save_dir** (str): Directory where models are saved. Defaults to 'output'.
+> > - **pretrain_weights** (str): If a path, the pretrained model under that path is loaded; if 'IMAGENET', weights pretrained on ImageNet are downloaded automatically; if None, no pretrained model is used. Defaults to 'IMAGENET'.
+> > - **optimizer** (paddle.fluid.optimizer): Optimizer. If None, the default optimizer is used: fluid.optimizer.Momentum with a fluid.layers.piecewise_decay learning-rate schedule.
+> > - **learning_rate** (float): Learning rate of the default optimizer. Defaults to 1.0/8000.
+> > - **warmup_steps** (int): Number of warmup steps of the default optimizer. Defaults to 1000.
+> > - **warmup_start_lr** (int): Starting warmup learning rate of the default optimizer. Defaults to 0.0.
+> > - **lr_decay_epochs** (list): Epochs at which the default optimizer decays the learning rate. Defaults to [213, 240].
+> > - **lr_decay_gamma** (float): Learning-rate decay factor of the default optimizer. Defaults to 0.1.
+> > - **metric** (bool): Evaluation metric used during training, one of ['COCO', 'VOC']. Defaults to None.
+> > - **use_vdl** (bool): Whether to use VisualDL for visualization. Defaults to False.
+> > - **sensitivities_file** (str): If a path, the sensitivity information under that path is loaded for pruning; if 'DEFAULT', sensitivity information obtained on PascalVOC is downloaded automatically; if None, no pruning is applied. Defaults to None.
+> > - **eval_metric_loss** (float): Tolerable accuracy loss. Defaults to 0.05.
+
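+For reference, a minimal YOLOv3 training sketch on a PascalVOC-format dataset. The dataset paths, list files and the detection transform operators used here are assumptions; see [datasets](./datasets.md) and [paddlex.det.transforms](./transforms/det_transforms.md):
+
+```python
+import paddlex as pdx
+from paddlex.det import transforms
+
+# assumed detection transform operators; see det_transforms.md for the real list
+train_transforms = transforms.Compose([transforms.RandomHorizontalFlip(),
+                                       transforms.Resize(target_size=608),
+                                       transforms.Normalize()])
+eval_transforms = transforms.Compose([transforms.Resize(target_size=608),
+                                      transforms.Normalize()])
+
+train_dataset = pdx.datasets.VOCDetection(
+    data_dir='./mydataset',                     # hypothetical dataset directory
+    file_list='./mydataset/train_list.txt',
+    label_list='./mydataset/labels.txt',
+    transforms=train_transforms, shuffle=True)
+eval_dataset = pdx.datasets.VOCDetection(
+    data_dir='./mydataset',
+    file_list='./mydataset/val_list.txt',
+    label_list='./mydataset/labels.txt',
+    transforms=eval_transforms)
+
+model = pdx.det.YOLOv3(num_classes=7, backbone='DarkNet53')   # class count of your dataset
+model.train(num_epochs=270,
+            train_dataset=train_dataset,
+            train_batch_size=8,
+            eval_dataset=eval_dataset,
+            learning_rate=0.000125,
+            warmup_steps=1000,
+            save_interval_epochs=20,
+            save_dir='output/yolov3_darknet53',
+            use_vdl=True)
+```
+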
+#### YOLOv3 evaluation interface
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **batch_size** (int): Batch size for evaluation. Defaults to 1.
+> > - **epoch_id** (int): Training epoch of the model being evaluated.
+> > - **metric** (bool): Evaluation metric, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen automatically from the dataset passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
+> > - **return_details** (bool): Whether to return detailed information. Defaults to False.
+> >
+> **Return value:**
+>
+> > - **tuple** (metrics, eval_details) | **dict** (metrics): When `return_details` is True, returns (metrics, eval_details); when False, returns metrics only. metrics is a dict with key 'bbox_mmap' or 'bbox_map', the mean average precision averaged over IoU thresholds (mmAP) or the mean average precision (mAP) respectively. eval_details is a dict with key 'bbox', a list of predictions each consisting of image id, predicted box class id, box coordinates and box score, and key 'gt', the ground-truth annotation information.
+
+#### YOLOv3 prediction interface
+
+> ```python
+> predict(self, img_file, transforms=None)
+> ```
+>
+> **Parameters:**
+>
+> > - **img_file** (str): Path of the image to predict.
+> > - **transforms** (paddlex.det.transforms): Data preprocessing operators.
+>
+> **Return value:**
+>
+> > - **list**: A list of predictions; each element is a dict with keys 'bbox', 'category', 'category_id' and 'score', giving the box coordinates, class label, class id and confidence of each predicted object. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height.
+### FasterRCNN class
+
+```python
+paddlex.det.FasterRCNN(num_classes=81, backbone='ResNet50', with_fpn=True, aspect_ratios=[0.5, 1.0, 2.0], anchor_sizes=[32, 64, 128, 256, 512])
+```
+
+Builds a FasterRCNN detector and provides its training, evaluation and prediction.
+
+**Parameters:**
+
+> - **num_classes** (int): Number of classes, including the background class. Defaults to 81.
+> - **backbone** (str): Backbone network of FasterRCNN, one of ['ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd']. Defaults to 'ResNet50'.
+> - **with_fpn** (bool): Whether to use an FPN. Defaults to True.
+> - **aspect_ratios** (list): Candidate aspect ratios of the generated anchors. Defaults to [0.5, 1.0, 2.0].
+> - **anchor_sizes** (list): Candidate sizes of the generated anchors. Defaults to [32, 64, 128, 256, 512].
+
+#### FasterRCNN training interface
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.0025, warmup_steps=500, warmup_start_lr=1.0/1200, lr_decay_epochs=[8, 11], lr_decay_gamma=0.1, metric=None, use_vdl=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **num_epochs** (int): Number of training epochs.
+> > - **train_dataset** (paddlex.datasets): Training data reader.
+> > - **train_batch_size** (int): Batch size of the training data. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 2.
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **save_interval_epochs** (int): Interval, in epochs, at which the model is saved. Defaults to 1.
+> > - **log_interval_steps** (int): Interval, in training steps, at which training logs are printed. Defaults to 2.
+> > - **save_dir** (str): Directory where models are saved. Defaults to 'output'.
+> > - **pretrain_weights** (str): If a path, the pretrained model under that path is loaded; if 'IMAGENET', weights pretrained on ImageNet are downloaded automatically; if None, no pretrained model is used. Defaults to 'IMAGENET'.
+> > - **optimizer** (paddle.fluid.optimizer): Optimizer. If None, the default optimizer is used: fluid.optimizer.Momentum with a fluid.layers.piecewise_decay learning-rate schedule.
+> > - **learning_rate** (float): Initial learning rate of the default optimizer. Defaults to 0.0025.
+> > - **warmup_steps** (int): Number of warmup steps of the default optimizer. Defaults to 500.
+> > - **warmup_start_lr** (int): Starting warmup learning rate of the default optimizer. Defaults to 1.0/1200.
+> > - **lr_decay_epochs** (list): Epochs at which the default optimizer decays the learning rate. Defaults to [8, 11].
+> > - **lr_decay_gamma** (float): Learning-rate decay factor of the default optimizer. Defaults to 0.1.
+> > - **metric** (bool): Evaluation metric used during training, one of ['COCO', 'VOC']. Defaults to None.
+> > - **use_vdl** (bool): Whether to use VisualDL for visualization. Defaults to False.
+
+#### FasterRCNN evaluation interface
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **batch_size** (int): Batch size for evaluation. Defaults to 1.
+> > - **epoch_id** (int): Training epoch of the model being evaluated.
+> > - **metric** (bool): Evaluation metric, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen automatically from the dataset passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
+> > - **return_details** (bool): Whether to return detailed information. Defaults to False.
+> >
+> **Return value:**
+>
+> > - **tuple** (metrics, eval_details) | **dict** (metrics): When `return_details` is True, returns (metrics, eval_details); when False, returns metrics only. metrics is a dict with key 'bbox_mmap' or 'bbox_map', the mean average precision averaged over IoU thresholds (mmAP) or the mean average precision (mAP) respectively. eval_details is a dict with key 'bbox', a list of predictions each consisting of image id, predicted box class id, box coordinates and box score, and key 'gt', the ground-truth annotation information.
+
+#### FasterRCNN prediction interface
+
+> ```python
+> predict(self, img_file, transforms=None)
+> ```
+>
+> **Parameters:**
+>
+> > - **img_file** (str): Path of the image to predict.
+> > - **transforms** (paddlex.det.transforms): Data preprocessing operators.
+>
+> **Return value:**
+>
+> > - **list**: A list of predictions; each element is a dict with keys 'bbox', 'category', 'category_id' and 'score', giving the box coordinates, class label, class id and confidence of each predicted object. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height.
+
+### MaskRCNN class
+
+```python
+paddlex.det.MaskRCNN(num_classes=81, backbone='ResNet50', with_fpn=True, aspect_ratios=[0.5, 1.0, 2.0], anchor_sizes=[32, 64, 128, 256, 512])
+```
+
+Builds a MaskRCNN detector and provides its training, evaluation and prediction. A minimal instance-segmentation training sketch follows the training parameter list below.
+
+**Parameters:**
+
+> - **num_classes** (int): Number of classes, including the background class. Defaults to 81.
+> - **backbone** (str): Backbone network of MaskRCNN, one of ['ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd']. Defaults to 'ResNet50'.
+> - **with_fpn** (bool): Whether to use an FPN. Defaults to True.
+> - **aspect_ratios** (list): Candidate aspect ratios of the generated anchors. Defaults to [0.5, 1.0, 2.0].
+> - **anchor_sizes** (list): Candidate sizes of the generated anchors. Defaults to [32, 64, 128, 256, 512].
+
+#### MaskRCNN training interface
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=1, eval_dataset=None, save_interval_epochs=1, log_interval_steps=20, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=1.0/800, warmup_steps=500, warmup_start_lr=1.0/2400, lr_decay_epochs=[8, 11], lr_decay_gamma=0.1, metric=None, use_vdl=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **num_epochs** (int): Number of training epochs.
+> > - **train_dataset** (paddlex.datasets): Training data reader.
+> > - **train_batch_size** (int): Batch size of the training data. Detection currently supports single-GPU evaluation only; the validation batch size is the training batch size divided by the number of GPUs. Defaults to 1.
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **save_interval_epochs** (int): Interval, in epochs, at which the model is saved. Defaults to 1.
+> > - **log_interval_steps** (int): Interval, in training steps, at which training logs are printed. Defaults to 20.
+> > - **save_dir** (str): Directory where models are saved. Defaults to 'output'.
+> > - **pretrain_weights** (str): If a path, the pretrained model under that path is loaded; if 'IMAGENET', weights pretrained on ImageNet are downloaded automatically; if None, no pretrained model is used. Defaults to 'IMAGENET'.
+> > - **optimizer** (paddle.fluid.optimizer): Optimizer. If None, the default optimizer is used: fluid.optimizer.Momentum with a fluid.layers.piecewise_decay learning-rate schedule.
+> > - **learning_rate** (float): Initial learning rate of the default optimizer. Defaults to 0.00125 (1.0/800).
+> > - **warmup_steps** (int): Number of warmup steps of the default optimizer. Defaults to 500.
+> > - **warmup_start_lr** (int): Starting warmup learning rate of the default optimizer. Defaults to 1.0/2400.
+> > - **lr_decay_epochs** (list): Epochs at which the default optimizer decays the learning rate. Defaults to [8, 11].
+> > - **lr_decay_gamma** (float): Learning-rate decay factor of the default optimizer. Defaults to 0.1.
+> > - **metric** (bool): Evaluation metric used during training, one of ['COCO', 'VOC']. Defaults to None.
+> > - **use_vdl** (bool): Whether to use VisualDL for visualization. Defaults to False.
+
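+For reference, a minimal instance-segmentation training sketch on a COCO-format dataset. The annotation files, image directory and the transform operators used here are assumptions; see [datasets](./datasets.md) and [paddlex.det.transforms](./transforms/det_transforms.md):
+
+```python
+import paddlex as pdx
+from paddlex.det import transforms
+
+# assumed transform operators; see det_transforms.md for the real list
+train_transforms = transforms.Compose([transforms.RandomHorizontalFlip(), transforms.Normalize()])
+eval_transforms = transforms.Compose([transforms.Normalize()])
+
+train_dataset = pdx.datasets.COCODetection(
+    data_dir='./mydataset/JPEGImages',          # hypothetical image directory
+    ann_file='./mydataset/train.json',          # hypothetical COCO-format annotation file
+    transforms=train_transforms, shuffle=True)
+eval_dataset = pdx.datasets.COCODetection(
+    data_dir='./mydataset/JPEGImages',
+    ann_file='./mydataset/val.json',
+    transforms=eval_transforms)
+
+# num_classes includes the background class
+model = pdx.det.MaskRCNN(num_classes=2, backbone='ResNet50', with_fpn=True)
+model.train(num_epochs=12,
+            train_dataset=train_dataset,
+            train_batch_size=1,
+            eval_dataset=eval_dataset,
+            learning_rate=0.00125,
+            warmup_steps=500,
+            lr_decay_epochs=[8, 11],
+            save_dir='output/mask_rcnn_r50_fpn',
+            use_vdl=True)
+```
+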
+#### MaskRCNN evaluation interface
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, metric=None, return_details=False)
+> ```
+>
+> **Parameters:**
+>
+> > - **eval_dataset** (paddlex.datasets): Validation data reader.
+> > - **batch_size** (int): Batch size for evaluation. Defaults to 1.
+> > - **epoch_id** (int): Training epoch of the model being evaluated.
+> > - **metric** (bool): Evaluation metric, one of ['COCO', 'VOC']. Defaults to None, in which case it is chosen automatically from the dataset passed in: 'VOC' for VOCDetection and 'COCO' for COCODetection.
+> > - **return_details** (bool): Whether to return detailed information. Defaults to False.
+> >
+> **Return value:**
+>
+> > - **tuple** (metrics, eval_details) | **dict** (metrics): When `return_details` is True, returns (metrics, eval_details); when False, returns metrics only. metrics is a dict with keys 'bbox_mmap' and 'segm_mmap' (or 'bbox_map' and 'segm_map'), the mean average precision of the predicted boxes and of the segmentation masks averaged over IoU thresholds (mmAP), or the mean average precision (mAP), respectively. eval_details is a dict with key 'bbox', a list of box predictions each consisting of image id, predicted box class id, box coordinates and box score; key 'mask', a list of mask predictions each consisting of image id, predicted region class id, region coordinates and region score; and key 'gt', the ground-truth box and mask annotation information.
+
+#### MaskRCNN prediction interface
+
+> ```python
+> predict(self, img_file, transforms=None)
+> ```
+>
+> **Parameters:**
+>
+> > - **img_file** (str): Path of the image to predict.
+> > - **transforms** (paddlex.det.transforms): Data preprocessing operators.
+>
+> **Return value:**
+>
+> > - **list**: A list of predictions; each element is a dict with keys 'bbox', 'mask', 'category', 'category_id' and 'score', giving the box coordinates, mask, class label, class id and confidence of each predicted object. Box coordinates are [xmin, ymin, w, h], i.e. the top-left x, y coordinates and the box width and height.
+
+## 分割模型
+
+### DeepLabv3p类
+
+```python
+paddlex.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0', output_stride=16, aspp_with_sep_conv=True, decoder_use_sep_conv=True, encoder_with_aspp=True, enable_decoder=True, use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
+
+```
+
+构建DeepLabv3p分割器,并实现其训练、评估和预测。
+
+**参数:**
+
+> - **num_classes** (int): 类别数。
+> - **backbone** (str): DeepLabv3+的backbone网络,实现特征图的计算,取值范围为['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0'],'MobileNetV2_x1.0'。
+> - **output_stride** (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。默认16。
+> - **aspp_with_sep_conv** (bool): decoder模块是否采用separable convolutions。默认True。
+> - **decoder_use_sep_conv** (bool): decoder模块是否采用separable convolutions。默认True。
+> - **encoder_with_aspp** (bool): 是否在encoder阶段采用aspp模块。默认True。
+> - **enable_decoder** (bool): 是否使用decoder模块。默认True。
+> - **use_bce_loss** (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
+> - **use_dice_loss** (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用,当`use_bce_loss`和`use_dice_loss`都为False时,使用交叉熵损失函数。默认False。
+> - **class_weight** (list/str): 交叉熵损失函数各类损失的权重。当`class_weight`为list的时候,长度应为`num_classes`。当`class_weight`为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,即平时使用的交叉熵损失函数。
+> - **ignore_index** (int): label上忽略的值,label为`ignore_index`的像素不参与损失函数的计算。默认255。
+
+#### DeepLabv3p训练函数接口
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, eval_batch_size=1, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='IMAGENET', optimizer=None, learning_rate=0.01, lr_decay_power=0.9, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05):
+>
+> ```
+>
+> **参数:**
+> >
+> > - **num_epochs** (int): 训练迭代轮数。
+> > - **train_dataset** (paddlex.datasets): 训练数据读取器。
+> > - **train_batch_size** (int): 训练数据batch大小。默认2。
+> > - **eval_dataset** (paddlex.datasets): 评估数据读取器。
+> > - **eval_batch_size** (int): 评估数据batch大小。默认1。
+> > - **save_interval_epochs** (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+> > - **log_interval_steps** (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
+> > - **save_dir** (str): 模型保存路径。默认'output'
+> > - **pretrain_weights** (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认'IMAGENET'。
+> > - **optimizer** (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认的优化器:使用fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
+> > - **learning_rate** (float): 默认优化器的初始学习率。默认0.01。
+> > - **lr_decay_power** (float): 默认优化器学习率衰减指数。默认0.9。
+> > - **use_vdl** (bool): 是否使用VisualDL进行可视化。默认False。
+> > - **sensitivities_file** (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+> > - **eval_metric_loss** (float): 可容忍的精度损失。默认为0.05。
+
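+以下给出一个最简的训练调用示意(数据集读取器的构建、类别数及各超参数均为假设,需按实际数据调整):
+
+```python
+import paddlex as pdx
+
+# 假设train_dataset、eval_dataset为已构建好的语义分割数据集读取器(paddlex.datasets),此处仅作示意
+model = pdx.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0')  # 类别数需按实际数据设置
+model.train(
+    num_epochs=40,
+    train_dataset=train_dataset,
+    train_batch_size=4,
+    eval_dataset=eval_dataset,
+    learning_rate=0.01,
+    save_dir='output/deeplab',
+    use_vdl=True)
+```
+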
+#### DeepLabv3p评估函数接口
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False):
+> ```
+
+> **参数:**
+> >
+> > - **eval_dataset** (paddlex.datasets): 评估数据读取器。
+> > - **batch_size** (int): 评估时的batch大小。默认1。
+> > - **epoch_id** (int): 当前评估模型所在的训练轮数。
+> > - **return_details** (bool): 是否返回详细信息。默认False。
+
+> **返回值:**
+> >
+> > - **dict**: 当`return_details`为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
+> > 'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
+> > - **tuple** (metrics, eval_details):当`return_details`为True时,增加返回dict (eval_details),
+> > 包含关键字:'confusion_matrix',表示评估的混淆矩阵。
+
+#### DeepLabv3p预测函数接口
+
+> ```python
+> predict(self, im_file, transforms=None)
+> ```
+
+> **参数:**
+> >
+> > - **im_file** (str): 预测图像路径。
+> > - **transforms** (paddlex.seg.transforms): 数据预处理操作。
+
+> **返回值:**
+> >
+> > - **dict**: 包含关键字'label_map'和'score_map', 'label_map'存储预测结果灰度图,像素值表示对应的类别,'score_map'存储各类别的概率,shape=(h, w, num_classes)。
+
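+以下给出一个预测调用示意(模型目录与图像路径均为假设):
+
+```python
+import paddlex as pdx
+
+# 假设'output/deeplab/best_model'为训练保存的模型目录,'test.png'为待预测图像
+model = pdx.load_model('output/deeplab/best_model')
+result = model.predict('test.png')
+label_map = result['label_map']    # 预测结果灰度图,像素值即对应的类别
+score_map = result['score_map']    # 各类别概率,shape=(h, w, num_classes)
+```
+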
+### UNet类
+
+```python
+paddlex.seg.UNet(num_classes=2, upsample_mode='bilinear', use_bce_loss=False, use_dice_loss=False, class_weight=None, ignore_index=255)
+```
+
+构建UNet分割器,并实现其训练、评估和预测。
+
+
+**参数:**
+
+> - **num_classes** (int): 类别数。
+> - **upsample_mode** (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线性插值进行上采样,当输入其他选项时则利用反卷积进行上采样,默认为'bilinear'。
+> - **use_bce_loss** (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
+> - **use_dice_loss** (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
+> - **class_weight** (list/str): 交叉熵损失函数各类损失的权重。当`class_weight`为list的时候,长度应为`num_classes`。当`class_weight`为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重为1,即平时使用的交叉熵损失函数。
+> - **ignore_index** (int): label上忽略的值,label为`ignore_index`的像素不参与损失函数的计算。默认255。
+
+#### UNet训练函数接口
+
+> ```python
+> train(self, num_epochs, train_dataset, train_batch_size=2, eval_dataset=None, eval_batch_size=1, save_interval_epochs=1, log_interval_steps=2, save_dir='output', pretrain_weights='COCO', optimizer=None, learning_rate=0.01, lr_decay_power=0.9, use_vdl=False, sensitivities_file=None, eval_metric_loss=0.05):
+> ```
+>
+> **参数:**
+> >
+> > - **num_epochs** (int): 训练迭代轮数。
+> > - **train_dataset** (paddlex.datasets): 训练数据读取器。
+> > - **train_batch_size** (int): 训练数据batch大小。默认2。
+> > - **eval_dataset** (paddlex.datasets): 评估数据读取器。
+> > - **eval_batch_size** (int): 评估数据batch大小。默认1。
+> > - **save_interval_epochs** (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+> > - **log_interval_steps** (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
+> > - **save_dir** (str): 模型保存路径。默认'output'
+> > - **pretrain_weights** (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'COCO',则自动下载在COCO图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认'COCO'。
+> > - **optimizer** (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认的优化器:使用fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
+> > - **learning_rate** (float): 默认优化器的初始学习率。默认0.01。
+> > - **lr_decay_power** (float): 默认优化器学习率衰减指数。默认0.9。
+> > - **use_vdl** (bool): 是否使用VisualDL进行可视化。默认False。
+> > - **sensitivities_file** (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+> > - **eval_metric_loss** (float): 可容忍的精度损失。默认为0.05。
+
+#### UNet评估函数接口
+
+> ```python
+> evaluate(self, eval_dataset, batch_size=1, epoch_id=None, return_details=False):
+> ```
+
+> **参数:**
+> >
+> > - **eval_dataset** (paddlex.datasets): 评估数据读取器。
+> > - **batch_size** (int): 评估时的batch大小。默认1。
+> > - **epoch_id** (int): 当前评估模型所在的训练轮数。
+> > - **return_details** (bool): 是否返回详细信息。默认False。
+
+> **返回值:**
+> >
+> > - **dict**: 当return_details为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
+> > 'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
+> > - **tuple** (metrics, eval_details):当return_details为True时,增加返回dict (eval_details),
+> > 包含关键字:'confusion_matrix',表示评估的混淆矩阵。
+
+#### UNet预测函数接口
+
+> ```python
+> predict(self, im_file, transforms=None)
+> ```
+
+> **参数:**
+> >
+> > - **im_file** (str): 预测图像路径。
+> > - **transforms** (paddlex.seg.transforms): 数据预处理操作。
+
+> **返回值:**
+> >
+> > - **dict**: 包含关键字'label_map'和'score_map', 'label_map'存储预测结果灰度图,像素值表示对应的类别,'score_map'存储各类别的概率,shape=(h, w, num_classes)。
diff --git a/docs/apis/slim.md b/docs/apis/slim.md
new file mode 100644
index 0000000000000000000000000000000000000000..800e12373a490763d407f537f1b6de4d017f6338
--- /dev/null
+++ b/docs/apis/slim.md
@@ -0,0 +1,48 @@
+# 模型压缩-slim
+
+## 计算参数敏感度
+```
+paddlex.slim.cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8)
+```
+计算模型中可裁剪参数在验证集上的敏感度,并将敏感度信息保存至文件`save_file`
+1. 获取模型中可裁剪卷积Kernel的名称。
+2. 计算每个可裁剪卷积Kernel不同裁剪率下的敏感度。
+【注意】卷积的敏感度是指在不同裁剪率下评估数据集预测精度的损失,通过得到的敏感度,可以决定最终模型需要裁剪的参数列表和各裁剪参数对应的裁剪率。
+[查看使用示例](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/compress/classification/cal_sensitivities_file.py#L33)
+
+### 参数
+
+* **model** (paddlex.cls.models/paddlex.det.models/paddlex.seg.models): paddlex加载的模型。
+* **save_file** (str): 计算得到的敏感度信息文件的存储路径。
+* **eval_dataset** (paddlex.datasets): 评估数据集的读取器。
+* **batch_size** (int): 评估时的batch_size大小。
+
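+### 使用示例
+以下为一个调用示意(模型与数据集沿用本文档「导出量化模型」一节的下载示例,路径均为假设):
+```
+import paddlex as pdx
+model = pdx.load_model('vegetables_mobilenet')
+eval_dataset = pdx.datasets.ImageNet(
+    data_dir='vegetables_cls',
+    file_list='vegetables_cls/val_list.txt',
+    label_list='vegetables_cls/labels.txt',
+    transforms=model.eval_transforms)
+pdx.slim.cal_params_sensitivities(
+    model, 'mobilenetv2.sensitivities', eval_dataset, batch_size=8)
+```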
+
+## 导出量化模型
+```
+paddlex.slim.export_quant_model(model, test_dataset, batch_size=2, batch_num=10, save_dir='./quant_model', cache_dir='./temp')
+```
+导出量化模型,该接口实现了Post Quantization量化方式,需要传入测试数据集,并设定`batch_size`和`batch_num`,模型会以`batch_size`的大小计算`batch_num`批样本数据,并以这些样本数据的计算结果为统计信息进行模型量化。
+
+### 参数
+* **model** (paddlex.cls.models/paddlex.det.models/paddlex.seg.models): paddlex加载的模型。
+* **test_dataset** (paddlex.dataset): 测试数据集。
+* **batch_size** (int): 进行前向计算时的批数据大小。
+* **batch_num** (int): 进行前向计算时的批数据数量。
+* **save_dir** (str): 量化后模型的保存目录。
+* **cache_dir** (str): 量化过程中的统计数据临时存储目录。
+
+### 使用示例
+点击下载如下示例中的[模型](https://bj.bcebos.com/paddlex/models/vegetables_mobilenet.tar.gz),[数据集](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz)
+```
+import paddlex as pdx
+model = pdx.load_model('vegetables_mobilenet')
+test_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=model.eval_transforms)
+pdx.slim.export_quant_model(model, test_dataset, save_dir='./quant_mobilenet')
+```
diff --git a/docs/apis/transforms/cls_transforms.md b/docs/apis/transforms/cls_transforms.md
new file mode 100644
index 0000000000000000000000000000000000000000..c6c0cd8fc95a043e86e7b89867813fedd9febbdf
--- /dev/null
+++ b/docs/apis/transforms/cls_transforms.md
@@ -0,0 +1,122 @@
+# 分类-paddlex.cls.transforms
+
+对图像分类任务的数据进行操作。可以利用[Compose](#compose)类将图像预处理/增强操作进行组合。
+
+## Compose类
+```python
+paddlex.cls.transforms.Compose(transforms)
+```
+
+根据数据预处理/增强算子对输入数据进行操作。 [使用示例](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/classification/mobilenetv2.py#L13)
+
+### 参数
+* **transforms** (list): 数据预处理/数据增强列表。
+
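+以下为一个组合示意(具体算子及参数取值仅作示例,可按需调整):
+
+```python
+from paddlex.cls import transforms
+train_transforms = transforms.Compose([
+    transforms.RandomCrop(crop_size=224),
+    transforms.RandomHorizontalFlip(),
+    transforms.Normalize()
+])
+```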
+
+## RandomCrop类
+```python
+paddlex.cls.transforms.RandomCrop(crop_size=224, lower_scale=0.88, lower_ratio=3. / 4, upper_ratio=4. / 3)
+```
+
+对图像进行随机剪裁,模型训练时的数据增强操作。
+1. 根据lower_scale、lower_ratio、upper_ratio计算随机剪裁的高、宽。
+2. 根据随机剪裁的高、宽随机选取剪裁的起始点。
+3. 剪裁图像。
+4. 调整剪裁后的图像的大小到crop_size*crop_size。
+
+### 参数
+* **crop_size** (int): 随机裁剪后重新调整的目标边长。默认为224。
+* **lower_scale** (float): 裁剪面积相对原面积比例的最小限制。默认为0.88。
+* **lower_ratio** (float): 宽变换比例的最小限制。默认为3. / 4。
+* **upper_ratio** (float): 宽变换比例的最大限制。默认为4. / 3。
+
+## RandomHorizontalFlip类
+```python
+paddlex.cls.transforms.RandomHorizontalFlip(prob=0.5)
+```
+
+以一定的概率对图像进行随机水平翻转,模型训练时的数据增强操作。
+
+### 参数
+* **prob** (float): 随机水平翻转的概率。默认为0.5。
+
+## RandomVerticalFlip类
+```python
+paddlex.cls.transforms.RandomVerticalFlip(prob=0.5)
+```
+
+以一定的概率对图像进行随机垂直翻转,模型训练时的数据增强操作。
+
+### 参数
+* **prob** (float): 随机垂直翻转的概率。默认为0.5。
+
+## Normalize类
+```python
+paddlex.cls.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+```
+
+对图像进行标准化。
+1. 对图像进行归一化到区间[0.0, 1.0]。
+2. 对图像进行减均值除以标准差操作。
+
+### 参数
+* **mean** (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
+* **std** (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
+
+## ResizeByShort类
+```python
+paddlex.cls.transforms.ResizeByShort(short_size=256, max_size=-1)
+```
+
+根据图像的短边调整图像大小(resize)。
+1. 获取图像的长边和短边长度。
+2. 根据短边与short_size的比例,计算长边的目标长度,此时高、宽的resize比例为short_size/原图短边长度。
+3. 如果max_size>0,调整resize比例:
+ 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
+4. 根据调整大小的比例对图像进行resize。
+
+### 参数
+* **short_size** (int): 调整大小后的图像目标短边长度。默认为256。
+* **max_size** (int): 长边目标长度的最大限制。默认为-1。
+
+## CenterCrop类
+```python
+paddlex.cls.transforms.CenterCrop(crop_size=224)
+```
+
+以图像中心点扩散裁剪长宽为`crop_size`的正方形
+1. 计算剪裁的起始点。
+2. 剪裁图像。
+
+### 参数
+* **crop_size** (int): 裁剪的目标边长。默认为224。
+
+## RandomRotate类
+```python
+paddlex.cls.transforms.RandomRotate(rotate_range=30, prob=0.5)
+```
+
+以一定的概率对图像在[-rotate_range, rotate_range]角度范围内进行旋转,模型训练时的数据增强操作。
+
+### 参数
+* **rotate_range** (int): 旋转度数的范围。默认为30。
+* **prob** (float): 随机旋转的概率。默认为0.5。
+
+## RandomDistort类
+```python
+paddlex.cls.transforms.RandomDistort(brightness_range=0.9, brightness_prob=0.5, contrast_range=0.9, contrast_prob=0.5, saturation_range=0.9, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
+```
+
+以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
+1. 对变换的操作顺序进行随机化操作。
+2. 按照1中的顺序以一定的概率对图像在范围[-range, range]内进行随机像素内容变换。
+
+### 参数
+* **brightness_range** (float): 明亮度因子的范围。默认为0.9。
+* **brightness_prob** (float): 随机调整明亮度的概率。默认为0.5。
+* **contrast_range** (float): 对比度因子的范围。默认为0.9。
+* **contrast_prob** (float): 随机调整对比度的概率。默认为0.5。
+* **saturation_range** (float): 饱和度因子的范围。默认为0.9。
+* **saturation_prob** (float): 随机调整饱和度的概率。默认为0.5。
+* **hue_range** (int): 色调因子的范围。默认为18。
+* **hue_prob** (float): 随机调整色调的概率。默认为0.5。
diff --git a/docs/apis/transforms/det_transforms.md b/docs/apis/transforms/det_transforms.md
new file mode 100644
index 0000000000000000000000000000000000000000..9565415f3d7633e915f1556dc31542d60ebf562d
--- /dev/null
+++ b/docs/apis/transforms/det_transforms.md
@@ -0,0 +1,177 @@
+# 检测-paddlex.det.transforms
+
+对目标检测任务的数据进行操作。可以利用[Compose](#compose)类将图像预处理/增强操作进行组合。
+
+## Compose类
+```python
+paddlex.det.transforms.Compose(transforms)
+```
+
+根据数据预处理/增强算子对输入数据进行操作。[使用示例](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py#L13)
+
+### 参数
+* **transforms** (list): 数据预处理/数据增强列表。
+
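+以下为一个组合示意(具体算子及参数取值仅作示例,可按需调整):
+
+```python
+from paddlex.det import transforms
+train_transforms = transforms.Compose([
+    transforms.RandomHorizontalFlip(),
+    transforms.Normalize(),
+    transforms.ResizeByShort(short_size=800, max_size=1333),
+    transforms.Padding(coarsest_stride=32)
+])
+```
+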
+## ResizeByShort类
+```python
+paddlex.det.transforms.ResizeByShort(short_size=800, max_size=1333)
+```
+
+根据图像的短边调整图像大小(resize)。
+1. 获取图像的长边和短边长度。
+2. 根据短边与short_size的比例,计算长边的目标长度,此时高、宽的resize比例为short_size/原图短边长度。
+3. 如果max_size>0,调整resize比例:
+ 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
+4. 根据调整大小的比例对图像进行resize。
+
+### 参数
+* **short_size** (int): 短边目标长度。默认为800。
+* **max_size** (int): 长边目标长度的最大限制。默认为1333。
+
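+上述缩放比例的计算过程可用如下Python片段示意(输入尺寸为假设值):
+
+```python
+# 假设原图高宽为(500, 1500),short_size=800,max_size=1333
+h, w, short_size, max_size = 500, 1500, 800, 1333
+scale = float(short_size) / min(h, w)          # 800 / 500 = 1.6
+if max_size > 0 and round(scale * max(h, w)) > max_size:
+    scale = float(max_size) / max(h, w)        # 1333 / 1500 ≈ 0.889
+new_h, new_w = int(round(h * scale)), int(round(w * scale))
+```
+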
+## Padding类
+```python
+paddlex.det.transforms.Padding(coarsest_stride=1)
+```
+
+将图像的长和宽padding至coarsest_stride的倍数。如输入图像为[300, 640],`coarsest_stride`为32,则由于300不为32的倍数,因此在图像最右和最下使用0值进行padding,最终输出图像为[320, 640]。
+1. 如果coarsest_stride为1则直接返回。
+2. 计算宽和高与最邻近的coarsest_stride倍数的差值。
+3. 根据计算得到的差值,在图像最右和最下进行padding。
+
+### 参数
+* **coarsest_stride** (int): 填充后的图像长、宽为该参数的倍数,默认为1。
+
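+padding后尺寸的计算可用如下片段示意(与上文[300, 640]、stride为32的例子对应):
+
+```python
+import math
+
+h, w, stride = 300, 640, 32
+padded_h = int(math.ceil(h / float(stride)) * stride)   # 320
+padded_w = int(math.ceil(w / float(stride)) * stride)   # 640
+```
+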
+## Resize类
+```python
+paddlex.det.transforms.Resize(target_size=608, interp='LINEAR')
+```
+
+调整图像大小(resize)。
+* 当目标大小(target_size)类型为int时,根据插值方式,将图像resize为[target_size, target_size]。
+* 当目标大小(target_size)类型为list或tuple时,根据插值方式,将图像resize为target_size。
+【注意】当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize,作为模型训练时的数据增强操作。
+
+### 参数
+* **target_size** (int/list/tuple): 目标大小。默认为608。
+* **interp** (str): resize的插值方式,与opencv的插值方式对应,取值范围为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。
+
+## RandomHorizontalFlip类
+```python
+paddlex.det.transforms.RandomHorizontalFlip(prob=0.5)
+```
+
+以一定的概率对图像进行随机水平翻转,模型训练时的数据增强操作。
+
+### 参数
+* **prob** (float): 随机水平翻转的概率。默认为0.5。
+
+## Normalize类
+```python
+paddlex.det.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+```
+
+对图像进行标准化。
+1. 归一化图像到到区间[0.0, 1.0]。
+2. 对图像进行减均值除以标准差操作。
+
+### 参数
+* **mean** (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
+* **std** (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
+
+## RandomDistort类
+```python
+paddlex.det.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5, contrast_range=0.5, contrast_prob=0.5, saturation_range=0.5, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
+```
+
+以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
+1. 对变换的操作顺序进行随机化操作。
+2. 按照1中的顺序以一定的概率对图像在范围[-range, range]内进行随机像素内容变换。
+
+### 参数
+* **brightness_range** (float): 明亮度因子的范围。默认为0.5。
+* **brightness_prob** (float): 随机调整明亮度的概率。默认为0.5。
+* **contrast_range** (float): 对比度因子的范围。默认为0.5。
+* **contrast_prob** (float): 随机调整对比度的概率。默认为0.5。
+* **saturation_range** (float): 饱和度因子的范围。默认为0.5。
+* **saturation_prob** (float): 随机调整饱和度的概率。默认为0.5。
+* **hue_range** (int): 色调因子的范围。默认为18。
+* **hue_prob** (float): 随机调整色调的概率。默认为0.5。
+
+## MixupImage类
+```python
+paddlex.det.transforms.MixupImage(alpha=1.5, beta=1.5, mixup_epoch=-1)
+```
+
+对图像进行mixup操作,模型训练时的数据增强操作,目前仅YOLOv3模型支持该transform。
+当label_info中不存在mixup字段时,直接返回,否则进行下述操作:
+1. 从随机beta分布中抽取出随机因子factor。
+2. 根据不同情况进行处理:
+ * 当factor>=1.0时,去除label_info中的mixup字段,直接返回。
+ * 当factor<=0.0时,直接返回label_info中的mixup字段,并在label_info中去除该字段。
+ * 其余情况,执行下述操作:
+ (1)原图像乘以factor,mixup图像乘以(1-factor),叠加2个结果。
+ (2)拼接原图像标注框和mixup图像标注框。
+ (3)拼接原图像标注框类别和mixup图像标注框类别。
+ (4)原图像标注框混合得分乘以factor,mixup图像标注框混合得分乘以(1-factor),叠加2个结果。
+3. 更新im_info中的augment_shape信息。
+
+### 参数
+* **alpha** (float): 随机beta分布的下限。默认为1.5。
+* **beta** (float): 随机beta分布的上限。默认为1.5。
+* **mixup_epoch** (int): 在前mixup_epoch轮使用mixup增强操作;当该参数为-1时,该策略不会生效。默认为-1。
+
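+其中图像按比例加权叠加的一步可用如下numpy片段示意(factor与图像均为假设数据):
+
+```python
+import numpy as np
+
+factor = 0.6                                   # 假设从beta分布中采样得到的随机因子
+img1 = np.random.rand(32, 32, 3)               # 原图像(示意数据)
+img2 = np.random.rand(32, 32, 3)               # mixup图像(示意数据)
+mixed = img1 * factor + img2 * (1.0 - factor)  # 两张图像按比例加权叠加
+```
+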
+## RandomExpand类
+```python
+paddlex.det.transforms.RandomExpand(max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5])
+```
+
+随机扩张图像,模型训练时的数据增强操作。
+1. 随机选取扩张比例(扩张比例大于1时才进行扩张)。
+2. 计算扩张后图像大小。
+3. 初始化像素值为数据集均值的图像,并将原图像随机粘贴于该图像上。
+4. 根据原图像粘贴位置换算出扩张后真实标注框的位置坐标。
+
+### 参数
+* **max_ratio** (float): 图像扩张的最大比例。默认为4.0。
+* **prob** (float): 随机扩张的概率。默认为0.5。
+* **mean** (list): 图像数据集的均值(0-255)。默认为[127.5, 127.5, 127.5]。
+
+## RandomCrop类
+```python
+paddlex.det.transforms.RandomCrop(batch_sampler=None, satisfy_all=False, avoid_no_bbox=True)
+```
+
+随机裁剪图像,模型训练时的数据增强操作。
+1. 根据batch_sampler计算获取裁剪候选区域的位置。
+ (1) 根据min scale、max scale、min aspect ratio、max aspect ratio计算随机剪裁的高、宽。
+ (2) 根据随机剪裁的高、宽随机选取剪裁的起始点。
+ (3) 筛选出裁剪候选区域:
+ * 当satisfy_all为True时,需所有真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域才可保留。
+ * 当satisfy_all为False时,当有一个真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域就可保留。
+2. 遍历所有裁剪候选区域:
+ (1) 若真实标注框与候选裁剪区域不重叠,或其中心点不在候选裁剪区域,则将该真实标注框去除。
+ (2) 计算相对于该候选裁剪区域,真实标注框的位置,并筛选出对应的类别、混合得分。
+ (3) 若avoid_no_bbox为False,返回当前裁剪后的信息即可;反之,要找到一个裁剪区域中真实标注框个数不为0的区域,才返回裁剪后的信息。
+
+### 参数
+* **batch_sampler** (list): 随机裁剪参数的多种组合,每种组合包含8个值,如下:
+ - max sample (int):满足当前组合的裁剪区域的个数上限。
+ - max trial (int): 查找满足当前组合的次数。
+ - min scale (float): 裁剪面积相对原面积,每条边缩短比例的最小限制。
+ - max scale (float): 裁剪面积相对原面积,每条边缩短比例的最大限制。
+ - min aspect ratio (float): 裁剪后短边缩放比例的最小限制。
+ - max aspect ratio (float): 裁剪后短边缩放比例的最大限制。
+ - min overlap (float): 真实标注框与裁剪图像重叠面积的最小限制。
+ - max overlap (float): 真实标注框与裁剪图像重叠面积的最大限制。
+
+ 默认值为None,当为None时采用如下设置:
+
+ [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
+* **satisfy_all** (bool): 是否需要所有标注框满足条件,裁剪候选区域才保留。默认为False。
+* **avoid_no_bbox** (bool): 是否对裁剪图像不存在标注框的图像进行保留。默认为True。
diff --git a/docs/apis/transforms/index.rst b/docs/apis/transforms/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..f6040978134ccf664d08ab39f55db197b752ac37
--- /dev/null
+++ b/docs/apis/transforms/index.rst
@@ -0,0 +1,11 @@
+数据处理-transforms
+============================
+
+transforms为PaddleX的模型训练提供了数据的预处理和数据增强接口。
+
+.. toctree::
+ :maxdepth: 1
+
+ cls_transforms.md
+ det_transforms.md
+ seg_transforms.md
diff --git a/docs/apis/transforms/seg_transforms.md b/docs/apis/transforms/seg_transforms.md
new file mode 100644
index 0000000000000000000000000000000000000000..d2b4d92aa4b2b6702156ba25ab0d6a6cd4341922
--- /dev/null
+++ b/docs/apis/transforms/seg_transforms.md
@@ -0,0 +1,166 @@
+# 分割-paddlex.seg.transforms
+
+对用于分割任务的数据进行操作。可以利用[Compose](#compose)类将图像预处理/增强操作进行组合。
+
+
+## Compose类
+```python
+paddlex.seg.transforms.Compose(transforms)
+```
+根据数据预处理/数据增强列表对输入数据进行操作。[使用示例](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py#L13)
+### 参数
+* **transforms** (list): 数据预处理/数据增强列表。
+
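+以下为一个组合示意(具体算子及参数取值仅作示例,可按需调整):
+
+```python
+from paddlex.seg import transforms
+train_transforms = transforms.Compose([
+    transforms.RandomHorizontalFlip(prob=0.5),
+    transforms.ResizeRangeScaling(min_value=400, max_value=600),
+    transforms.RandomPaddingCrop(crop_size=512),
+    transforms.Normalize()
+])
+```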
+
+## RandomHorizontalFlip类
+```python
+paddlex.seg.transforms.RandomHorizontalFlip(prob=0.5)
+```
+以一定的概率对图像进行水平翻转,模型训练时的数据增强操作。
+### 参数
+* **prob** (float): 随机水平翻转的概率。默认值为0.5。
+
+
+## RandomVerticalFlip类
+```python
+paddlex.seg.transforms.RandomVerticalFlip(prob=0.1)
+```
+以一定的概率对图像进行垂直翻转,模型训练时的数据增强操作。
+### 参数
+* **prob** (float): 随机垂直翻转的概率。默认值为0.1。
+
+
+## Resize类
+```python
+paddlex.seg.transforms.Resize(target_size, interp='LINEAR')
+```
+调整图像大小(resize)。
+
+- 当目标大小(target_size)类型为int时,根据插值方式,
+ 将图像resize为[target_size, target_size]。
+- 当目标大小(target_size)类型为list或tuple时,根据插值方式,
+ 将图像resize为target_size, target_size的输入应为[w, h]或(w, h)。
+### 参数
+* **target_size** (int|list|tuple): 目标大小
+* **interp** (str): resize的插值方式,与opencv的插值方式对应,
+可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
+
+
+## ResizeByLong类
+```python
+paddlex.seg.transforms.ResizeByLong(long_size)
+```
+对图像长边resize到固定值,短边按比例进行缩放。
+### 参数
+* **long_size** (int): resize后图像的长边大小。
+
+
+## ResizeRangeScaling类
+```python
+paddlex.seg.transforms.ResizeRangeScaling(min_value=400, max_value=600)
+```
+对图像长边随机resize到指定范围内,短边按比例进行缩放,模型训练时的数据增强操作。
+### 参数
+* **min_value** (int): 图像长边resize后的最小值。默认值400。
+* **max_value** (int): 图像长边resize后的最大值。默认值600。
+
+
+## ResizeStepScaling类
+```python
+paddlex.seg.transforms.ResizeStepScaling(min_scale_factor=0.75, max_scale_factor=1.25, scale_step_size=0.25)
+```
+对图像按照某一个比例resize,这个比例以scale_step_size为步长,在[min_scale_factor, max_scale_factor]随机变动,模型训练时的数据增强操作。
+### 参数
+* **min_scale_factor** (float): resize最小尺度。默认值0.75。
+* **max_scale_factor** (float): resize最大尺度。默认值1.25。
+* **scale_step_size** (float): resize尺度范围间隔。默认值0.25。
+
+
+## Normalize类
+```python
+paddlex.seg.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
+```
+对图像进行标准化。
+
+1. 图像像素归一化到区间[0.0, 1.0]。
+2. 对图像进行减均值除以标准差操作。
+### 参数
+* **mean** (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
+* **std** (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
+
+
+## Padding类
+```python
+paddlex.seg.transforms.Padding(target_size, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
+```
+对图像或标注图像进行padding,padding方向为右和下。根据提供的值对图像或标注图像进行padding操作。
+### 参数
+* **target_size** (int|list|tuple): padding后图像的大小。
+* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+* **label_padding_value** (int): 标注图像padding的值。默认值为255(仅在训练时需要设定该参数)。
+
+
+## RandomPaddingCrop类
+```python
+paddlex.seg.transforms.RandomPaddingCrop(crop_size=512, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
+```
+对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作,模型训练时的数据增强操作。
+### 参数
+* **crop_size**(int|list|tuple): 裁剪图像大小。默认为512。
+* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+* **label_padding_value** (int): 标注图像padding的值。默认值为255。
+
+
+## RandomBlur类
+```python
+paddlex.seg.transforms.RandomBlur(prob=0.1)
+```
+以一定的概率对图像进行高斯模糊,模型训练时的数据增强操作。
+### 参数
+* **prob** (float): 图像模糊概率。默认为0.1。
+
+
+## RandomRotate类
+```python
+paddlex.seg.transforms.RandomRotate(rotate_range=15, im_padding_value=[127.5, 127.5, 127.5], label_padding_value=255)
+```
+对图像进行随机旋转, 模型训练时的数据增强操作。
+
+在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,
+并对旋转后的图像和标注图像进行相应的padding。
+### 参数
+* **rotate_range** (float): 最大旋转角度。默认为15度。
+* **im_padding_value** (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+* **label_padding_value** (int): 标注图像padding的值。默认为255。
+
+
+## RandomScaleAspect类
+```python
+paddlex.seg.transforms.RandomScaleAspect(min_scale=0.5, aspect_ratio=0.33)
+```
+裁剪并resize回原始尺寸的图像和标注图像,模型训练时的数据增强操作。
+
+按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像的大小,当存在标注图时,同步进行。
+### 参数
+* **min_scale** (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
+* **aspect_ratio** (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
+
+
+## RandomDistort类
+```python
+paddlex.seg.transforms.RandomDistort(brightness_range=0.5, brightness_prob=0.5, contrast_range=0.5, contrast_prob=0.5, saturation_range=0.5, saturation_prob=0.5, hue_range=18, hue_prob=0.5)
+```
+以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
+
+1. 对变换的操作顺序进行随机化操作。
+2. 按照1中的顺序以一定的概率对图像在范围[-range, range]内进行随机像素内容变换。
+
+### 参数
+* **brightness_range** (float): 明亮度因子的范围。默认为0.5。
+* **brightness_prob** (float): 随机调整明亮度的概率。默认为0.5。
+* **contrast_range** (float): 对比度因子的范围。默认为0.5。
+* **contrast_prob** (float): 随机调整对比度的概率。默认为0.5。
+* **saturation_range** (float): 饱和度因子的范围。默认为0.5。
+* **saturation_prob** (float): 随机调整饱和度的概率。默认为0.5。
+* **hue_range** (int): 色调因子的范围。默认为18。
+* **hue_prob** (float): 随机调整色调的概率。默认为0.5。
diff --git a/docs/apis/visualize.md b/docs/apis/visualize.md
new file mode 100644
index 0000000000000000000000000000000000000000..6bea9e4d49cbe36dd78f1473b43080e0ad97f5aa
--- /dev/null
+++ b/docs/apis/visualize.md
@@ -0,0 +1,65 @@
+# 可视化-visualize
+PaddleX提供了一系列模型预测和结果分析的可视化函数。
+
+## 目标检测/实例分割预测结果可视化
+```
+paddlex.det.visualize(image, result, threshold=0.5, save_dir=None)
+```
+将目标检测/实例分割模型预测得到的Box框和Mask在原图上进行可视化
+
+### 参数
+> * **image** (str): 原图文件路径。
+> * **result** (str): 模型预测结果。
+> * **threshold**(float): score阈值,将Box置信度低于该阈值的框过滤不进行可视化。默认0.5
+> * **save_dir**(str): 可视化结果保存路径。若为None,则表示不保存,该函数将可视化的结果以np.ndarray的形式返回;若设为目录路径,则将可视化结果保存至该目录下
+
+### 使用示例
+> 点击下载如下示例中的[模型](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz)和[测试图片](https://bj.bcebos.com/paddlex/datasets/garbage.bmp)
+```
+import paddlex as pdx
+model = pdx.load_model('garbage_epoch_12')
+result = model.predict('garbage.bmp')
+pdx.det.visualize('garbage.bmp', result, save_dir='./')
+# 预测结果保存在./visualize_garbage.bmp
+```
+
+## 语义分割预测结果可视化
+```
+paddlex.seg.visualize(image, result, weight=0.6, save_dir=None)
+```
+将语义分割模型预测得到的Mask在原图上进行可视化
+
+### 参数
+> * **image** (str): 原图文件路径。
+> * **result** (str): 模型预测结果。
+> * **weight**(float): mask可视化结果与原图权重因子,weight表示原图的权重。默认0.6
+> * **save_dir**(str): 可视化结果保存路径。若为None,则表示不保存,该函数将可视化的结果以np.ndarray的形式返回;若设为目录路径,则将可视化结果保存至该目录下
+
+### 使用示例
+> 点击下载如下示例中的[模型](https://bj.bcebos.com/paddlex/models/cityscape_deeplab.tar.gz)和[测试图片](https://bj.bcebos.com/paddlex/datasets/city.png)
+```
+import paddlex as pdx
+model = pdx.load_model('cityscape_deeplab')
+result = model.predict('city.png')
+pdx.seg.visualize('city.png', result, save_dir='./')
+# 预测结果保存在./visualize_city.png
+```
+
+## 模型裁剪比例可视化分析
+```
+paddlex.slim.visualize(model, sensitivities_file)
+```
+利用此接口,可以分析在不同的`eval_metric_loss`参数下,模型被裁剪的比例情况。可视化结果纵轴为eval_metric_loss参数值,横轴为对应的模型被裁剪的比例
+
+### 参数
+>* **model**: 使用PaddleX加载的模型
+>* **sensitivities_file**: 模型各参数在验证集上计算得到的参数敏感度信息文件
+
+### 使用示例
+> 点击下载示例中的[模型](https://bj.bcebos.com/paddlex/models/vegetables_mobilenet.tar.gz)和[sensitivities_file](https://bj.bcebos.com/paddlex/slim_prune/mobilenetv2.sensitivities)
+```
+import paddlex as pdx
+model = pdx.load_model('vegetables_mobilenet')
+pdx.slim.visualize(model, 'mobilenetv2.sensitivities', save_dir='./')
+# 可视化结果保存在./sensitivities.png
+```
diff --git a/docs/conf.py b/docs/conf.py
new file mode 100644
index 0000000000000000000000000000000000000000..5aa88d0ca24d9427484a478c96b253f54fdaf414
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,69 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+import sphinx_rtd_theme
+html_theme = "sphinx_rtd_theme"
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+
+# -- Project information -----------------------------------------------------
+
+project = 'PaddleX'
+copyright = '2020, paddlex@baidu.com'
+author = 'paddlex@baidu.com'
+
+# The full version, including alpha/beta/rc tags
+release = '0.1.0'
+
+from recommonmark.parser import CommonMarkParser
+source_parsers = {
+ '.md': CommonMarkParser,
+}
+source_suffix = ['.rst', '.md']
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['sphinx_markdown_tables']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = 'zh_CN'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+#html_theme = 'alabaster'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+html_logo = 'images/paddlex.png'
diff --git a/docs/convertor.md b/docs/convertor.md
new file mode 100644
index 0000000000000000000000000000000000000000..2f1b4eacb0a1d936b1e937b2837c9796c04ffdc6
--- /dev/null
+++ b/docs/convertor.md
@@ -0,0 +1,17 @@
+# 模型转换
+
+## 转ONNX模型
+PaddleX基于[Paddle2ONNX工具](https://github.com/PaddlePaddle/paddle2onnx),提供了便捷的API,支持用户将PaddleX训练保存的模型导出为ONNX模型。
+通过如下示例代码,用户即可将PaddleX训练好的MobileNetV2模型导出
+```
+import paddlex as pdx
+pdx.convertor.to_onnx(model_dir='paddle_mobilenet', save_dir='onnx_mobilenet')
+```
+
+## 转PaddleLite模型
+PaddleX支持导出为[PaddleLite](https://github.com/PaddlePaddle/Paddle-Lite)支持的模型格式,便于用户将模型部署到更多硬件设备。
+通过如下示例代码,用户即可将PaddleX训练好的MobileNetV2模型导出
+```
+import paddlex as pdx
+pdx.convertor.to_lite(model_dir='paddle_mobilenet', save_dir='lite_mobilnet', terminal='arm')
+```
diff --git a/docs/datasets.md b/docs/datasets.md
new file mode 100644
index 0000000000000000000000000000000000000000..3eb82c28a903927f56139b3bb8069f36b5cea1cd
--- /dev/null
+++ b/docs/datasets.md
@@ -0,0 +1,203 @@
+# 数据集格式说明
+
+---
+## 图像分类ImageNet
+
+图像分类ImageNet数据集包含对应多个标签的图像文件夹、标签文件及图像列表文件。
+参考数据文件结构如下:
+```
+./dataset/ # 数据集根目录
+|--labelA # 标签为labelA的图像目录
+| |--a1.jpg
+| |--...
+| └--...
+|
+|--...
+|
+|--labelZ # 标签为labelZ的图像目录
+| |--z1.jpg
+| |--...
+| └--...
+|
+|--train_list.txt # 训练文件列表文件
+|
+|--val_list.txt # 验证文件列表文件
+|
+└--labels.txt # 标签列表文件
+
+```
+其中,相应的文件名可根据需要自行定义。
+
+`train_list.txt`和`val_list.txt`文本以空格为分割符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为图像文件对应的标签id(从0开始)。如下所示:
+```
+labelA/a1.jpg 0
+labelZ/z1.jpg 25
+...
+```
+
+`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
+```
+labelA
+labelB
+...
+```
+[点击这里](https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz),下载蔬菜分类数据集
+在PaddleX中,使用`paddlex.cv.datasets.ImageNet`([API说明](./apis/datasets.html#imagenet))加载分类数据集
+
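+以下为一个加载示意(其中train_transforms需预先按transforms文档定义,路径均为假设值):
+```
+import paddlex as pdx
+train_dataset = pdx.datasets.ImageNet(
+    data_dir='./dataset',
+    file_list='./dataset/train_list.txt',
+    label_list='./dataset/labels.txt',
+    transforms=train_transforms,
+    shuffle=True)
+```
+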
+## 目标检测VOC
+目标检测VOC数据集包含图像文件夹、标注信息文件夹、标签文件及图像列表文件。
+参考数据文件结构如下:
+```
+./dataset/ # 数据集根目录
+|--JPEGImages # 图像目录
+| |--xxx1.jpg
+| |--...
+| └--...
+|
+|--Annotations # 标注信息目录
+| |--xxx1.xml
+| |--...
+| └--...
+|
+|--train_list.txt # 训练文件列表文件
+|
+|--val_list.txt # 验证文件列表文件
+|
+└--labels.txt # 标签列表文件
+
+```
+其中,相应的文件名可根据需要自行定义。
+
+`train_list.txt`和`val_list.txt`文本以空格为分割符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为标注文件相对于dataset的相对路径。如下所示:
+```
+JPEGImages/xxx1.jpg Annotations/xxx1.xml
+JPEGImages/xxx2.jpg Annotations/xxx2.xml
+...
+```
+
+`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
+```
+labelA
+labelB
+...
+```
+[点击这里](https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz),下载昆虫检测数据集
+在PaddleX中,使用`paddlex.cv.datasets.VOCDetection`([API说明](./apis/datasets.html#vocdetection))加载目标检测VOC数据集
+
+## 目标检测和实例分割COCO
+目标检测和实例分割COCO数据集包含图像文件夹及图像标注信息文件。
+参考数据文件结构如下:
+```
+./dataset/ # 数据集根目录
+|--JPEGImages # 图像目录
+| |--xxx1.jpg
+| |--...
+| └--...
+|
+|--train.json # 训练相关信息文件
+|
+└--val.json # 验证相关信息文件
+
+```
+其中,相应的文件名可根据需要自行定义。
+
+`train.json`和`val.json`存储与标注信息、图像文件相关的信息。如下所示:
+
+```
+{
+ "annotations": [
+ {
+ "iscrowd": 0,
+ "category_id": 1,
+ "id": 1,
+ "area": 33672.0,
+ "image_id": 1,
+ "bbox": [232, 32, 138, 244],
+ "segmentation": [[32, 168, 365, 117, ...]]
+ },
+ ...
+ ],
+ "images": [
+ {
+ "file_name": "xxx1.jpg",
+ "height": 512,
+ "id": 267,
+ "width": 612
+ },
+ ...
+ ]
+ "categories": [
+ {
+ "name": "labelA",
+ "id": 1,
+ "supercategory": "component"
+ }
+ ]
+}
+```
+每个字段的含义如下所示:
+
+| 域名 | 字段名 | 含义 | 数据类型 | 备注 |
+|:-----|:--------|:------------|------|:-----|
+| annotations | id | 标注信息id | int | 从1开始 |
+| annotations | iscrowd | 标注框是否为一组对象 | int | 只有0、1两种取值 |
+| annotations | category_id | 标注框类别id | int | |
+| annotations | area | 标注框的面积 | float | |
+| annotations | image_id | 当前标注信息所在图像的id | int | |
+| annotations | bbox | 标注框坐标 | list | 长度为4,分别代表x,y,w,h |
+| annotations | segmentation | 标注区域坐标 | list | list中有至少1个list,每个list由每个小区域坐标点的横纵坐标(x,y)组成 |
+| images | id | 图像id | int | 从1开始 |
+| images | file_name | 图像文件名 | str | |
+| images | height | 图像高度 | int | |
+| images | width | 图像宽度 | int | |
+| categories | id | 类别id | int | 从1开始 |
+| categories | name | 类别标签名 | str | |
+| categories | supercategory | 类别父类的标签名 | str | |
+
+
+[点击这里](https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz),下载垃圾实例分割数据集
+在PaddleX中,使用`paddlex.cv.datasets.COCODetection`([API说明](./apis/datasets.html#cocodetection))加载COCO格式数据集
+
+## 语义分割数据
+语义分割数据集包含原图、标注图及相应的文件列表文件。
+参考数据文件结构如下:
+```
+./dataset/ # 数据集根目录
+|--images # 原图目录
+| |--xxx1.png
+| |--...
+| └--...
+|
+|--annotations # 标注图目录
+| |--xxx1.png
+| |--...
+| └--...
+|
+|--train_list.txt # 训练文件列表文件
+|
+|--val_list.txt # 验证文件列表文件
+|
+└--labels.txt # 标签列表
+
+```
+其中,相应的文件名可根据需要自行定义。
+
+`train_list.txt`和`val_list.txt`文本以空格为分割符分为两列,第一列为图像文件相对于dataset的相对路径,第二列为标注图像文件相对于dataset的相对路径。如下所示:
+```
+images/xxx1.png annotations/xxx1.png
+images/xxx2.png annotations/xxx2.png
+...
+```
+
+`labels.txt`: 每一行为一个单独的类别,相应的行号即为类别对应的id(行号从0开始),如下所示:
+```
+labelA
+labelB
+...
+```
+
+标注图像为单通道图像,像素值即为对应的类别,像素标注类别需要从0开始递增,
+例如0,1,2,3表示有4种类别,标注类别最多为256类。其中可以指定特定的像素值用于表示该值的像素不参与训练和评估(默认为255)。
+
+[点击这里](https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz),下载视盘语义分割数据集
+在PaddleX中,使用`paddlex.cv.datasets.SegReader`([API说明](./apis/datasets.html#segreader))加载语义分割数据集
diff --git a/docs/deploy.md b/docs/deploy.md
new file mode 100644
index 0000000000000000000000000000000000000000..8381d3c34149934c901c1e002efb94f62398a171
--- /dev/null
+++ b/docs/deploy.md
@@ -0,0 +1,56 @@
+# 模型预测部署
+
+本文档指引用户如何采用更高性能的方式来部署使用PaddleX训练的模型。使用本文档的模型部署方式,会在模型运算过程中对模型计算图进行优化,同时减少内存操作,相比普通的PaddlePaddle模型加载和预测方式,预测速度平均可提升1倍,具体各模型性能对比见[预测性能对比](#预测性能对比)
+
+## 服务端部署
+
+### 导出inference模型
+
+在服务端部署的模型需要首先将模型导出为inference格式模型,导出的模型将包括`__model__`、`__params__`和`model.yml`三个文件,分别为模型的网络结构、模型权重和模型的配置文件(包括数据预处理参数等)。在安装完PaddleX后,在命令行终端使用如下命令将模型导出到当前目录的`inference_model`目录下。
+
+> 可直接下载垃圾检测模型测试本文档的流程[garbage_epoch_12.tar.gz](https://bj.bcebos.com/paddlex/models/garbage_epoch_12.tar.gz)
+
+```
+paddlex --export_inference --model_dir=./garbage_epoch_12 --save_dir=./inference_model
+```
+
+### Python部署
+PaddleX已经集成了基于Python的高性能预测接口,在安装PaddleX后,可参照如下代码示例,进行预测。相关的接口文档可参考[paddlex.deploy](apis/deploy.md)
+> 点击下载测试图片 [garbage.bmp](https://bj.bcebos.com/paddlex/datasets/garbage.bmp)
+```
+import paddlex as pdx
+predictor = pdx.deploy.create_predictor('./inference_model')
+result = predictor.predict(image='garbage.bmp')
+```
+
+### C++部署
+
+> C++部署方案正在整理中,即将开源...
+
+### 预测性能对比
+
+#### 测试环境
+
+- CUDA 9.0
+- CUDNN 7.5
+- PaddlePaddle 1.7.1
+- GPU: Tesla P40
+- AnalysisPredictor 指采用Python的高性能预测方式
+- Executor 指采用paddlepaddle普通的python预测方式
+- Batch Size均为1,耗时单位为ms/image,只计算模型运行时间,不包括数据的预处理和后处理
+
+| 模型 | AnalysisPredictor耗时 | Executor耗时 | 输入图像大小 |
+| :---- | :--------------------- | :------------ | :------------ |
+| resnet50 | 4.84 | 7.57 | 224*224 |
+| mobilenet_v2 | 3.27 | 5.76 | 224*224 |
+| unet | 22.51 | 34.60 |513*513 |
+| deeplab_mobile | 63.44 | 358.31 |1025*2049 |
+| yolo_mobilenetv2 | 15.20 | 19.54 | 608*608 |
+| faster_rcnn_r50_fpn_1x | 50.05 | 69.58 |800*1088 |
+| faster_rcnn_r50_1x | 326.11 | 347.22 | 800*1067 |
+| mask_rcnn_r50_fpn_1x | 67.49 | 91.02 | 800*1088 |
+| mask_rcnn_r50_1x | 326.11 | 350.94 | 800*1067 |
+
+## 移动端部署
+
+> Lite模型导出正在集成中,即将开源...
diff --git a/docs/gpu_configure.md b/docs/gpu_configure.md
new file mode 100644
index 0000000000000000000000000000000000000000..cf72fe2ecf54082d15dfc1bdc13e137c8daa5da9
--- /dev/null
+++ b/docs/gpu_configure.md
@@ -0,0 +1,70 @@
+# 多卡GPU/CPU训练
+
+## GPU卡数配置
+PaddleX在训练过程中会优先选择**当前所有可用的GPU卡进行训练**,在评估时**分类和分割任务仍使用多张卡**而**检测任务只使用1张卡**进行计算,在预测时各任务**则只会使用1张卡进行计算**。
+
+用户如想配置PaddleX在运行时使用的卡的数量,可在命令行终端(Shell)或Python代码中按如下方式配置:
+
+命令行终端:
+```
+# 使用1号GPU卡
+export CUDA_VISIBLE_DEVICES='1'
+# 使用0, 1, 3号GPU卡
+export CUDA_VISIBLE_DEVICES='0,1,3'
+# 不使用GPU,仅使用CPU
+export CUDA_VISIBLE_DEVICES=''
+```
+
+python代码:
+```
+# 注意:须要在第一次import paddlex或paddle前执行如下语句
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,3'
+import paddlex as pdx
+```
+
+## 使用多个GPU卡训练
+
+目前PaddlePaddle支持在Linux下使用多卡训练,Windows只支持单卡,在命令行终端输入`nvidia-smi`可以查看自己机器的GPU卡信息,如若提示命令未找到,则用户需要自行安装CUDA驱动。
+
+PaddleX在多卡GPU下训练时,无需额外的配置,用户按照上文的方式,通过`CUDA_VISIBLE_DEVICES`环境变量配置所需要使用的卡的数量即可。
+
+需要注意的是,在训练代码中,可根据卡的数量,调高`batch_size`和`learning_rate`等参数,GPU卡数量越多,则可以支持更高的`batch_size`(注意batch_size需能被卡的数量整除), 同时更高的`batch_size`也意味着学习率`learning_rate`也要对应上调。同理,在训练过程中,如若因显存或内存不够导致训练失败,用户也需自行调低`batch_size`,并且按比例调低学习率。
+
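+例如,假设单卡训练时`batch_size`为8、`learning_rate`为0.01,则使用4张卡时可按如下比例同步调整(仅作示意,具体取值仍需结合任务调参):
+```
+num_gpus = 4
+batch_size = 8 * num_gpus          # 32,需能被卡数整除
+learning_rate = 0.01 * num_gpus    # 0.04,随batch_size等比例上调
+```
+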
+## CPU配置
+PaddleX在训练过程中可以选择使用CPU进行训练、评估和预测。通过以下方式进行配置:
+
+命令行终端:
+```
+export CUDA_VISIBLE_DEVICES=""
+```
+
+python代码:
+```
+# 注意:须要在第一次import paddlex或paddle前执行如下语句
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = ''
+import paddlex as pdx
+```
+此时使用的CPU个数为1。
+
+## 使用多个CPU训练
+通过设置环境变量`CPU_NUM`可以改变CPU个数,如果未设置,则CPU数目默认设为1,即`CPU_NUM`=1。在物理核心数量范围内,调大该参数可以加速模型的训练、评估与预测。
+
+PaddleX在训练过程中会选择`CPU_NUM`个CPU进行训练,在评估时分类和分割任务仍使用`CPU_NUM`个CPU,而检测任务只使用1个CPU进行计算,在预测时各任务则只会使用1个CPU进行计算。
+通过以下方式可设置CPU的个数:
+
+命令行终端:
+```
+export CUDA_VISIBLE_DEVICES=""
+export CPU_NUM=2
+```
+
+python代码:
+```
+# 注意:须要在第一次import paddlex或paddle前执行如下语句
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = ''
+os.environ['CPU_NUM'] = '2'
+import paddlex as pdx
+```
diff --git a/docs/images/PaddleX-Pipe-Line.png b/docs/images/PaddleX-Pipe-Line.png
new file mode 100644
index 0000000000000000000000000000000000000000..7831d256a7159d465a8cfd4977430639b30b9829
Binary files /dev/null and b/docs/images/PaddleX-Pipe-Line.png differ
diff --git a/docs/images/cls_eval.png b/docs/images/cls_eval.png
new file mode 100644
index 0000000000000000000000000000000000000000..625166920f32e5fe4ef8c4f3ddcc7d35259ba021
Binary files /dev/null and b/docs/images/cls_eval.png differ
diff --git a/docs/images/cls_train.png b/docs/images/cls_train.png
new file mode 100644
index 0000000000000000000000000000000000000000..b6ccd88dc7365b1f98e985bf34c42d76a2ede958
Binary files /dev/null and b/docs/images/cls_train.png differ
diff --git a/docs/images/faster_eval.png b/docs/images/faster_eval.png
new file mode 100644
index 0000000000000000000000000000000000000000..31f59266e24fd51c9462d2b1f2e1cfa8e4af4fcf
Binary files /dev/null and b/docs/images/faster_eval.png differ
diff --git a/docs/images/faster_train.png b/docs/images/faster_train.png
new file mode 100644
index 0000000000000000000000000000000000000000..611c56dabb445bff9681eb48aac3ed407d791eb0
Binary files /dev/null and b/docs/images/faster_train.png differ
diff --git a/docs/images/garbage.bmp b/docs/images/garbage.bmp
new file mode 100644
index 0000000000000000000000000000000000000000..ba652f41584ed43abf0431208c6de597a4cacd7b
Binary files /dev/null and b/docs/images/garbage.bmp differ
diff --git a/docs/images/mask_eval.png b/docs/images/mask_eval.png
new file mode 100644
index 0000000000000000000000000000000000000000..6365f081de680dec735c3c30e03ab468e88315a7
Binary files /dev/null and b/docs/images/mask_eval.png differ
diff --git a/docs/images/mask_train.png b/docs/images/mask_train.png
new file mode 100644
index 0000000000000000000000000000000000000000..f43bf8cf9bd5e43c157dfd59ee1f382e2fd41fa0
Binary files /dev/null and b/docs/images/mask_train.png differ
diff --git a/images/paddlexlogo.png b/docs/images/paddlex.png
similarity index 100%
rename from images/paddlexlogo.png
rename to docs/images/paddlex.png
diff --git a/docs/images/seg_eval.png b/docs/images/seg_eval.png
new file mode 100644
index 0000000000000000000000000000000000000000..978b1ce3ecea799eb5fa81fe60f68ed2f4526838
Binary files /dev/null and b/docs/images/seg_eval.png differ
diff --git a/docs/images/seg_train.png b/docs/images/seg_train.png
new file mode 100644
index 0000000000000000000000000000000000000000..9490a9022b07d6da62a333362238fb8b86d64aa4
Binary files /dev/null and b/docs/images/seg_train.png differ
diff --git a/docs/images/vdl1.jpg b/docs/images/vdl1.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..57b189830edc336b169c6b19c054ae3d9c62e385
Binary files /dev/null and b/docs/images/vdl1.jpg differ
diff --git a/docs/images/vdl2.jpg b/docs/images/vdl2.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f8b2911a9b6549e83bb8b7087e5e2ec920bab549
Binary files /dev/null and b/docs/images/vdl2.jpg differ
diff --git a/docs/images/vdl3.jpg b/docs/images/vdl3.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4eb585681b13def5b0428f3f2231833317acd8bf
Binary files /dev/null and b/docs/images/vdl3.jpg differ
diff --git a/docs/images/visualized_deeplab.jpg b/docs/images/visualized_deeplab.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b417728e3385f6eb83885f388c988e2893966e42
Binary files /dev/null and b/docs/images/visualized_deeplab.jpg differ
diff --git a/docs/images/visualized_fasterrcnn.jpg b/docs/images/visualized_fasterrcnn.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fe1ca0ed6346ef3e62206527ce23577554596e4c
Binary files /dev/null and b/docs/images/visualized_fasterrcnn.jpg differ
diff --git a/docs/images/visualized_maskrcnn.bmp b/docs/images/visualized_maskrcnn.bmp
new file mode 100644
index 0000000000000000000000000000000000000000..b06ed68668770715833dbade07ae5873dcf491af
Binary files /dev/null and b/docs/images/visualized_maskrcnn.bmp differ
diff --git a/docs/images/voc_eval.png b/docs/images/voc_eval.png
new file mode 100644
index 0000000000000000000000000000000000000000..f99ded08a637986d0256e85009eb9568912b2811
Binary files /dev/null and b/docs/images/voc_eval.png differ
diff --git a/docs/images/yolo_train.png b/docs/images/yolo_train.png
new file mode 100644
index 0000000000000000000000000000000000000000..4ab97aca5be9c8b9f0d03cc5f1c4dd8d265d69a7
Binary files /dev/null and b/docs/images/yolo_train.png differ
diff --git a/docs/index.rst b/docs/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..9caf2f74ae245604449fef9b778039533978a2a6
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,28 @@
+欢迎使用PaddleX!
+=======================================
+
+PaddleX是基于飞桨技术生态的深度学习全流程开发工具,具备易集成、易使用、全流程等特点。PaddleX作为深度学习开发工具,不仅提供了开源的内核代码,可供用户灵活使用或集成,同时也提供了配套的前端可视化客户端套件,让用户以可视化的方式进行模型开发,相关细节可查阅PaddleX官网。
+
+本文档为PaddleX内核代码使用手册
+
+.. toctree::
+ :maxdepth: 1
+ :caption: 目录:
+
+ quick_start.md
+ install.md
+ model_zoo.md
+ slim/index
+ apis/index
+ datasets.md
+ gpu_configure.md
+ tutorials/index.rst
+ metrics.md
+ FAQ.md
+
+* PaddleX版本: v0.1.0
+* 项目官网: http://www.paddlepaddle.org.cn/paddlex
+* 项目GitHub: https://github.com/PaddlePaddle/PaddleX/tree/develop
+* 官方QQ用户群: 1045148026
+* GitHub Issue反馈: http://www.github.com/PaddlePaddle/PaddleX/issues
+
diff --git a/docs/install.md b/docs/install.md
new file mode 100644
index 0000000000000000000000000000000000000000..3feb63e70a2ddebcdd13eeaf8a422d87fcfe3078
--- /dev/null
+++ b/docs/install.md
@@ -0,0 +1,34 @@
+# 安装
+
+> 以下安装过程默认用户已安装好Anaconda和CUDA 10.1(有GPU卡的情况下), Anaconda的安装可参考其官网https://www.anaconda.com/
+
+## Linux/Mac安装
+```
+# 使用conda创建虚拟环境
+conda create -n paddlex python=3.7
+conda activate paddlex
+
+# 安装paddlepaddle
+# cpu版: pip install paddlepaddle
+pip install paddlepaddle-gpu
+
+# 安装PaddleX
+pip install paddlex
+```
+
+## Windows安装
+```
+# 使用conda创建虚拟环境
+conda create -n paddlex python=3.7
+conda activate paddlex
+
+# 安装paddlepaddle
+# cpu版: pip install paddlepaddle
+pip install paddlepaddle-gpu
+
+# 安装pycocotools
+pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
+
+# 安装PaddleX
+pip install paddlex
+```
diff --git a/docs/make.bat b/docs/make.bat
new file mode 100644
index 0000000000000000000000000000000000000000..2119f51099bf37e4fdb6071dce9f451ea44c62dd
--- /dev/null
+++ b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+ echo.
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+ echo.installed, then set the SPHINXBUILD environment variable to point
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
+ echo.may add the Sphinx directory to PATH.
+ echo.
+ echo.If you don't have Sphinx installed, grab it from
+ echo.http://sphinx-doc.org/
+ exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/metrics.md b/docs/metrics.md
new file mode 100644
index 0000000000000000000000000000000000000000..eb27ebd77709ba2bb90203ab8bbe4736f6243b83
--- /dev/null
+++ b/docs/metrics.md
@@ -0,0 +1,145 @@
+# 指标及日志含义
+
+PaddleX在模型训练、评估过程中,都会有相应的日志和指标反馈,本文档用于说明这些日志和指标的含义。
+
+## 训练通用统计信息
+
+PaddleX所有模型在训练过程中,输出的日志信息都包含了6个通用的统计信息,用于辅助用户进行模型训练,例如**分割模型**的训练日志,如下图所示。
+
+
+
+各字段含义如下:
+
+| 字段 | 字段值示例 | 含义 |
+| -------------- | -------------------- | ------------------------------------------------------------ |
+| Epoch | Epoch=4/20 | [迭代轮数]所有训练数据会被训练20轮,当前处于第4轮 |
+| Step | Step=62/66 | [迭代步数]所有训练数据被训练一轮所需要的迭代步数为66,当前处于第62步 |
+| loss | loss=0.007226 | [损失函数值]参与当前迭代步数的训练样本的平均损失函数值loss,loss值越低,表明模型在训练集上拟合的效果越好(如上日志中第1行表示第4个epoch的第62个Batch的loss值为0.007226) |
+| lr | lr=0.008215 | [学习率]当前模型迭代过程中的学习率 |
+| time_each_step | time_each_step=0.41s | [每步迭代时间]训练过程计算得到的每步迭代平均用时 |
+| eta | eta=0:9:44 | [剩余时间]模型训练完成所需剩余时间预估为0小时9分钟44秒 |
+
+不同模型的日志中除了上述通用字段外,还有其它字段,这些字段含义可见文档后面对各任务模型的描述。
+
+## 评估通用统计信息
+
+PaddleX所有模型在训练过程中会根据用户设定的`save_interval_epochs`参数,每间隔一定轮数进行评估和保存。例如**分类模型**的评估日志,如下图所示。
+
+
+
+上图中第1行表明验证数据集中样本数为240,需要迭代8步才能评估完所有验证数据;第5行用于表明第2轮的模型已经完成保存操作;第6行则表明当前保存的模型中,第2轮的模型在验证集上指标最优(分类任务看`acc1`,此时`acc1`值为0.258333),最优模型会保存在`best_model`目录中。
+
+## 分类特有统计信息
+
+### 训练日志字段
+
+分类任务的训练日志除了通用统计信息外,还包括`acc1`和`acc5`两个特有字段。
+
+> 注: acck准确率是针对一张图片进行计算的:把模型在各个类别上的预测得分按从高往低进行排序,取出前k个预测类别,若这k个预测类别包含了真值类,则认为该图片分类正确。
+
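+以acc1为例,单张图片的计算过程可用如下片段示意(得分与真值均为假设数据):
+
+```python
+import numpy as np
+
+def acc_k(scores, label, k):
+    # 取预测得分最高的前k个类别,真值类别在其中即视为分类正确
+    topk = np.argsort(scores)[::-1][:k]
+    return int(label in topk)
+
+print(acc_k(np.array([0.1, 0.6, 0.3]), label=1, k=1))  # 输出1,表示top1预测正确
+```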
+
+
+
+上图中第1行中的`acc1`表示参与当前迭代步数的训练样本的平均top1准确率,值越高代表模型越优;`acc5`表示参与当前迭代步数的训练样本的平均top5(若类别数n少于5,则为topn)准确率,值越高代表模型越优。第4行中的`loss`表示整个训练集的平均损失函数值,`acc1`表示整个训练集的平均top1准确率,`acc5`表示整个训练集的平均top5准确率。
+
+
+### 评估日志字段
+
+
+
+上图中第3行中的`acc1`表示整个验证集的平均top1准确率,`acc5`表示整个验证集的平均top5准确率。
+
+
+## 检测特有统计信息
+
+### 训练日志字段
+
+#### YOLOv3
+
+YOLOv3的训练日志只包括训练通用统计信息(见[训练通用统计信息](#id2))。
+
+
+
+上图中第5行`loss`表示整个训练集的平均损失函数loss值。
+
+#### FasterRCNN
+
+FasterRCNN的训练日志除了通用统计信息外,还包括`loss_cls`、`loss_bbox`、`loss_rpn_cls`和`loss_rpn_bbox`,这些字段的含义如下:
+
+| 字段 | 含义 |
+| -------------- | --------------------------------------------- |
+| loss_cls | RCNN子网络中分类损失函数值 |
+| loss_bbox | RCNN子网络中检测框回归损失函数值 |
+| loss_rpn_cls | RPN子网络中分类损失函数值 |
+| loss_rpn_bbox | RPN子网络中检测框回归损失函数值 |
+| loss | 所有子网络损失函数值之和 |
+
+
+
+上图中第1行`loss`、`loss_cls`、`loss_bbox`、`loss_rpn_cls`、`loss_rpn_bbox`都是参与当前迭代步数的训练样本的损失值,而第7行是针对整个训练集的损失函数值。
+
+#### MaskRCNN
+
+MaskRCNN的训练日志除了通用统计信息外,还包括`loss_cls`、`loss_bbox`、`loss_mask`、`loss_rpn_cls`和`loss_rpn_bbox`,这些字段的含义如下:
+
+
+| 字段 | 含义 |
+| -------------- | --------------------------------------------- |
+| loss_cls | RCNN子网络中分类损失函数值 |
+| loss_bbox | RCNN子网络中检测框回归损失函数值 |
+| loss_mask | RCNN子网络中Mask回归损失函数值 |
+| loss_rpn_cls | RPN子网络中分类损失函数值 |
+| loss_rpn_bbox | RPN子网络中检测框回归损失函数值 |
+| loss | 所有子网络损失函数值之和 |
+
+
+
+上图中第1行`loss`、`loss_cls`、`loss_bbox`、`loss_mask`、`loss_rpn_cls`、`loss_rpn_bbox`都是参与当前迭代步数的训练样本的损失值,而第7行是针对整个训练集的损失函数值。
+
+### 评估日志字段
+
+检测可以使用两种评估标准:VOC评估标准和COCO评估标准。
+
+#### VOC评估标准
+
+
+
+> 注:`map`为平均准确率的平均值,即IoU(Intersection Over Union)取0.5时各个类别的准确率-召回率曲线下面积的平均值。
+
+上图中第3行`bbox_map`表示检测任务中整个验证集的平均准确率平均值。
+
+#### COCO评估标准
+
+> 注:COCO评估指标可参见[COCO官网解释](http://cocodataset.org/#detection-eval)。PaddleX主要反馈`mmAP`,即AP at IoU=.50:.05:.95这项指标,为在各个IoU阈值下平均准确率平均值(mAP)的平均值。
+
+COCO格式的数据集不仅可以用于训练目标检测模型,也可以用于训练实例分割模型。在目标检测中,PaddleX主要反馈针对检测框的`bbox_mmAP`指标;在实例分割中,还包括针对Mask的`seg_mmAP`指标。如下所示,第一张日志截图为目标检测的评估结果,第二张日志截图为实例分割的评估结果。
+
+
+
+上图中红框标注的`bbox_mmap`表示整个验证集的检测框平均准确率平均值。
+
+
+上图中红框标注的`bbox_mmap`和`seg_mmap`分别表示整个验证集的检测框平均准确率平均值、Mask平均准确率平均值。
+
+## 分割特有统计信息
+
+### 训练日志字段
+
+语义分割的训练日志只包括训练通用统计信息(见[训练通用统计信息](#id2))。
+
+
+
+### 评估日志字段
+
+语义分割的评估日志包括了`miou`、`category_iou`、`macc`、`category_acc`、`kappa`,这些字段的含义如下:
+
+| 字段 | 含义 |
+| -------------- | --------------------------------------------- |
+| miou | 各类IoU(Intersection Over Union)的平均值 |
+| category_iou | 各类别的IoU |
+| macc | 平均准确率,即预测正确的像素数/总像素数 |
+| category_acc | 各类别的准确率,即各类别预测正确的像素数/预测为该类别的总像素数 |
+| kappa | kappa系数,用于一致性检验 |
+
+
diff --git a/docs/model_zoo.md b/docs/model_zoo.md
new file mode 100644
index 0000000000000000000000000000000000000000..a4e81b562169b253de0842248ca5b31a05b30d71
--- /dev/null
+++ b/docs/model_zoo.md
@@ -0,0 +1,69 @@
+# 模型库
+本文档梳理了PaddleX v0.1.0支持的模型,同时也提供了在各个数据集上的预训练模型和对应验证集上的指标。用户也可自行下载对应的代码,在安装PaddleX后,即可使用相应代码训练模型。
+
+表中相关模型也可下载用作相应模型的预训练模型,通过`pretrain_weights`指定目录加载使用。
+
+## 图像分类模型
+> 表中模型相关指标均为在ImageNet数据集上使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla P4),预测速度为每张图片预测用时(不包括预处理和后处理),表中符号`-`表示相关指标暂未测试。
+
+
+| 模型 | 模型大小 | 预测速度(毫秒) | Top1准确率 | Top5准确率 |
+| :----| :------- | :----------- | :--------- | :--------- |
+| ResNet18| 46.9MB | 3.456 | 70.98% | 89.92% |
+| ResNet34| 87.5MB | 5.668 | 74.57% | 92.14% |
+| ResNet50| 102.7MB | 8.787 | 76.50% | 93.00% |
+| ResNet101 |179.1MB | 15.447 | 77.56% | 93.64% |
+| ResNet50_vd |102.8MB | 9.058 | 79.12% | 94.44% |
+| ResNet101_vd| 179.2MB | 15.685 | 80.17% | 94.97% |
+| DarkNet53|166.9MB | 11.969 | 78.04% | 94.05% |
+| MobileNetV1 | 16.4MB | 2.609 | 70.99% | 89.68% |
+| MobileNetV2 | 14.4MB | 4.546 | 72.15% | 90.65% |
+| MobileNetV3_large| 22.8MB | - | 75.3% | 75.3% |
+| MobileNetV3_small | 12.5MB | 6.809 | 67.46% | 87.12% |
+| Xception41 |92.4MB | 13.757 | 79.30% | 94.53% |
+| Xception65 | 144.6MB | 19.216 | 81.00% | 95.49% |
+| Xception71| 151.9MB | 23.291 | 81.11% | 95.45% |
+| DenseNet121 | 32.8MB | 12.437 | 75.66% | 92.58% |
+| DenseNet161|116.3MB | 27.717 | 78.57% | 94.14% |
+| DenseNet201| 84.6MB | 26.583 | 77.63% | 93.66% |
+| ShuffleNetV2 | 10.2MB | 6.101 | 68.8% | 88.5% |
+
+## 目标检测模型
+
+> 表中模型相关指标均为在MSCOCO数据集上使用PaddlePaddle Python预测接口测试得到(测试GPU型号为Nvidia Tesla V100),表中符号`-`表示相关指标暂未测试。
+
+| 模型 | 模型大小 | 预测时间(毫秒) | BoxAP |
+|:-------|:-----------|:-------------|:----------|
+|FasterRCNN-ResNet50|135.6MB| 78.450 | 35.2 |
+|FasterRCNN-ResNet50_vd| 135.7MB | 79.523 | 36.4 |
+|FasterRCNN-ResNet101| 211.7MB | 107.342 | 38.3 |
+|FasterRCNN-ResNet50-FPN| 167.2MB | 44.897 | 37.2 |
+|FasterRCNN-ResNet50_vd-FPN|168.7MB | 45.773 | 38.9 |
+|FasterRCNN-ResNet101-FPN| 251.7MB | 55.782 | 38.7 |
+|FasterRCNN-ResNet101_vd-FPN |252MB | 58.785 | 40.5 |
+|YOLOv3-DarkNet53|252.4MB | 21.944 | 38.9 |
+|YOLOv3-MobileNetv1 |101.2MB | 12.771 | 29.3 |
+|YOLOv3-MobileNetv3|94.6MB | - | 31.6 |
+| YOLOv3-ResNet34|169.7MB | 15.784 | 36.2 |
+
+## 实例分割模型
+
+> 表中模型相关指标均为在MSCOCO数据集上测试得到。
+
+| 模型 |模型大小 | 预测时间(毫秒) | BoxAP | SegAP |
+|:---------|:---------|:----------|:---------|:--------|
+|MaskRCNN-ResNet50|51.2MB| 86.096 | 36.5 |32.2|
+|MaskRCNN-ResNet50-FPN|184.6MB | 65.859 | 37.9 |34.2|
+|MaskRCNN-ResNet50_vd-FPN |185.5MB | 63.191 | 39.8 |35.4|
+|MaskRCNN-ResNet101-FPN|268.6MB | 77.024 | 39.5 |35.2|
+|MaskRCNN-ResNet101vd-FPN |268.6MB | 76.307 | 41.4 |36.8|
+
+## 语义分割模型
+
+> 表中符号`-`表示相关指标暂未测试。
+
+| 模型|数据集 | 模型大小 | 预测速度 | mIOU |
+|:--------|:----------|:----------|:----------|:----------|
+| UNet | COCO | 53.7M | - | - |
+| DeepLabv3+/Xception65 | Cityscapes | 165.1M | - | 0.7930 |
+| DeepLabv3+/MobileNetV2 | Cityscapes | 7.4M | - | 0.6981 |
diff --git a/docs/quick_start.md b/docs/quick_start.md
new file mode 100644
index 0000000000000000000000000000000000000000..dfbfe12ddedc31221fa54d62171a410b223c7385
--- /dev/null
+++ b/docs/quick_start.md
@@ -0,0 +1,91 @@
+# 10分钟快速上手使用
+
+本文档在一个小数据集上展示了如何通过PaddleX进行训练,您可以阅读文档[使用教程-模型训练](/tutorials/train)来了解更多模型任务的训练使用方式。
+
+## 1. 准备蔬菜分类数据集
+```
+wget https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz
+tar xzvf vegetables_cls.tar.gz
+```
+
+## 2. 训练代码开发
+通过如下`train.py`代码进行训练
+> 设置使用0号GPU卡
+```
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+```
+
+> 定义训练和验证时的数据处理流程, 在`train_transforms`中加入了`RandomCrop`和`RandomHorizontalFlip`两种数据增强方式
+```
+from paddlex.cls import transforms
+train_transforms = transforms.Compose([
+ transforms.RandomCrop(crop_size=224),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+])
+eval_transforms = transforms.Compose([
+ transforms.ResizeByShort(short_size=256),
+ transforms.CenterCrop(crop_size=224),
+ transforms.Normalize()
+])
+```
+
+> 定义数据集,`pdx.datasets.ImageNet`表示读取ImageNet格式的分类数据集
+```
+train_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/val_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=eval_transforms)
+```
+> 模型训练
+
+```
+num_classes = len(train_dataset.labels)
+model = pdx.cls.MobileNetV2(num_classes=num_classes)
+model.train(num_epochs=10,
+ train_dataset=train_dataset,
+ train_batch_size=32,
+ eval_dataset=eval_dataset,
+ lr_decay_epochs=[4, 6, 8],
+ learning_rate=0.025,
+ save_dir='output/mobilenetv2',
+ use_vdl=True)
+```
+
+## 3. 模型训练
+> `train.py`与解压后的数据集目录`vegetables_cls`放在同一目录下,在此目录下运行`train.py`即可开始训练。如果您的电脑上有GPU,这将会在10分钟内训练完成,如果为CPU也大概会在30分钟内训练完毕。
+```
+python train.py
+```
+## 4. 训练过程中查看训练指标
+> 模型在训练过程中,所有的迭代信息将以标准输出流的形式,输出到命令执行的终端上,用户也可通过VisualDL以可视化的方式查看训练指标的变化,通过如下方式启动visualdl后,在浏览器打开http://0.0.0.0:8000即可。
+```
+visualdl --logdir output/mobilenetv2/vdl_log --port 8000
+```
+
+## 5. 训练完成使用模型进行测试
+> 如使用训练过程中第8轮保存的模型进行测试
+```
+import paddlex as pdx
+model = pdx.load_model('output/mobilenetv2/epoch_8')
+result = model.predict('vegetables_cls/bocai/100.jpg', topk=3)
+print("Predict Result:", result)
+```
+> 预测结果输出如下,预测按score进行排序,得到前三分类结果
+```
+Predict Result: [{'score': 0.9999393, 'category': 'bocai', 'category_id': 0}, {'score': 6.010089e-05, 'category': 'hongxiancai', 'category_id': 2}, {'score': 5.593914e-07, 'category': 'xilanhua', 'category_id': 5}]
+```
+
+## 其它推荐
+- 1.[目标检测模型训练](tutorials/train/detection.md)
+- 2.[语义分割模型训练](tutorials/train/segmentation.md)
+- 3.[模型太大,想要更小的模型,试试模型裁剪吧!](tutorials/compress/classification.md)
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f11fa32f6f465f7b002d7fd37cbd78203206d8d7
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,4 @@
+sphinx
+recommonmark
+sphinx_markdown_tables
+sphinx_rtd_theme
diff --git a/docs/slim/index.rst b/docs/slim/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..48a16f6e08f3f80a7048d1666719b9b08e150362
--- /dev/null
+++ b/docs/slim/index.rst
@@ -0,0 +1,8 @@
+模型压缩
+============================
+
+.. toctree::
+ :maxdepth: 2
+
+ prune.md
+ quant.md
diff --git a/docs/slim/prune.md b/docs/slim/prune.md
new file mode 100644
index 0000000000000000000000000000000000000000..c1ff51e5e08c2ce8da5e2042d0a1c359a9e64dff
--- /dev/null
+++ b/docs/slim/prune.md
@@ -0,0 +1,54 @@
+# 模型裁剪
+
+## 原理介绍
+
+模型裁剪用于减小模型的计算量和体积,可以加快模型部署后的预测速度,是一种减小模型大小和降低模型计算复杂度的常用方式,通过裁剪卷积层中Kernel输出通道的大小及其关联层参数大小来实现,其关联裁剪的原理可参见[PaddleSlim相关文档](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html#id16)。**一般而言,在同等模型精度前提下,数据复杂度越低,模型可以被裁剪的比例就越高**。
+
+## 裁剪方法
+PaddleX提供了两种方式:
+
+**1.用户自行计算裁剪配置(推荐),整体流程包含三个步骤,**
+> **第一步**: 使用数据集训练原始模型
+> **第二步**:利用第一步训练好的模型,在验证数据集上计算模型中各个参数的敏感度,并将敏感度信息存储至本地文件
+> **第三步**:使用数据集训练裁剪模型(与第一步差异在于需要在`train`接口中,将第二步计算得到的敏感信息文件传给接口的`sensitivities_file`参数)
+
+> 在如上三个步骤中,**相当于模型共需要训练两遍**,分别对应第一步和第三步,但其中第三步训练的是裁剪后的模型,因此训练速度较第一步会更快。
+> 第二步会遍历模型中的部分裁剪参数,分别计算各个参数裁剪后对于模型在验证集上效果的影响,**因此会反复在验证集上评估多次**。
+
+**2.使用PaddleX内置的裁剪方案**
+> PaddleX内置的模型裁剪方案是**基于标准数据集**上计算得到的参数敏感度信息,由于不同数据集特征分布会有较大差异,所以该方案相较于第1种方案训练得到的模型**精度一般而言会更低**(**且用户自定义数据集与标准数据集特征分布差异越大,导致训练的模型精度会越低**),仅在用户想节省时间的前提下可以参考使用,使用方式只需一步,
+
+> **一步**: 使用数据集训练裁剪模型,在训练调用`train`接口时,将接口中的`sensitivities_file`参数设置为'DEFAULT'字符串
+
+> 注:各模型内置的裁剪方案分别依据的数据集为: 图像分类——ImageNet数据集、目标检测——PascalVOC数据集、语义分割——CityScape数据集
+
+## 裁剪实验
+基于上述两种方案,我们在PaddleX上使用样例数据进行了实验,在Tesla P40上实验指标如下所示,
+
+### 图像分类
+实验背景:使用MobileNetV2模型,数据集为蔬菜分类示例数据,见[使用教程-模型压缩-图像分类](../tutorials/compress/classification.md)
+
+| 模型 | 裁剪情况 | 模型大小 | Top1准确率(%) |GPU预测速度 | CPU预测速度 |
+| :-----| :--------| :-------- | :---------- |:---------- |:----------|
+|MobileNetV2 | 无裁剪(原模型)| 13.0M | 97.50|6.47ms |47.44ms |
+|MobileNetV2 | 方案一(eval_metric_loss=0.10) | 2.1M | 99.58 |5.03ms |20.22ms |
+|MobileNetV2 | 方案二(eval_metric_loss=0.10) | 6.0M | 99.58 |5.42ms |29.06ms |
+
+### 目标检测
+实验背景:使用YOLOv3-MobileNetV1模型,数据集为昆虫检测示例数据,见[使用教程-模型压缩-目标检测](../tutorials/compress/detection.md)
+
+
+| 模型 | 裁剪情况 | 模型大小 | MAP(%) |GPU预测速度 | CPU预测速度 |
+| :-----| :--------| :-------- | :---------- |:---------- | :---------|
+|YOLOv3-MobileNetV1 | 无裁剪(原模型)| 139M | 67.57| 14.88ms |976.42ms |
+|YOLOv3-MobileNetV1 | 方案一(eval_metric_loss=0.10) | 34M | 75.49 |10.60ms |558.49ms |
+|YOLOv3-MobileNetV1 | 方案二(eval_metric_loss=0.05) | 29M | 50.27| 9.43ms |360.46ms |
+
+### 语义分割
+实验背景:使用UNet模型,数据集为视盘分割示例数据, 见[使用教程-模型压缩-语义分割](../tutorials/compress/segmentation.md)
+
+| 模型 | 裁剪情况 | 模型大小 | mIOU(%) |GPU预测速度 | CPU预测速度 |
+| :-----| :--------| :-------- | :---------- |:---------- | :---------|
+|UNet | 无裁剪(原模型)| 77M | 91.22 |33.28ms |9523.55ms |
+|UNet | 方案一(eval_metric_loss=0.10) |26M | 90.37 |21.04ms |3936.20ms |
+|UNet | 方案二(eval_metric_loss=0.10) |23M | 91.21 |18.61ms |3447.75ms |
diff --git a/docs/slim/quant.md b/docs/slim/quant.md
new file mode 100644
index 0000000000000000000000000000000000000000..1686a9fb8d33e770d55a378ebdf76876058514fb
--- /dev/null
+++ b/docs/slim/quant.md
@@ -0,0 +1,11 @@
+# 模型量化
+
+## 原理介绍
+为了满足低内存带宽、低功耗、低计算资源占用以及小模型体积等需求,定点量化技术被提出。为此我们提供了训练后量化,该方法使用KL散度确定量化比例因子,将FP32模型转换成INT8模型,且不需要重新训练,可以快速得到量化模型。
+
+
+## 使用PaddleX量化模型
+PaddleX提供了`export_quant_model`接口,用户可通过该接口以post_quantization(训练后量化)的方式完成模型量化并导出。点击查看[量化接口使用文档](../apis/slim.md)。
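+
+以下为一个示意性的调用片段,展示如何加载训练好的模型并通过`export_quant_model`完成训练后量化。其中`batch_size`、`batch_num`、`cache_dir`等参数名与取值仅为假设示意,实际支持的参数请以[量化接口使用文档](../apis/slim.md)为准。
+
+```python
+import paddlex as pdx
+
+# 加载训练好的FP32模型
+model = pdx.load_model('./output/mobilenetv2/best_model')
+
+# 定义用于计算量化比例因子的校准数据集(此处以蔬菜分类验证集为例)
+quant_dataset = pdx.datasets.ImageNet(
+    data_dir='vegetables_cls',
+    file_list='vegetables_cls/val_list.txt',
+    label_list='vegetables_cls/labels.txt',
+    transforms=model.eval_transforms)
+
+# 以训练后量化方式量化模型并导出INT8模型
+pdx.slim.export_quant_model(
+    model,
+    quant_dataset,
+    batch_size=2,
+    batch_num=10,
+    save_dir='./quant_mobilenetv2',
+    cache_dir='./temp')
+```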
+
+## 量化性能对比
+模型量化后的性能对比指标请查阅[PaddleSlim模型库](https://paddlepaddle.github.io/PaddleSlim/model_zoo.html)
diff --git a/docs/tutorials/compress/classification.md b/docs/tutorials/compress/classification.md
new file mode 100644
index 0000000000000000000000000000000000000000..464881fc1d2971cb0e2b4e30b3440ec4c7b98997
--- /dev/null
+++ b/docs/tutorials/compress/classification.md
@@ -0,0 +1,54 @@
+# 分类模型裁剪
+
+---
+本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/classification](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification)
+本文档按如下方式对模型进行了裁剪
+> 第一步:在训练数据集上训练MobileNetV2
+> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
+> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
+
+## 步骤一 训练MobileNetV2
+> 模型训练使用文档可以直接参考[分类模型训练](../train/classification.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/classification/mobilenet.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification/mobilenet.py)
+> 使用如下命令开始模型训练
+```
+python mobilenet.py
+```
+
+## 步骤二 计算参数敏感度
+> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
+
+```
+import os
+# 选择使用0号卡
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+
+model_dir = './output/mobilenet/best_model'
+dataset = 'vegetables_cls'
+save_file = './sensitivities.data'
+model = pdx.load_model(model_dir)
+
+# 定义验证所用的数据集
+eval_dataset = pdx.datasets.ImageNet(
+    data_dir=dataset,
+    file_list=os.path.join(dataset, 'val_list.txt'),
+    label_list=os.path.join(dataset, 'labels.txt'),
+    transforms=model.eval_transforms)
+
+# 计算各参数的敏感度,结果保存至save_file
+pdx.slim.cal_params_sensitivities(model,
+                                  save_file,
+                                  eval_dataset,
+                                  batch_size=8)
+```
+> 本步骤代码已整理至[tutorials/compress/classification/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/classification/cal_sensitivities_file.py),用户可直接下载使用
+> 使用如下命令开始计算敏感度
+```
+python cal_sensitivities_file.py --model_dir output/mobilenet/best_model --dataset vegetables_cls --save_file sensitivities.data
+```
+
+## 步骤三 开始裁剪训练
+> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
+```
+python mobilenet.py --model_dir output/mobilenet/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
+```
+
+## 实验效果
+本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
diff --git a/docs/tutorials/compress/detection.md b/docs/tutorials/compress/detection.md
new file mode 100644
index 0000000000000000000000000000000000000000..fb068703188f65c7f873db12afb93ed12f9b39ae
--- /dev/null
+++ b/docs/tutorials/compress/detection.md
@@ -0,0 +1,53 @@
+# 检测模型裁剪
+
+---
+本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/detection](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/detection)
+本文档按如下方式对模型进行了裁剪
+> 第一步:在训练数据集上训练YOLOv3
+> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
+> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
+
+## 步骤一 训练YOLOv3
+> 模型训练使用文档可以直接参考[检测模型训练](../train/detection.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/detection/yolov3_mobilenet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/detection/yolov3_mobilenet.py)
+> 使用如下命令开始模型训练
+```
+python yolov3_mobilenet.py
+```
+
+## 步骤二 计算参数敏感度
+> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
+
+```
+import os
+# 选择使用0号卡
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+
+model_dir = './output/yolov3_mobile/best_model'
+dataset = 'insect_det'
+save_file = './sensitivities.data'
+model = pdx.load_model(model_dir)
+
+# 定义验证所用的数据集(昆虫检测数据集为VOC格式)
+eval_dataset = pdx.datasets.VOCDetection(
+    data_dir=dataset,
+    file_list=os.path.join(dataset, 'val_list.txt'),
+    label_list=os.path.join(dataset, 'labels.txt'),
+    transforms=model.eval_transforms)
+
+# 计算各参数的敏感度,结果保存至save_file
+pdx.slim.cal_params_sensitivities(model,
+                                  save_file,
+                                  eval_dataset,
+                                  batch_size=8)
+```
+> 本步骤代码已整理至[tutorials/compress/detection/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/detection/cal_sensitivities_file.py),用户可直接下载使用
+> 使用如下命令开始计算敏感度
+```
+python cal_sensitivities_file.py --model_dir output/yolov3_mobile/best_model --dataset insect_det --save_file sensitivities.data
+```
+
+## 步骤三 开始裁剪训练
+> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
+```
+python yolov3_mobilenet.py --model_dir output/yolov3_mobile/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
+```
+
+## 实验效果
+本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
diff --git a/docs/tutorials/compress/index.rst b/docs/tutorials/compress/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..3e0dcd752cdf09b93b0beda01b33b77a060c0711
--- /dev/null
+++ b/docs/tutorials/compress/index.rst
@@ -0,0 +1,10 @@
+模型压缩
+=========================
+
+.. toctree::
+ :maxdepth: 1
+
+ classification.md
+ detection.md
+ segmentation.md
+
diff --git a/docs/tutorials/compress/segmentation.md b/docs/tutorials/compress/segmentation.md
new file mode 100644
index 0000000000000000000000000000000000000000..a3a754853d5ba9f6b514ebccd98f99635da217e4
--- /dev/null
+++ b/docs/tutorials/compress/segmentation.md
@@ -0,0 +1,53 @@
+# 分割模型裁剪
+
+---
+本文档训练代码可直接在PaddleX的Repo中下载,[代码tutorials/compress/segmentation](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/compress/segmentation)
+本文档按如下方式对模型进行了裁剪
+> 第一步:在训练数据集上训练UNet
+> 第二步:在验证数据集上计算模型中各个参数的敏感度信息
+> 第三步:根据第二步计算的敏感度,设定`eval_metric_loss`,对模型裁剪后重新在训练数据集上训练
+
+## 步骤一 训练UNet
+> 模型训练使用文档可以直接参考[分割模型训练](../train/segmentation.md),本文档在该代码基础上添加了部分参数选项,用户可直接下载模型训练代码[tutorials/compress/segmentation/unet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/segmentation/unet.py)
+> 使用如下命令开始模型训练
+```
+python unet.py
+```
+
+## 步骤二 计算参数敏感度
+> 参数敏感度的计算可以直接使用PaddleX提供的API`paddlex.slim.cal_params_sensitivities`,使用代码如下, 敏感度信息文件会保存至`save_file`
+
+```
+import os
+# 选择使用0号卡
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+
+model_dir = './output/unet/best_model'
+dataset = 'optic_disc_seg'
+save_file = './sensitivities.data'
+model = pdx.load_model(model_dir)
+
+# 定义验证所用的数据集(视盘分割数据集)
+eval_dataset = pdx.datasets.SegDataset(
+    data_dir=dataset,
+    file_list=os.path.join(dataset, 'val_list.txt'),
+    label_list=os.path.join(dataset, 'labels.txt'),
+    transforms=model.eval_transforms)
+
+# 计算各参数的敏感度,结果保存至save_file
+pdx.slim.cal_params_sensitivities(model,
+                                  save_file,
+                                  eval_dataset,
+                                  batch_size=8)
+```
+> 本步骤代码已整理至[tutorials/compress/segmentation/cal_sensitivities_file.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop_details/tutorials/compress/segmentation/cal_sensitivities_file.py),用户可直接下载使用
+> 使用如下命令开始计算敏感度
+```
+python cal_sensitivities_file.py --model_dir output/unet/best_model --dataset optic_disc_seg --save_file sensitivities.data
+```
+
+## 步骤三 开始裁剪训练
+> 本步骤代码与步骤一使用同一份代码文件,使用如下命令开始裁剪训练
+```
+python unet.py --model_dir output/unet/best_model --sensitivities_file sensitivities.data --eval_metric_loss 0.10
+```
+
+## 实验效果
+本教程的实验效果可以查阅[模型压缩文档](../../slim/prune.md)
diff --git a/docs/tutorials/deploy/index.rst b/docs/tutorials/deploy/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e243dc808f73936cd6f39fae5439136cd265bad6
--- /dev/null
+++ b/docs/tutorials/deploy/index.rst
@@ -0,0 +1,8 @@
+模型部署
+=========================
+
+.. toctree::
+ :maxdepth: 1
+
+ server.md
+ terminal.md
diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..47eaf6abc6df7080a0d2dcba9681bcb247a35d19
--- /dev/null
+++ b/docs/tutorials/index.rst
@@ -0,0 +1,8 @@
+使用教程
+=========================
+
+.. toctree::
+ :maxdepth: 1
+
+ train/index.rst
+ compress/index.rst
diff --git a/docs/tutorials/train/classification.md b/docs/tutorials/train/classification.md
new file mode 100644
index 0000000000000000000000000000000000000000..42a24a2aa54cfcdd9cb1613f4bd686cdc7388d31
--- /dev/null
+++ b/docs/tutorials/train/classification.md
@@ -0,0 +1,108 @@
+# 训练图像分类模型
+
+---
+本文档训练代码可参考PaddleX的[代码tutorial/train/classification/mobilenetv2.py](http://gitlab.baidu.com/Paddle/PaddleX/tree/develop/tutorials/train/classification/mobilenetv2.py)
+
+**1.下载并解压训练所需的数据集**
+
+> 使用1张显卡训练并指定使用0号卡。
+
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+```
+
+> 这里使用蔬菜数据集,训练集、验证集和测试集共包含6189个样本,18个类别。
+
+```python
+veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
+pdx.utils.download_and_decompress(veg_dataset, path='./')
+```
+
+**2.定义训练和验证过程中的数据处理和增强操作**
+> transforms用于指定训练和验证过程中的数据处理和增强操作流程,如下代码在训练过程中使用了`RandomCrop`和`RandomHorizontalFlip`进行数据增强,transforms的使用见[paddlex.cls.transforms](../../apis/transforms/cls_transforms.html#paddlex-cls-transforms)
+
+```python
+from paddlex.cls import transforms
+train_transforms = transforms.Compose([
+ transforms.RandomCrop(crop_size=224),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+])
+eval_transforms = transforms.Compose([
+ transforms.ResizeByShort(short_size=256),
+ transforms.CenterCrop(crop_size=224),
+ transforms.Normalize()
+])
+```
+
+**3.创建数据集读取器,并绑定相应的数据预处理流程**
+> 通过不同的数据集读取器可以加载不同格式的数据集,数据集API的介绍见文档[paddlex.datasets](../../apis/datasets.md)
+
+```python
+train_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/val_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=eval_transforms)
+```
+
+**4.创建模型进行训练**
+> 模型训练时默认自动下载并使用ImageNet图像数据集上的预训练模型,用户也可通过`pretrain_weights`参数自行指定预训练权重。训练过程中每间隔`save_interval_epochs`轮会在`save_dir`目录下保存一次模型,保存的同时也会在验证数据集上计算相关指标。
+
+> 分类模型的接口可见文档[paddlex.cls.models](../../apis/models.md)
+
+```python
+model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
+model.train(
+ num_epochs=10,
+ train_dataset=train_dataset,
+ train_batch_size=32,
+ eval_dataset=eval_dataset,
+ lr_decay_epochs=[4, 6, 8],
+ learning_rate=0.025,
+ save_dir='output/mobilenetv2',
+ use_vdl=True)
+```
+
+> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问;如为远程服务,请改成相应机器的IP。
+
+```shell
+visualdl --logdir output/mobilenetv2/vdl_log --port 8001
+```
+
+**5.验证或测试**
+> 利用训练完的模型可继续在验证集上进行验证。
+
+```python
+eval_metrics = model.evaluate(eval_dataset, batch_size=8)
+print("eval_metrics:", eval_metrics)
+```
+
+> 结果输出:
+```
+eval_metrics: OrderedDict([('acc1', 0.9895916733386709), ('acc5', 0.9983987189751802)])
+```
+
+> 训练完用模型对图片进行测试。
+
+```python
+predict_result = model.predict('./vegetables_cls/bocai/IMG_00000839.jpg', topk=5)
+print("predict_result:", predict_result)
+```
+
+> 结果输出:
+```
+predict_result: [{'category_id': 13, 'category': 'bocai', 'score': 0.8607276},
+ {'category_id': 11, 'category': 'kongxincai', 'score': 0.06386806},
+ {'category_id': 2, 'category': 'suanmiao', 'score': 0.03736042},
+ {'category_id': 12, 'category': 'heiqiezi', 'score': 0.007879922},
+ {'category_id': 17, 'category': 'huluobo', 'score': 0.006327283}]
+```
diff --git a/docs/tutorials/train/detection.md b/docs/tutorials/train/detection.md
new file mode 100644
index 0000000000000000000000000000000000000000..01bf5687eb746dcb55e9755c3af522c4520f0f44
--- /dev/null
+++ b/docs/tutorials/train/detection.md
@@ -0,0 +1,119 @@
+# 训练目标检测模型
+
+------
+
+更多检测模型在VOC数据集或COCO数据集上的训练代码可参考[代码tutorials/train/detection/faster_rcnn_r50_fpn.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/faster_rcnn_r50_fpn.py)、[代码tutorials/train/detection/yolov3_mobilenetv1.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_mobilenetv1.py)。
+
+**1.下载并解压训练所需的数据集**
+
+> 使用1张显卡训练并指定使用0号卡。
+
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+```
+
+> 这里使用昆虫数据集,训练集、验证集和测试集共包含1938个样本,6个类别。
+
+```python
+insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
+pdx.utils.download_and_decompress(insect_dataset, path='./')
+```
+
+**2.定义训练和验证过程中的数据处理和增强操作**
+
+> 在训练过程中使用`RandomHorizontalFlip`进行数据增强,由于接下来选择的模型是带FPN结构的Faster RCNN,所以使用`Padding`将输入图像的尺寸补齐到32的倍数,以保证FPN中两个需做相加操作的特征层的尺寸完全相同。transforms的使用见[paddlex.det.transforms](../../apis/transforms/det_transforms.md)
+
+```python
+from paddlex.det import transforms
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32),
+])
+```
+
+**3.创建数据集读取器,并绑定相应的数据预处理流程**
+
+> 数据集读取器的介绍见文档[paddlex.datasets](../../apis/datasets.md)
+
+```python
+train_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/train_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/val_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=eval_transforms)
+```
+
+**4.创建Faster RCNN模型,并进行训练**
+
+> 创建带FPN结构的Faster RCNN模型,`num_classes` 需要设置为包含背景类的类别数,即: 目标类别数量(6) + 1
+
+```python
+num_classes = len(train_dataset.labels) + 1
+model = pdx.det.FasterRCNN(num_classes=num_classes)
+```
+
+> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。训练过程每间隔`save_interval_epochs`轮会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。检测模型的接口可见文档[paddlex.cv.models](../../apis/models.md#fasterrcnn)
+
+```python
+model.train(
+ num_epochs=12,
+ train_dataset=train_dataset,
+ train_batch_size=2,
+ eval_dataset=eval_dataset,
+ learning_rate=0.0025,
+ lr_decay_epochs=[8, 11],
+ save_dir='output/faster_rcnn_r50_fpn',
+ use_vdl=True)
+```
+
+> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问;如为远程服务,请改成相应机器的IP。
+
+```shell
+visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
+```
+
+**5.验证或测试**
+
+> 训练完利用模型可继续在验证集上进行验证。
+
+```python
+eval_metrics = model.evaluate(eval_dataset, batch_size=2)
+print("eval_metrics:", eval_metrics)
+```
+
+> 结果输出:
+
+```python
+eval_metrics: {'bbox_map': 76.085371}
+
+```
+
+> 训练完用模型对图片进行测试。
+
+```python
+predict_result = model.predict('./insect_det/JPEGImages/1968.jpg')
+```
+
+> 可视化测试结果:
+
+```python
+pdx.det.visualize('./insect_det/JPEGImages/1968.jpg', predict_result, threshold=0.5, save_dir='./output/faster_rcnn_r50_fpn')
+```
+
+
diff --git a/docs/tutorials/train/index.rst b/docs/tutorials/train/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..3ba3b5498336d88a2bd573d1f5b16c33979b8e88
--- /dev/null
+++ b/docs/tutorials/train/index.rst
@@ -0,0 +1,11 @@
+模型训练
+=========================
+
+.. toctree::
+ :maxdepth: 1
+
+ classification.md
+ detection.md
+ instance_segmentation.md
+ segmentation.md
+ visualdl.md
diff --git a/docs/tutorials/train/instance_segmentation.md b/docs/tutorials/train/instance_segmentation.md
new file mode 100644
index 0000000000000000000000000000000000000000..80445c3bf39d8ce87b91540fc3bf0659d1c5b82f
--- /dev/null
+++ b/docs/tutorials/train/instance_segmentation.md
@@ -0,0 +1,116 @@
+# 训练实例分割模型
+
+------
+
+本文档训练代码可直接下载[代码tutorials/train/detection/mask_rcnn_r50_fpn.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/detection/mask_rcnn_r50_fpn.py)。
+
+**1.下载并解压训练所需的数据集**
+
+> 使用1张显卡训练并指定使用0号卡。
+
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+```
+
+> 这里使用垃圾分拣数据集,训练集、验证集和测试集共包含283个样本,6个类别。
+
+```python
+garbage_dataset = 'https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz'
+pdx.utils.download_and_decompress(garbage_dataset, path='./')
+```
+
+**2.定义训练和验证过程中的数据处理和增强操作**
+
+> 在训练过程中使用`RandomHorizontalFlip`进行数据增强,由于接下来选择的模型是带FPN结构的Mask RCNN,所以使用`Padding`将输入图像的尺寸补齐到32的倍数,以保证FPN中两个需做相加操作的特征层的尺寸完全相同。transforms的使用见[paddlex.cv.transforms](../../apis/transforms/det_transforms.md)
+
+```python
+from paddlex.det import transforms
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+```
+
+**3.创建数据集读取器,并绑定相应的数据预处理流程**
+
+> 数据集读取器的介绍见文档[paddlex.datasets](../../apis/datasets.md)
+
+```python
+train_dataset = pdx.datasets.CocoDetection(
+ data_dir='garbage_ins_det/JPEGImages',
+ ann_file='garbage_ins_det/train.json',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.CocoDetection(
+ data_dir='garbage_ins_det/JPEGImages',
+ ann_file='garbage_ins_det/val.json',
+ transforms=eval_transforms)
+```
+
+**4.创建Mask RCNN模型,并进行训练**
+
+> 创建带FPN结构的Mask RCNN模型,`num_classes` 需要设置为包含背景类的类别数,即: 目标类别数量(6) + 1。
+
+```python
+num_classes = len(train_dataset.labels) + 1
+model = pdx.det.MaskRCNN(num_classes=num_classes)
+```
+
+> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。训练过程每间隔`save_interval_epochs`轮会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。检测模型的接口可见文档[paddlex.det.models](../../apis/models.md)。
+
+```python
+model.train(
+ num_epochs=12,
+ train_dataset=train_dataset,
+ train_batch_size=1,
+ eval_dataset=eval_dataset,
+ learning_rate=0.00125,
+ lr_decay_epochs=[8, 11],
+ save_dir='output/mask_rcnn_r50_fpn',
+ use_vdl=True)
+```
+
+> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问;如为远程服务,请改成相应机器的IP。
+
+```shell
+visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
+```
+
+**5.验证或测试**
+
+> 训练完利用模型可继续在验证集上进行验证。
+
+```python
+eval_metrics = model.evaluate(eval_dataset, batch_size=1)
+print("eval_metrics:", eval_metrics)
+```
+
+> 结果输出:
+
+```python
+eval_metrics: {'bbox_mmap': 0.858306, 'segm_mmap': 0.864278}
+
+```
+
+> 训练完用模型对图片进行测试。
+
+```python
+predict_result = model.predict('./garbage_ins_det/JPEGImages/000114.bmp')
+```
+
+> 可视化测试结果:
+
+```python
+pdx.det.visualize('./garbage_ins_det/JPEGImages/000114.bmp', predict_result, threshold=0.7, save_dir='./output/mask_rcnn_r50_fpn')
+```
+
diff --git a/docs/tutorials/train/segmentation.md b/docs/tutorials/train/segmentation.md
new file mode 100644
index 0000000000000000000000000000000000000000..39b295dcb376310a6145eb4b97150031291bb60d
--- /dev/null
+++ b/docs/tutorials/train/segmentation.md
@@ -0,0 +1,117 @@
+# 训练语义分割模型
+
+---
+更多语义分割模型在Cityscapes数据集上的训练代码可参考[代码tutorials/train/segmentation/unet.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/unet.py)、[代码tutorials/train/segmentation/deeplabv3p.py](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/deeplabv3p.py)。
+
+**1.下载并解压训练所需的数据集**
+
+> 使用1张显卡训练并指定使用0号卡。
+
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+```
+
+> 这里使用视盘分割数据集,训练集、验证集和测试集共包含343个样本,2个类别。
+
+```python
+optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
+pdx.utils.download_and_decompress(optic_dataset, path='./')
+```
+
+**2.定义训练和验证过程中的数据处理和增强操作**
+
+> 在训练过程中使用`RandomHorizontalFlip`和`RandomPaddingCrop`进行数据增强,transforms的使用见[paddlex.seg.transforms](../../apis/transforms/seg_transforms.md)
+
+```python
+from paddlex.seg import transforms
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Resize(target_size=512),
+ transforms.RandomPaddingCrop(crop_size=500),
+ transforms.Normalize()
+])
+eval_transforms = transforms.Compose([
+ transforms.Resize(512),
+ transforms.Normalize()
+])
+```
+
+**3.创建数据集读取器,并绑定相应的数据预处理流程**
+
+> 数据集读取器的介绍见文档[paddlex.cv.datasets](../../apis/datasets.md)
+
+```python
+train_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/train_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/val_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=eval_transforms)
+```
+
+**4.创建DeepLabv3+模型,并进行训练**
+
+> 创建DeepLabv3+模型,`num_classes` 需要设置为数据集的类别数,可直接通过`len(train_dataset.labels)`获取,详细代码可参见[demo](http://gitlab.baidu.com/Paddle/PaddleX/blob/develop/tutorials/train/segmentation/deeplabv3p.py#L44)。
+
+```python
+num_classes = len(train_dataset.labels)
+model = pdx.seg.DeepLabv3p(num_classes=num_classes)
+```
+
+> 模型训练默认下载并使用在ImageNet数据集上训练得到的Backbone,用户也可自行指定`pretrain_weights`参数来设置预训练权重。
+训练过程每间隔`save_interval_epochs`轮会在`save_dir`保存一次模型,与此同时也会在验证数据集上计算指标。
+分割模型的接口可见文档[paddlex.seg.models](../../apis/models.md)。
+
+```python
+model.train(
+ num_epochs=40,
+ train_dataset=train_dataset,
+ train_batch_size=4,
+ eval_dataset=eval_dataset,
+ learning_rate=0.01,
+ save_dir='output/deeplab',
+ use_vdl=True)
+```
+
+> 将`use_vdl`设置为`True`时可使用VisualDL查看训练指标。按以下方式启动VisualDL后,浏览器打开 http://0.0.0.0:8001 即可。其中0.0.0.0为本机访问;如为远程服务,请改成相应机器的IP。
+
+```shell
+visualdl --logdir output/deeplab/vdl_log --port 8001
+```
+
+**5.验证或测试**
+
+> 训练完利用模型可继续在验证集上进行验证。
+
+```python
+eval_metrics = model.evaluate(eval_dataset, batch_size=2)
+print("eval_metrics:", eval_metrics)
+```
+
+> 结果输出:
+
+```python
+eval_metrics: {'miou': 0.8915175875548873, 'category_iou': [0.9956445981924432, 0.7873905769173314], 'macc': 0.9957137358816046, 'category_acc': [0.9975360650317765, 0.8948120441157331], 'kappa': 0.8788684558629085}
+```
+
+> 训练完用模型对图片进行测试。
+
+```python
+image_name = 'optic_disc_seg/JPEGImages/H0005.jpg'
+predict_result = model.predict(image_name)
+```
+
+> 可视化测试结果:
+
+```python
+import paddlex as pdx
+pdx.seg.visualize(image_name, predict_result, weight=0.4)
+```
+
+
diff --git a/docs/tutorials/train/visualdl.md b/docs/tutorials/train/visualdl.md
new file mode 100644
index 0000000000000000000000000000000000000000..dc442b5847e048b7fe080c085e0192caada19c2b
--- /dev/null
+++ b/docs/tutorials/train/visualdl.md
@@ -0,0 +1,26 @@
+# VisualDL可视化训练指标
+在使用PaddleX训练模型过程中,各个训练指标和评估指标会直接输出到标准输出流,同时也可通过VisualDL对训练过程中的指标进行可视化,只需在调用`train`函数时,将`use_vdl`参数设为`True`即可,如下代码所示,
+```
+import paddlex
+# train_dataset与eval_dataset的定义可参考模型训练教程
+model = paddlex.cls.ResNet50(num_classes=1000)
+model.train(num_epochs=120, train_dataset=train_dataset,
+ train_batch_size=32, eval_dataset=eval_dataset,
+ log_interval_steps=10, save_interval_epochs=10,
+ save_dir='./output', use_vdl=True)
+```
+
+模型在训练过程中,会在`save_dir`下生成`vdl_log`目录,通过在命令行终端执行以下命令,启动VisualDL。
+```
+visualdl --logdir=output/vdl_log --port=8008
+```
+在浏览器打开`http://0.0.0.0:8008`便可直接查看随训练迭代动态变化的各个指标(0.0.0.0表示启动VisualDL所在服务器的IP,本机使用0.0.0.0即可)。
+
+在训练分类模型过程中,使用VisualDL进行可视化的示例图如下所示。
+
+> 训练过程中每个Step的`Loss`和相应`Top1准确率`变化趋势:
+
+
+> 训练过程中每个Step的`学习率lr`和相应`Top5准确率`变化趋势:
+
+
+> 训练过程中,每次保存模型时,模型在验证数据集上的`Top1准确率`和`Top5准确率`:
+
diff --git "a/images/00\346\225\260\346\215\256\351\233\206\345\257\274\345\205\245\350\257\264\346\230\216.png" "b/images/00\346\225\260\346\215\256\351\233\206\345\257\274\345\205\245\350\257\264\346\230\216.png"
deleted file mode 100644
index d837e655b323943d1c094651d959c727e3b97145..0000000000000000000000000000000000000000
Binary files "a/images/00\346\225\260\346\215\256\351\233\206\345\257\274\345\205\245\350\257\264\346\230\216.png" and /dev/null differ
diff --git "a/images/01\346\225\260\346\215\256\345\210\207\345\210\206\345\217\212\351\242\204\350\247\210.png" "b/images/01\346\225\260\346\215\256\345\210\207\345\210\206\345\217\212\351\242\204\350\247\210.png"
deleted file mode 100644
index 6e415e7d81c2021b8e7f842d5a5a9b6f79c83b08..0000000000000000000000000000000000000000
Binary files "a/images/01\346\225\260\346\215\256\345\210\207\345\210\206\345\217\212\351\242\204\350\247\210.png" and /dev/null differ
diff --git "a/images/02\345\210\233\345\273\272\351\241\271\347\233\256.png" "b/images/02\345\210\233\345\273\272\351\241\271\347\233\256.png"
deleted file mode 100644
index adf65d113eb7f6d644a5aedbd051856a0f9f3f28..0000000000000000000000000000000000000000
Binary files "a/images/02\345\210\233\345\273\272\351\241\271\347\233\256.png" and /dev/null differ
diff --git "a/images/03\351\200\211\346\213\251\346\225\260\346\215\256\351\233\206.png" "b/images/03\351\200\211\346\213\251\346\225\260\346\215\256\351\233\206.png"
deleted file mode 100644
index d9b0c83ec75978328e1f995b1d6f56a5ee4b5052..0000000000000000000000000000000000000000
Binary files "a/images/03\351\200\211\346\213\251\346\225\260\346\215\256\351\233\206.png" and /dev/null differ
diff --git "a/images/04\345\217\202\346\225\260\351\205\215\347\275\256-2.png" "b/images/04\345\217\202\346\225\260\351\205\215\347\275\256-2.png"
deleted file mode 100644
index 398c74c1fc3a00eb8ac1ceb7d811887584fcbbbe..0000000000000000000000000000000000000000
Binary files "a/images/04\345\217\202\346\225\260\351\205\215\347\275\256-2.png" and /dev/null differ
diff --git "a/images/05\350\256\255\347\273\203\345\217\257\350\247\206\345\214\226.png" "b/images/05\350\256\255\347\273\203\345\217\257\350\247\206\345\214\226.png"
deleted file mode 100644
index a299238432098648259d622fb4d6017790478ea8..0000000000000000000000000000000000000000
Binary files "a/images/05\350\256\255\347\273\203\345\217\257\350\247\206\345\214\226.png" and /dev/null differ
diff --git a/images/06VisualDL.png b/images/06VisualDL.png
deleted file mode 100644
index 3e9642f07809b85fe1652f81916ce5f3928e1c92..0000000000000000000000000000000000000000
Binary files a/images/06VisualDL.png and /dev/null differ
diff --git "a/images/07\346\250\241\345\236\213\350\257\204\344\274\260.jpg" "b/images/07\346\250\241\345\236\213\350\257\204\344\274\260.jpg"
deleted file mode 100644
index b663009afc974f87101825496ec4b0beac067953..0000000000000000000000000000000000000000
Binary files "a/images/07\346\250\241\345\236\213\350\257\204\344\274\260.jpg" and /dev/null differ
diff --git "a/images/08\346\250\241\345\236\213\345\217\221\345\270\203.png" "b/images/08\346\250\241\345\236\213\345\217\221\345\270\203.png"
deleted file mode 100644
index f0f6cbfedef8a58d6f7cc0e4105ca32bb8002031..0000000000000000000000000000000000000000
Binary files "a/images/08\346\250\241\345\236\213\345\217\221\345\270\203.png" and /dev/null differ
diff --git "a/images/09qq\347\276\244\344\272\214\347\273\264\347\240\201.png" "b/images/09qq\347\276\244\344\272\214\347\273\264\347\240\201.png"
deleted file mode 100644
index 27a5841ae50e5db58872ac1fec859eb3d85851fc..0000000000000000000000000000000000000000
Binary files "a/images/09qq\347\276\244\344\272\214\347\273\264\347\240\201.png" and /dev/null differ
diff --git a/paddlex.png b/paddlex.png
new file mode 100644
index 0000000000000000000000000000000000000000..12738b6445ab2da2f5343856ce5f52ff3cb87b0d
Binary files /dev/null and b/paddlex.png differ
diff --git a/paddlex/__init__.py b/paddlex/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..49a7a1952e179d219cc0d2e91ddb508dd3339e3c
--- /dev/null
+++ b/paddlex/__init__.py
@@ -0,0 +1,28 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from .utils.utils import get_environ_info
+from . import cv
+from . import det
+from . import seg
+from . import cls
+from . import slim
+
+env_info = get_environ_info()
+load_model = cv.models.load_model
+datasets = cv.datasets
+
+log_level = 2
+__version__ = '0.1.0.github'
diff --git a/paddlex/cls.py b/paddlex/cls.py
new file mode 100644
index 0000000000000000000000000000000000000000..25573c3709559bb8e1dbaace7f3c7876e36762f3
--- /dev/null
+++ b/paddlex/cls.py
@@ -0,0 +1,36 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from . import cv
+
+ResNet18 = cv.models.ResNet18
+ResNet34 = cv.models.ResNet34
+ResNet50 = cv.models.ResNet50
+ResNet101 = cv.models.ResNet101
+ResNet50_vd = cv.models.ResNet50_vd
+ResNet101_vd = cv.models.ResNet101_vd
+DarkNet53 = cv.models.DarkNet53
+MobileNetV1 = cv.models.MobileNetV1
+MobileNetV2 = cv.models.MobileNetV2
+MobileNetV3_small = cv.models.MobileNetV3_small
+MobileNetV3_large = cv.models.MobileNetV3_large
+Xception41 = cv.models.Xception41
+Xception65 = cv.models.Xception65
+DenseNet121 = cv.models.DenseNet121
+DenseNet161 = cv.models.DenseNet161
+DenseNet201 = cv.models.DenseNet201
+ShuffleNetV2 = cv.models.ShuffleNetV2
+
+transforms = cv.transforms.cls_transforms
diff --git a/paddlex/command.py b/paddlex/command.py
new file mode 100644
index 0000000000000000000000000000000000000000..dcfc510ebc8fa785220960a79bbaf61491a0944e
--- /dev/null
+++ b/paddlex/command.py
@@ -0,0 +1,61 @@
+from six import text_type as _text_type
+import argparse
+import sys
+
+
+def arg_parser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--model_dir",
+ "-m",
+ type=_text_type,
+ default=None,
+ help="define model directory path")
+ parser.add_argument(
+ "--save_dir",
+ "-s",
+ type=_text_type,
+ default=None,
+ help="path to save inference model")
+ parser.add_argument(
+ "--version",
+ "-v",
+ action="store_true",
+ default=False,
+ help="get version of PaddleX")
+ parser.add_argument(
+ "--export_inference",
+ "-e",
+ action="store_true",
+ default=False,
+ help="export inference model for C++/Python deployment")
+
+ return parser
+
+
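+# 命令行用法示例(路径仅为示意):
+#   paddlex --version
+#   paddlex --export_inference --model_dir ./output/best_model --save_dir ./inference_model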
+def main():
+ import os
+ os.environ['CUDA_VISIBLE_DEVICES'] = ""
+
+ import paddlex as pdx
+
+ if len(sys.argv) < 2:
+ print("Use command 'paddlex -h` to print the help information\n")
+ return
+ parser = arg_parser()
+ args = parser.parse_args()
+
+ if args.version:
+ print("PaddleX-{}".format(pdx.__version__))
+ print("Repo: https://github.com/PaddlePaddle/PaddleX.git")
+ print("Email: paddlex@baidu.com")
+ return
+ if args.export_inference:
+ assert args.model_dir is not None, "--model_dir should be defined while exporting inference model"
+ assert args.save_dir is not None, "--save_dir should be defined to save inference model"
+ model = pdx.load_model(args.model_dir)
+ model.export_inference_model(args.save_dir)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/paddlex/cv/__init__.py b/paddlex/cv/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..de2ed215de0a00a69da827683ad6563afd862ed9
--- /dev/null
+++ b/paddlex/cv/__init__.py
@@ -0,0 +1,33 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import models
+from . import nets
+from . import transforms
+from . import datasets
+
+cls_transforms = transforms.cls_transforms
+det_transforms = transforms.det_transforms
+seg_transforms = transforms.seg_transforms
+
+# classification
+ResNet50 = models.ResNet50
+DarkNet53 = models.DarkNet53
+# detection
+YOLOv3 = models.YOLOv3
+#EAST = models.EAST
+FasterRCNN = models.FasterRCNN
+MaskRCNN = models.MaskRCNN
+UNet = models.UNet
+DeepLabv3p = models.DeepLabv3p
diff --git a/paddlex/cv/datasets/__init__.py b/paddlex/cv/datasets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..c686e4539e1de67187822c130b37dae998ac049b
--- /dev/null
+++ b/paddlex/cv/datasets/__init__.py
@@ -0,0 +1,18 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .imagenet import ImageNet
+from .voc import VOCDetection
+from .coco import CocoDetection
+from .seg_dataset import SegDataset
diff --git a/paddlex/cv/datasets/coco.py b/paddlex/cv/datasets/coco.py
new file mode 100644
index 0000000000000000000000000000000000000000..52a3336ce60334bd95fa0d84d1adfb164dfb686b
--- /dev/null
+++ b/paddlex/cv/datasets/coco.py
@@ -0,0 +1,131 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+import copy
+import os.path as osp
+import random
+import numpy as np
+import paddlex.utils.logging as logging
+import paddlex as pst
+from pycocotools.coco import COCO
+from .voc import VOCDetection
+from .dataset import is_pic
+
+
+class CocoDetection(VOCDetection):
+ """读取MSCOCO格式的检测数据集,并对样本进行相应的处理,该格式的数据集同样可以应用到实例分割模型的训练中。
+
+ Args:
+ data_dir (str): 数据集所在的目录路径。
+ ann_file (str): 数据集的标注文件,为一个独立的json格式文件。
+ transforms (paddlex.det.transforms): 数据集中每个样本的预处理/增强算子。
+ num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
+ 系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核数的一半。
+ buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
+ parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
+ 线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
+ shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
+ """
+
+ def __init__(self,
+ data_dir,
+ ann_file,
+ transforms=None,
+ num_workers='auto',
+ buffer_size=100,
+ parallel_method='process',
+ shuffle=False):
+ super(VOCDetection, self).__init__(
+ transforms=transforms,
+ num_workers=num_workers,
+ buffer_size=buffer_size,
+ parallel_method=parallel_method,
+ shuffle=shuffle)
+ self.file_list = list()
+ self.labels = list()
+ self._epoch = 0
+
+ coco = COCO(ann_file)
+ self.coco_gt = coco
+ img_ids = coco.getImgIds()
+ cat_ids = coco.getCatIds()
+ catid2clsid = dict({catid: i + 1 for i, catid in enumerate(cat_ids)})
+ cname2cid = dict({
+ coco.loadCats(catid)[0]['name']: clsid
+ for catid, clsid in catid2clsid.items()
+ })
+ for label, cid in sorted(cname2cid.items(), key=lambda d: d[1]):
+ self.labels.append(label)
+ logging.info("Starting to read file list from dataset...")
+ for img_id in img_ids:
+ img_anno = coco.loadImgs(img_id)[0]
+ im_fname = osp.join(data_dir, img_anno['file_name'])
+ if not is_pic(im_fname):
+ continue
+ im_w = float(img_anno['width'])
+ im_h = float(img_anno['height'])
+ ins_anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
+ instances = coco.loadAnns(ins_anno_ids)
+
+ bboxes = []
+ for inst in instances:
+ x, y, box_w, box_h = inst['bbox']
+ x1 = max(0, x)
+ y1 = max(0, y)
+ x2 = min(im_w - 1, x1 + max(0, box_w - 1))
+ y2 = min(im_h - 1, y1 + max(0, box_h - 1))
+ if inst['area'] > 0 and x2 >= x1 and y2 >= y1:
+ inst['clean_bbox'] = [x1, y1, x2, y2]
+ bboxes.append(inst)
+ else:
+ logging.warning(
+ "Found an invalid bbox in annotations: im_id: {}, area: {} x1: {}, y1: {}, x2: {}, y2: {}."
+ .format(img_id, float(inst['area']), x1, y1, x2, y2))
+ num_bbox = len(bboxes)
+ gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
+ gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
+ gt_score = np.ones((num_bbox, 1), dtype=np.float32)
+ is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
+ difficult = np.zeros((num_bbox, 1), dtype=np.int32)
+ gt_poly = [None] * num_bbox
+
+ for i, box in enumerate(bboxes):
+ catid = box['category_id']
+ gt_class[i][0] = catid2clsid[catid]
+ gt_bbox[i, :] = box['clean_bbox']
+ is_crowd[i][0] = box['iscrowd']
+ if 'segmentation' in box:
+ gt_poly[i] = box['segmentation']
+
+ im_info = {
+ 'im_id': np.array([img_id]).astype('int32'),
+ 'origin_shape': np.array([im_h, im_w]).astype('int32'),
+ }
+ label_info = {
+ 'is_crowd': is_crowd,
+ 'gt_class': gt_class,
+ 'gt_bbox': gt_bbox,
+ 'gt_score': gt_score,
+ 'gt_poly': gt_poly,
+ 'difficult': difficult
+ }
+ coco_rec = (im_info, label_info)
+ self.file_list.append([im_fname, coco_rec])
+
+ if not len(self.file_list) > 0:
+ raise Exception('not found any coco record in %s' % (ann_file))
+ logging.info("{} samples in file {}".format(
+ len(self.file_list), ann_file))
+ self.num_samples = len(self.file_list)
diff --git a/paddlex/cv/datasets/dataset.py b/paddlex/cv/datasets/dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..09c042778147cdaef6539faaa1ef9865b149e815
--- /dev/null
+++ b/paddlex/cv/datasets/dataset.py
@@ -0,0 +1,256 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from threading import Thread
+import multiprocessing
+import collections
+import numpy as np
+import six
+import sys
+import copy
+import random
+import platform
+import chardet
+import paddlex.utils.logging as logging
+
+
+class EndSignal():
+ pass
+
+
+def is_pic(img_name):
+ valid_suffix = ['JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png']
+ suffix = img_name.split('.')[-1]
+ if suffix not in valid_suffix:
+ return False
+ return True
+
+
+def is_valid(sample):
+ if sample is None:
+ return False
+ if isinstance(sample, tuple):
+ for s in sample:
+ if s is None:
+ return False
+ elif isinstance(s, np.ndarray) and s.size == 0:
+ return False
+ elif isinstance(s, collections.Sequence) and len(s) == 0:
+ return False
+ return True
+
+
+def get_encoding(path):
+    # 以二进制方式读取文件内容,使用chardet自动检测文件编码
+    with open(path, 'rb') as f:
+        data = f.read()
+    file_encoding = chardet.detect(data).get('encoding')
+    return file_encoding
+
+
+def multithread_reader(mapper,
+ reader,
+ num_workers=4,
+ buffer_size=1024,
+ batch_size=8,
+ drop_last=True):
+ from queue import Queue
+ end = EndSignal()
+
+ # define a worker to read samples from reader to in_queue
+ def read_worker(reader, in_queue):
+ for i in reader():
+ in_queue.put(i)
+ in_queue.put(end)
+
+ # define a worker to handle samples from in_queue by mapper
+ # and put mapped samples into out_queue
+ def handle_worker(in_queue, out_queue, mapper):
+ sample = in_queue.get()
+ while not isinstance(sample, EndSignal):
+ if len(sample) == 2:
+ r = mapper(sample[0], sample[1])
+ elif len(sample) == 3:
+ r = mapper(sample[0], sample[1], sample[2])
+ else:
+ raise Exception('The sample\'s length must be 2 or 3.')
+ if is_valid(r):
+ out_queue.put(r)
+ sample = in_queue.get()
+ in_queue.put(end)
+ out_queue.put(end)
+
+ def xreader():
+ in_queue = Queue(buffer_size)
+ out_queue = Queue(buffer_size)
+ # start a read worker in a thread
+ target = read_worker
+ t = Thread(target=target, args=(reader, in_queue))
+ t.daemon = True
+ t.start()
+ # start several handle_workers
+ target = handle_worker
+ args = (in_queue, out_queue, mapper)
+ workers = []
+ for i in range(num_workers):
+ worker = Thread(target=target, args=args)
+ worker.daemon = True
+ workers.append(worker)
+ for w in workers:
+ w.start()
+
+ batch_data = []
+ sample = out_queue.get()
+ while not isinstance(sample, EndSignal):
+ batch_data.append(sample)
+ if len(batch_data) == batch_size:
+ batch_data = GenerateMiniBatch(batch_data)
+ yield batch_data
+ batch_data = []
+ sample = out_queue.get()
+ finish = 1
+ while finish < num_workers:
+ sample = out_queue.get()
+ if isinstance(sample, EndSignal):
+ finish += 1
+ else:
+ batch_data.append(sample)
+ if len(batch_data) == batch_size:
+ batch_data = GenerateMiniBatch(batch_data)
+ yield batch_data
+ batch_data = []
+ if not drop_last and len(batch_data) != 0:
+ batch_data = GenerateMiniBatch(batch_data)
+ yield batch_data
+ batch_data = []
+
+ return xreader
+
+
+def multiprocess_reader(mapper,
+ reader,
+ num_workers=4,
+ buffer_size=1024,
+ batch_size=8,
+ drop_last=True):
+ from .shared_queue import SharedQueue as Queue
+
+ def _read_into_queue(samples, mapper, queue):
+ end = EndSignal()
+ try:
+ for sample in samples:
+ if sample is None:
+ raise ValueError("sample has None")
+ if len(sample) == 2:
+ result = mapper(sample[0], sample[1])
+ elif len(sample) == 3:
+ result = mapper(sample[0], sample[1], sample[2])
+ else:
+ raise Exception('The sample\'s length must be 2 or 3.')
+ if is_valid(result):
+ queue.put(result)
+ queue.put(end)
+ except:
+ queue.put("")
+ six.reraise(*sys.exc_info())
+
+ def queue_reader():
+ queue = Queue(buffer_size, memsize=3 * 1024**3)
+ total_samples = [[] for i in range(num_workers)]
+ for i, sample in enumerate(reader()):
+ index = i % num_workers
+ total_samples[index].append(sample)
+ for i in range(num_workers):
+ p = multiprocessing.Process(
+ target=_read_into_queue,
+ args=(total_samples[i], mapper, queue))
+ p.start()
+
+ finish_num = 0
+ batch_data = list()
+ while finish_num < num_workers:
+ sample = queue.get()
+ if isinstance(sample, EndSignal):
+ finish_num += 1
+ elif sample == "":
+ raise ValueError("multiprocess reader raises an exception")
+ else:
+ batch_data.append(sample)
+ if len(batch_data) == batch_size:
+ batch_data = GenerateMiniBatch(batch_data)
+ yield batch_data
+ batch_data = []
+ if len(batch_data) != 0 and not drop_last:
+ batch_data = GenerateMiniBatch(batch_data)
+ yield batch_data
+ batch_data = []
+
+ return queue_reader
+
+
+def GenerateMiniBatch(batch_data):
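+ # 当batch内样本的图像尺寸不一致时,按该batch中的最大尺寸对图像做零填充,
+ # 使所有样本的shape一致后再返回组好的mini-batch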
+ if len(batch_data) == 1:
+ return batch_data
+ width = [data[0].shape[2] for data in batch_data]
+ height = [data[0].shape[1] for data in batch_data]
+ if len(set(width)) == 1 and len(set(height)) == 1:
+ return batch_data
+ max_shape = np.array([data[0].shape for data in batch_data]).max(axis=0)
+ padding_batch = []
+ for data in batch_data:
+ im_c, im_h, im_w = data[0].shape[:]
+ padding_im = np.zeros((im_c, max_shape[1], max_shape[2]),
+ dtype=np.float32)
+ padding_im[:, :im_h, :im_w] = data[0]
+ padding_batch.append((padding_im, ) + data[1:])
+ return padding_batch
+
+
+class Dataset:
+ def __init__(self,
+ transforms=None,
+ num_workers='auto',
+ buffer_size=100,
+ parallel_method='process',
+ shuffle=False):
+ if num_workers == 'auto':
+ import multiprocessing as mp
+ num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
+ if platform.platform().startswith(
+ "Darwin") or platform.platform().startswith("Windows"):
+ parallel_method = 'thread'
+ if transforms is None:
+ raise Exception("transform should be defined.")
+ self.transforms = transforms
+ self.num_workers = num_workers
+ self.buffer_size = buffer_size
+ self.parallel_method = parallel_method
+ self.shuffle = shuffle
+
+ def generator(self, batch_size=1, drop_last=True):
+ self.batch_size = batch_size
+ parallel_reader = multithread_reader
+ if self.parallel_method == "process":
+ if platform.platform().startswith("Windows"):
+ logging.debug(
+ "multiprocess_reader is not supported in Windows platform, force to use multithread_reader."
+ )
+ else:
+ parallel_reader = multiprocess_reader
+ return parallel_reader(
+ self.transforms,
+ self.iterator,
+ num_workers=self.num_workers,
+ buffer_size=self.buffer_size,
+ batch_size=batch_size,
+ drop_last=drop_last)
diff --git a/paddlex/cv/datasets/imagenet.py b/paddlex/cv/datasets/imagenet.py
new file mode 100644
index 0000000000000000000000000000000000000000..06ca1cedae8bc5e0bf67c6068cd8b75376f87bfd
--- /dev/null
+++ b/paddlex/cv/datasets/imagenet.py
@@ -0,0 +1,91 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+import os.path as osp
+import random
+import copy
+import paddlex.utils.logging as logging
+from .dataset import Dataset
+from .dataset import is_pic
+from .dataset import get_encoding
+
+
+class ImageNet(Dataset):
+ """读取ImageNet格式的分类数据集,并对样本进行相应的处理。
+
+ Args:
+ data_dir (str): 数据集所在的目录路径。
+ file_list (str): 描述数据集图片文件和类别id的文件路径(文本内每行路径为相对data_dir的相对路)。
+ label_list (str): 描述数据集包含的类别信息文件路径。
+ transforms (paddlex.cls.transforms): 数据集中每个样本的预处理/增强算子。
+ num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
+ 系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核
+ 数的一半。
+ buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
+ parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
+ 线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
+ shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
+ """
+
+ def __init__(self,
+ data_dir,
+ file_list,
+ label_list,
+ transforms=None,
+ num_workers='auto',
+ buffer_size=100,
+ parallel_method='process',
+ shuffle=False):
+ super(ImageNet, self).__init__(
+ transforms=transforms,
+ num_workers=num_workers,
+ buffer_size=buffer_size,
+ parallel_method=parallel_method,
+ shuffle=shuffle)
+ self.file_list = list()
+ self.labels = list()
+ self._epoch = 0
+
+ with open(label_list, encoding=get_encoding(label_list)) as f:
+ for line in f:
+ item = line.strip()
+ self.labels.append(item)
+ logging.info("Starting to read file list from dataset...")
+ with open(file_list, encoding=get_encoding(file_list)) as f:
+ for line in f:
+ items = line.strip().split()
+ if not is_pic(items[0]):
+ continue
+ full_path = osp.join(data_dir, items[0])
+ if not osp.exists(full_path):
+ raise IOError(
+ 'The image file {} does not exist!'.format(full_path))
+ self.file_list.append([full_path, int(items[1])])
+ self.num_samples = len(self.file_list)
+ logging.info("{} samples in file {}".format(
+ len(self.file_list), file_list))
+
+ def iterator(self):
+ self._epoch += 1
+ self._pos = 0
+ files = copy.deepcopy(self.file_list)
+ if self.shuffle:
+ random.shuffle(files)
+ files = files[:self.num_samples]
+ self.num_samples = len(files)
+ for f in files:
+ records = f[1]
+ sample = [f[0], records]
+ yield sample
diff --git a/paddlex/cv/datasets/seg_dataset.py b/paddlex/cv/datasets/seg_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c56c5d001135abe90415444baf17877012f9541
--- /dev/null
+++ b/paddlex/cv/datasets/seg_dataset.py
@@ -0,0 +1,93 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+import os.path as osp
+import random
+import copy
+import paddlex.utils.logging as logging
+from .dataset import Dataset
+from .dataset import get_encoding
+from .dataset import is_pic
+
+
+class SegDataset(Dataset):
+ """读取语义分割任务数据集,并对样本进行相应的处理。
+
+ Args:
+ data_dir (str): 数据集所在的目录路径。
+ file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
+ label_list (str): 描述数据集包含的类别信息文件路径。
+ transforms (list): 数据集中每个样本的预处理/增强算子。
+ num_workers (int): 数据集中样本在预处理过程中的线程或进程数。默认为4。
+ buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
+ parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
+ 线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
+ shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
+ """
+
+ def __init__(self,
+ data_dir,
+ file_list,
+ label_list,
+ transforms=None,
+ num_workers='auto',
+ buffer_size=100,
+ parallel_method='process',
+ shuffle=False):
+ super(SegDataset, self).__init__(
+ transforms=transforms,
+ num_workers=num_workers,
+ buffer_size=buffer_size,
+ parallel_method=parallel_method,
+ shuffle=shuffle)
+ self.file_list = list()
+ self.labels = list()
+ self._epoch = 0
+
+ with open(label_list, encoding=get_encoding(label_list)) as f:
+ for line in f:
+ item = line.strip()
+ self.labels.append(item)
+
+ with open(file_list, encoding=get_encoding(file_list)) as f:
+ for line in f:
+ items = line.strip().split()
+ if not is_pic(items[0]):
+ continue
+ full_path_im = osp.join(data_dir, items[0])
+ full_path_label = osp.join(data_dir, items[1])
+ if not osp.exists(full_path_im):
+ raise IOError(
+ 'The image file {} does not exist!'.format(full_path_im))
+ if not osp.exists(full_path_label):
+ raise IOError('The label file {} does not exist!'.format(
+ full_path_label))
+ self.file_list.append([full_path_im, full_path_label])
+ self.num_samples = len(self.file_list)
+ logging.info("{} samples in file {}".format(
+ len(self.file_list), file_list))
+
+ def iterator(self):
+ self._epoch += 1
+ self._pos = 0
+ files = copy.deepcopy(self.file_list)
+ if self.shuffle:
+ random.shuffle(files)
+ files = files[:self.num_samples]
+ self.num_samples = len(files)
+ for f in files:
+ label_path = f[1]
+ sample = [f[0], None, label_path]
+ yield sample
diff --git a/paddlex/cv/datasets/shared_queue/__init__.py b/paddlex/cv/datasets/shared_queue/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..f4c3990e67d6ade96d20abd1aa34b34b1ff891cb
--- /dev/null
+++ b/paddlex/cv/datasets/shared_queue/__init__.py
@@ -0,0 +1,25 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
+
+from .sharedmemory import SharedBuffer
+from .sharedmemory import SharedMemoryMgr
+from .sharedmemory import SharedMemoryError
+from .queue import SharedQueue
diff --git a/paddlex/cv/datasets/shared_queue/queue.py b/paddlex/cv/datasets/shared_queue/queue.py
new file mode 100644
index 0000000000000000000000000000000000000000..157df0a51ee3d552c810bafe5e826c1072c75649
--- /dev/null
+++ b/paddlex/cv/datasets/shared_queue/queue.py
@@ -0,0 +1,102 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import sys
+import six
+if six.PY3:
+ import pickle
+ from io import BytesIO as StringIO
+else:
+ import cPickle as pickle
+ from cStringIO import StringIO
+
+import logging
+import traceback
+import multiprocessing as mp
+from multiprocessing.queues import Queue
+from .sharedmemory import SharedMemoryMgr
+
+logger = logging.getLogger(__name__)
+
+
+class SharedQueueError(ValueError):
+ """ SharedQueueError
+ """
+ pass
+
+
+class SharedQueue(Queue):
+ """ a Queue based on shared memory to communicate data between Process,
+ and it's interface is compatible with 'multiprocessing.queues.Queue'
+ """
+
+ def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
+ """ init
+ """
+ if six.PY3:
+ super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
+ else:
+ super(SharedQueue, self).__init__(maxsize)
+
+ if mem_mgr is not None:
+ self._shared_mem = mem_mgr
+ else:
+ self._shared_mem = SharedMemoryMgr(
+ capacity=memsize, pagesize=pagesize)
+
+ def put(self, obj, **kwargs):
+ """ put an object to this queue
+ """
+ obj = pickle.dumps(obj, -1)
+ buff = None
+ try:
+ buff = self._shared_mem.malloc(len(obj))
+ buff.put(obj)
+ super(SharedQueue, self).put(buff, **kwargs)
+ except Exception as e:
+ stack_info = traceback.format_exc()
+ err_msg = 'failed to put a element to SharedQueue '\
+ 'with stack info[%s]' % (stack_info)
+ logger.warn(err_msg)
+
+ if buff is not None:
+ buff.free()
+ raise e
+
+ def get(self, **kwargs):
+ """ get an object from this queue
+ """
+ buff = None
+ try:
+ buff = super(SharedQueue, self).get(**kwargs)
+ data = buff.get()
+ return pickle.load(StringIO(data))
+ except Exception as e:
+ stack_info = traceback.format_exc()
+ err_msg = 'failed to get element from SharedQueue '\
+ 'with stack info[%s]' % (stack_info)
+ logger.warn(err_msg)
+ raise e
+ finally:
+ if buff is not None:
+ buff.free()
+
+ def release(self):
+ self._shared_mem.release()
+ self._shared_mem = None
diff --git a/paddlex/cv/datasets/shared_queue/sharedmemory.py b/paddlex/cv/datasets/shared_queue/sharedmemory.py
new file mode 100644
index 0000000000000000000000000000000000000000..2712fc42b728ee87bf4413fab869cbc9e7609029
--- /dev/null
+++ b/paddlex/cv/datasets/shared_queue/sharedmemory.py
@@ -0,0 +1,535 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# utilities for managing memory allocated in shared memory;
+# note that these structures may not be thread-safe
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import os
+import time
+import math
+import struct
+import sys
+import six
+
+if six.PY3:
+ import pickle
+else:
+ import cPickle as pickle
+
+import json
+import uuid
+import random
+import numpy as np
+import weakref
+import logging
+from multiprocessing import Lock
+from multiprocessing import RawArray
+
+logger = logging.getLogger(__name__)
+
+
+class SharedMemoryError(ValueError):
+ """ SharedMemoryError
+ """
+ pass
+
+
+class SharedBufferError(SharedMemoryError):
+ """ SharedBufferError
+ """
+ pass
+
+
+class MemoryFullError(SharedMemoryError):
+ """ MemoryFullError
+ """
+
+ def __init__(self, errmsg=''):
+ super(MemoryFullError, self).__init__()
+ self.errmsg = errmsg
+
+
+def memcopy(dst, src, offset=0, length=None):
+ """ copy data from 'src' to 'dst' in bytes
+ """
+ length = length if length is not None else len(src)
+ assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
+ if type(src) is not np.ndarray:
+ if type(src) is str and six.PY3:
+ src = src.encode()
+ src = np.frombuffer(src, dtype='uint8', count=len(src))
+
+ dst[:] = src[offset:offset + length]
+
+
+class SharedBuffer(object):
+ """ Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
+
+ note that:
+ every instance of this should be freed explicitly by calling 'self.free'
+ """
+
+ def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
+ """ Init
+
+ Args:
+ owner (str): manager to own this buffer
+ capacity (int): capacity in bytes for this buffer
+ pos (int): page position in shared memory
+ size (int): bytes already used
+ alloc_status (str): debug info about the allocator when this buffer was allocated
+ """
+ self._owner = owner
+ self._cap = capacity
+ self._pos = pos
+ self._size = size
+ self._alloc_status = alloc_status
+ assert self._pos >= 0 and self._cap > 0, \
+ "invalid params[%d:%d] to construct SharedBuffer" \
+ % (self._pos, self._cap)
+
+ def owner(self):
+ """ get owner
+ """
+ return SharedMemoryMgr.get_mgr(self._owner)
+
+ def put(self, data, override=False):
+ """ put data to this buffer
+
+ Args:
+ data (str): data to be stored in this buffer
+
+ Returns:
+ None
+
+ Raises:
+ SharedMemoryError when not enough space in this buffer
+ """
+ assert type(data) in [str, bytes], \
+ 'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
+ if self._size > 0 and not override:
+ raise SharedBufferError('this buffer has already been set before')
+
+ if self.capacity() < len(data):
+ raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
+ % (len(data), str(self)))
+
+ self.owner().put_data(self, data)
+ self._size = len(data)
+
+ def get(self, offset=0, size=None, no_copy=True):
+ """ get the data stored this buffer
+
+ Args:
+ offset (int): position for the start point to 'get'
+ size (int): size to get
+
+ Returns:
+ data (np.ndarray('uint8')): user's data in numpy
+ which is passed in by 'put'
+ None: if no data stored in
+ """
+ offset = offset if offset >= 0 else self._size + offset
+ if self._size <= 0:
+ return None
+
+ size = self._size if size is None else size
+ assert offset + size <= self._cap, 'invalid offset[%d] '\
+ 'or size[%d] for capacity[%d]' % (offset, size, self._cap)
+ return self.owner().get_data(self, offset, size, no_copy=no_copy)
+
+ def size(self):
+ """ bytes of used memory
+ """
+ return self._size
+
+ def resize(self, size):
+ """ resize the used memory to 'size', should not be greater than capacity
+ """
+ assert size >= 0 and size <= self._cap, \
+ "invalid size[%d] for resize" % (size)
+
+ self._size = size
+
+ def capacity(self):
+ """ size of allocated memory
+ """
+ return self._cap
+
+ def __str__(self):
+ """ human readable format
+ """
+ return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
+ "capacity:%d, alloc_status:[%s], pid:%d)" \
+ % (str(self._owner), self._pos, self._size, \
+ self._cap, self._alloc_status, os.getpid())
+
+ def free(self):
+ """ free this buffer to it's owner
+ """
+ if self._owner is not None:
+ self.owner().free(self)
+ self._owner = None
+ self._cap = 0
+ self._pos = -1
+ self._size = 0
+ return True
+ else:
+ return False
+
+
+class PageAllocator(object):
+ """ allocator used to malloc and free shared memory which
+ is split into pages
+ """
+ s_allocator_header = 12
+
+ def __init__(self, base, total_pages, page_size):
+ """ init
+ """
+ self._magic_num = 1234321000 + random.randint(100, 999)
+ self._base = base
+ self._total_pages = total_pages
+ self._page_size = page_size
+
+ header_pages = int(
+ math.ceil((total_pages + self.s_allocator_header) / page_size))
+
+ self._header_pages = header_pages
+ self._free_pages = total_pages - header_pages
+ self._header_size = self._header_pages * page_size
+ self._reset()
+
+ def _dump_alloc_info(self, fname):
+ hpages, tpages, pos, used = self.header()
+
+ start = self.s_allocator_header
+ end = start + self._page_size * hpages
+ alloc_flags = self._base[start:end].tostring()
+ info = {
+ 'magic_num': self._magic_num,
+ 'header_pages': hpages,
+ 'total_pages': tpages,
+ 'pos': pos,
+ 'used': used
+ }
+ info['alloc_flags'] = alloc_flags
+ fname = fname + '.' + str(uuid.uuid4())[:6]
+ with open(fname, 'wb') as f:
+ f.write(pickle.dumps(info, -1))
+ logger.warn('dump alloc info to file[%s]' % (fname))
+
+ def _reset(self):
+ alloc_page_pos = self._header_pages
+ used_pages = self._header_pages
+ header_info = struct.pack(
+ str('III'), self._magic_num, alloc_page_pos, used_pages)
+ assert len(header_info) == self.s_allocator_header, \
+ 'invalid size of header_info'
+
+ memcopy(self._base[0:self.s_allocator_header], header_info)
+ self.set_page_status(0, self._header_pages, '1')
+ self.set_page_status(self._header_pages, self._free_pages, '0')
+
+ def header(self):
+ """ get header info of this allocator
+ """
+ header_str = self._base[0:self.s_allocator_header].tostring()
+ magic, pos, used = struct.unpack(str('III'), header_str)
+
+ assert magic == self._magic_num, \
+ 'invalid header magic[%d] in shared memory' % (magic)
+ return self._header_pages, self._total_pages, pos, used
+
+ def empty(self):
+ """ are all allocatable pages available
+ """
+ header_pages, pages, pos, used = self.header()
+ return header_pages == used
+
+ def full(self):
+ """ are all allocatable pages used
+ """
+ header_pages, pages, pos, used = self.header()
+ return header_pages + used == pages
+
+ def __str__(self):
+ header_pages, pages, pos, used = self.header()
+ desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
+ % (self._magic_num, pages, used, header_pages, pos, self._page_size)
+ return 'PageAllocator:%s' % (desc)
+
+ def set_alloc_info(self, alloc_pos, used_pages):
+ """ set allocating position to new value
+ """
+ memcopy(self._base[4:12], struct.pack(
+ str('II'), alloc_pos, used_pages))
+
+ def set_page_status(self, start, page_num, status):
+ """ set pages from 'start' to 'end' with new same status 'status'
+ """
+ assert status in ['0', '1'], 'invalid status[%s] for page status '\
+ 'in allocator[%s]' % (status, str(self))
+ start += self.s_allocator_header
+ end = start + page_num
+ assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
+ 'in allocator[%s]' % (end, str(self))
+ memcopy(self._base[start:end], str(status * page_num))
+
+ def get_page_status(self, start, page_num, ret_flag=False):
+ start += self.s_allocator_header
+ end = start + page_num
+ assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
+ 'in allocator[%s]' % (end, str(self))
+ status = self._base[start:end].tostring().decode()
+ if ret_flag:
+ return status
+
+ zero_num = status.count('0')
+ if zero_num == 0:
+ return (page_num, 1)
+ else:
+ return (zero_num, 0)
+
+ def malloc_page(self, page_num):
+ header_pages, pages, pos, used = self.header()
+ end = pos + page_num
+ if end > pages:
+ pos = self._header_pages
+ end = pos + page_num
+
+ start_pos = pos
+ flags = ''
+ while True:
+ # maybe flags already has some '0' pages,
+ # so just check 'page_num - len(flags)' pages
+ flags = self.get_page_status(pos, page_num, ret_flag=True)
+
+ if flags.count('0') == page_num:
+ break
+
+ # not found enough pages, so shift to next few pages
+ free_pos = flags.rfind('1') + 1
+ pos += free_pos
+ end = pos + page_num
+ if end > pages:
+ pos = self._header_pages
+ end = pos + page_num
+ flags = ''
+
+ # no available pages found after scanning all pages
+ if pos <= start_pos and end >= start_pos:
+ logger.debug('no available pages found after scanning all pages')
+ break
+
+ page_status = (flags.count('0'), 0)
+ if page_status != (page_num, 0):
+ free_pages = self._total_pages - used
+ if free_pages == 0:
+ err_msg = 'all pages have been used:%s' % (str(self))
+ else:
+ err_msg = 'not found available pages with page_status[%s] '\
+ 'and %d free pages' % (str(page_status), free_pages)
+ err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
+ % (page_num, pos, err_msg, str(self))
+ raise MemoryFullError(err_msg)
+
+ self.set_page_status(pos, page_num, '1')
+ used += page_num
+ self.set_alloc_info(end, used)
+ return pos
+
+ def free_page(self, start, page_num):
+ """ free 'page_num' pages start from 'start'
+ """
+ page_status = self.get_page_status(start, page_num)
+ assert page_status == (page_num, 1), \
+ 'invalid status[%s] when free [%d, %d]' \
+ % (str(page_status), start, page_num)
+ self.set_page_status(start, page_num, '0')
+ _, _, pos, used = self.header()
+ used -= page_num
+ self.set_alloc_info(pos, used)
+
+
+DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
+
+
+class SharedMemoryMgr(object):
+ """ manage a continouse block of memory, provide
+ 'malloc' to allocate new buffer, and 'free' to free buffer
+ """
+ s_memory_mgrs = weakref.WeakValueDictionary()
+ s_mgr_num = 0
+ s_log_statis = False
+
+ @classmethod
+ def get_mgr(cls, id):
+ """ get a SharedMemoryMgr with size of 'capacity'
+ """
+ assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
+ id)
+ return cls.s_memory_mgrs[id]
+
+ def __init__(self, capacity=None, pagesize=None):
+ """ init
+ """
+ logger.debug('create SharedMemoryMgr')
+
+ pagesize = 64 * 1024 if pagesize is None else pagesize
+ assert type(pagesize) is int, "invalid type of pagesize[%s]" \
+ % (str(pagesize))
+
+ capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
+ assert type(capacity) is int, "invalid type of capacity[%s]" \
+ % (str(capacity))
+
+ assert capacity > 0, 'size of shared memory should be greater than 0'
+ self._released = False
+ self._cap = capacity
+ self._page_size = pagesize
+
+ assert self._cap % self._page_size == 0, \
+ "capacity[%d] and pagesize[%d] are not consistent" \
+ % (self._cap, self._page_size)
+ self._total_pages = self._cap // self._page_size
+
+ self._pid = os.getpid()
+ SharedMemoryMgr.s_mgr_num += 1
+ self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
+ SharedMemoryMgr.s_memory_mgrs[self._id] = self
+ self._locker = Lock()
+ self._setup()
+
+ def _setup(self):
+ self._shared_mem = RawArray('c', self._cap)
+ self._base = np.frombuffer(
+ self._shared_mem, dtype='uint8', count=self._cap)
+ self._locker.acquire()
+ try:
+ self._allocator = PageAllocator(self._base, self._total_pages,
+ self._page_size)
+ finally:
+ self._locker.release()
+
+ def malloc(self, size, wait=True):
+ """ malloc a new SharedBuffer
+
+ Args:
+ size (int): buffer size to allocate
+ wait (bool): whether to wait when there is not enough memory
+
+ Returns:
+ SharedBuffer
+
+ Raises:
+ SharedMemoryError when no available memory is found
+ """
+ page_num = int(math.ceil(size / self._page_size))
+ size = page_num * self._page_size
+
+ start = None
+ ct = 0
+ errmsg = ''
+ while True:
+ self._locker.acquire()
+ try:
+ start = self._allocator.malloc_page(page_num)
+ alloc_status = str(self._allocator)
+ except MemoryFullError as e:
+ start = None
+ errmsg = e.errmsg
+ if not wait:
+ raise e
+ finally:
+ self._locker.release()
+
+ if start is None:
+ time.sleep(0.1)
+ if ct % 100 == 0:
+ logger.warn('not enough space for reason[%s]' % (errmsg))
+
+ ct += 1
+ else:
+ break
+
+ return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
+
+ def free(self, shared_buf):
+ """ free a SharedBuffer
+
+ Args:
+ shared_buf (SharedBuffer): buffer to be freed
+
+ Returns:
+ None
+
+ Raises:
+ SharedMemoryError when this buffer cannot be released
+ """
+ assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
+ "because it was not allocated by this manager[%s]" % (str(shared_buf), str(self))
+ cap = shared_buf.capacity()
+ start_page = shared_buf._pos
+ page_num = cap // self._page_size
+
+ #maybe we don't need this lock here
+ self._locker.acquire()
+ try:
+ self._allocator.free_page(start_page, page_num)
+ finally:
+ self._locker.release()
+
+ def put_data(self, shared_buf, data):
+ """ fill 'data' into 'shared_buf'
+ """
+ assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
+ 'for this buffer[%s]' % (len(data), str(shared_buf))
+ start = shared_buf._pos * self._page_size
+ end = start + len(data)
+ assert start >= 0 and end <= self._cap, "invalid start "\
+ "position[%d] when put data to buff:%s" % (start, str(shared_buf))
+ self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
+
+ def get_data(self, shared_buf, offset, size, no_copy=True):
+ """ extract 'data' from 'shared_buf' in range [offset, offset + size)
+ """
+ start = shared_buf._pos * self._page_size
+ start += offset
+ if no_copy:
+ return self._base[start:start + size]
+ else:
+ return self._base[start:start + size].tostring()
+
+ def __str__(self):
+ return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
+
+ def __del__(self):
+ if SharedMemoryMgr.s_log_statis:
+ logger.info('destroy [%s]' % (self))
+
+ if not self._released and not self._allocator.empty():
+ logger.debug(
+ 'not empty when delete this SharedMemoryMgr[%s]' % (self))
+ else:
+ self._released = True
+
+ if self._id in SharedMemoryMgr.s_memory_mgrs:
+ del SharedMemoryMgr.s_memory_mgrs[self._id]
+ SharedMemoryMgr.s_mgr_num -= 1
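
To make the page bookkeeping above concrete, the sketch below allocates a small arena, writes a few bytes through a `SharedBuffer`, and frees it again. The sizes are illustrative, and running it assumes the legacy NumPy behaviour this module relies on (for example `ndarray.tostring()` still being available).

```python
# Illustrative round trip through SharedMemoryMgr / SharedBuffer.
from paddlex.cv.datasets.shared_queue.sharedmemory import SharedMemoryMgr

# a 1 MB arena split into 4 KB pages; one page is reserved for the header
mgr = SharedMemoryMgr(capacity=1024 * 1024, pagesize=4 * 1024)

buff = mgr.malloc(10 * 1024)       # rounded up to 3 pages (12 KB)
buff.put(b'hello shared memory')   # raw bytes are copied into the arena
view = buff.get()                  # zero-copy uint8 view by default
print(bytes(view))                 # b'hello shared memory'
buff.free()                        # returns the 3 pages to the allocator
```
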
diff --git a/paddlex/cv/datasets/voc.py b/paddlex/cv/datasets/voc.py
new file mode 100644
index 0000000000000000000000000000000000000000..6ab985fed760001f06d499987baf5d5c6b4dd049
--- /dev/null
+++ b/paddlex/cv/datasets/voc.py
@@ -0,0 +1,207 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+import copy
+import os.path as osp
+import random
+import numpy as np
+import xml.etree.ElementTree as ET
+from pycocotools.coco import COCO
+import paddlex.utils.logging as logging
+from .dataset import Dataset
+from .dataset import is_pic
+from .dataset import get_encoding
+
+
+class VOCDetection(Dataset):
+ """读取PascalVOC格式的检测数据集,并对样本进行相应的处理。
+
+ Args:
+ data_dir (str): 数据集所在的目录路径。
+ file_list (str): 描述数据集图片文件和对应标注文件的文件路径(文本内每行路径为相对data_dir的相对路)。
+ label_list (str): 描述数据集包含的类别信息文件路径。
+ transforms (paddlex.det.transforms): 数据集中每个样本的预处理/增强算子。
+ num_workers (int|str): 数据集中样本在预处理过程中的线程或进程数。默认为'auto'。当设为'auto'时,根据
+ 系统的实际CPU核数设置`num_workers`: 如果CPU核数的一半大于8,则`num_workers`为8,否则为CPU核数的
+ 一半。
+ buffer_size (int): 数据集中样本在预处理过程中队列的缓存长度,以样本数为单位。默认为100。
+ parallel_method (str): 数据集中样本在预处理过程中并行处理的方式,支持'thread'
+ 线程和'process'进程两种方式。默认为'thread'(Windows和Mac下会强制使用thread,该参数无效)。
+ shuffle (bool): 是否需要对数据集中样本打乱顺序。默认为False。
+ """
+
+ def __init__(self,
+ data_dir,
+ file_list,
+ label_list,
+ transforms=None,
+ num_workers='auto',
+ buffer_size=100,
+ parallel_method='process',
+ shuffle=False):
+ super(VOCDetection, self).__init__(
+ transforms=transforms,
+ num_workers=num_workers,
+ buffer_size=buffer_size,
+ parallel_method=parallel_method,
+ shuffle=shuffle)
+ self.file_list = list()
+ self.labels = list()
+ self._epoch = 0
+
+ annotations = {}
+ annotations['images'] = []
+ annotations['categories'] = []
+ annotations['annotations'] = []
+
+ cname2cid = {}
+ label_id = 1
+ with open(label_list, 'r', encoding=get_encoding(label_list)) as fr:
+ for line in fr.readlines():
+ cname2cid[line.strip()] = label_id
+ label_id += 1
+ self.labels.append(line.strip())
+ logging.info("Starting to read file list from dataset...")
+ for k, v in cname2cid.items():
+ annotations['categories'].append({
+ 'supercategory': 'component',
+ 'id': v,
+ 'name': k
+ })
+ ct = 0
+ ann_ct = 0
+ with open(file_list, 'r', encoding=get_encoding(file_list)) as fr:
+ while True:
+ line = fr.readline()
+ if not line:
+ break
+ img_file, xml_file = [osp.join(data_dir, x) \
+ for x in line.strip().split()[:2]]
+ if not is_pic(img_file):
+ continue
+ if not osp.isfile(xml_file):
+ continue
+ if not osp.exists(img_file):
+ raise IOError(
+ 'The image file {} does not exist!'.format(img_file))
+ tree = ET.parse(xml_file)
+ if tree.find('id') is None:
+ im_id = np.array([ct])
+ else:
+ ct = int(tree.find('id').text)
+ im_id = np.array([int(tree.find('id').text)])
+
+ objs = tree.findall('object')
+ im_w = float(tree.find('size').find('width').text)
+ im_h = float(tree.find('size').find('height').text)
+ gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
+ gt_class = np.zeros((len(objs), 1), dtype=np.int32)
+ gt_score = np.ones((len(objs), 1), dtype=np.float32)
+ is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
+ difficult = np.zeros((len(objs), 1), dtype=np.int32)
+ for i, obj in enumerate(objs):
+ cname = obj.find('name').text
+ gt_class[i][0] = cname2cid[cname]
+ _difficult = int(obj.find('difficult').text)
+ x1 = float(obj.find('bndbox').find('xmin').text)
+ y1 = float(obj.find('bndbox').find('ymin').text)
+ x2 = float(obj.find('bndbox').find('xmax').text)
+ y2 = float(obj.find('bndbox').find('ymax').text)
+ x1 = max(0, x1)
+ y1 = max(0, y1)
+ x2 = min(im_w - 1, x2)
+ y2 = min(im_h - 1, y2)
+ gt_bbox[i] = [x1, y1, x2, y2]
+ is_crowd[i][0] = 0
+ difficult[i][0] = _difficult
+ annotations['annotations'].append({
+ 'iscrowd':
+ 0,
+ 'image_id':
+ int(im_id[0]),
+ 'bbox': [x1, y1, x2 - x1 + 1, y2 - y1 + 1],
+ 'area':
+ float((x2 - x1 + 1) * (y2 - y1 + 1)),
+ 'category_id':
+ cname2cid[cname],
+ 'id':
+ ann_ct,
+ 'difficult':
+ _difficult
+ })
+ ann_ct += 1
+
+ im_info = {
+ 'im_id': im_id,
+ 'origin_shape': np.array([im_h, im_w]).astype('int32'),
+ }
+ label_info = {
+ 'is_crowd': is_crowd,
+ 'gt_class': gt_class,
+ 'gt_bbox': gt_bbox,
+ 'gt_score': gt_score,
+ 'gt_poly': [],
+ 'difficult': difficult
+ }
+ voc_rec = (im_info, label_info)
+ if len(objs) != 0:
+ self.file_list.append([img_file, voc_rec])
+ ct += 1
+ annotations['images'].append({
+ 'height':
+ im_h,
+ 'width':
+ im_w,
+ 'id':
+ int(im_id[0]),
+ 'file_name':
+ osp.split(img_file)[1]
+ })
+
+ if not len(self.file_list) > 0:
+ raise Exception('no VOC record found in %s' % (file_list))
+ logging.info("{} samples in file {}".format(
+ len(self.file_list), file_list))
+ self.num_samples = len(self.file_list)
+ self.coco_gt = COCO()
+ self.coco_gt.dataset = annotations
+ self.coco_gt.createIndex()
+
+ def iterator(self):
+ self._epoch += 1
+ self._pos = 0
+ files = copy.deepcopy(self.file_list)
+ if self.shuffle:
+ random.shuffle(files)
+ files = files[:self.num_samples]
+ self.num_samples = len(files)
+ for f in files:
+ records = f[1]
+ im_info = copy.deepcopy(records[0])
+ label_info = copy.deepcopy(records[1])
+ im_info['epoch'] = self._epoch
+ if self.num_samples > 1:
+ mix_idx = random.randint(1, self.num_samples - 1)
+ mix_pos = (mix_idx + self._pos) % self.num_samples
+ else:
+ mix_pos = 0
+ im_info['mixup'] = [
+ files[mix_pos][0],
+ copy.deepcopy(files[mix_pos][1][0]),
+ copy.deepcopy(files[mix_pos][1][1])
+ ]
+ self._pos += 1
+ sample = [f[0], im_info, label_info]
+ yield sample
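
For reference, `file_list` is expected to contain one `image_path xml_path` pair per line (relative to `data_dir`) and `label_list` one class name per line. A construction sketch follows; the paths are purely hypothetical, and the transforms pipeline (normally built from `paddlex.det.transforms`) is omitted here since it lives outside this file.

```python
# Hypothetical dataset layout; only the reader defined above is exercised.
from paddlex.cv.datasets.voc import VOCDetection

train_dataset = VOCDetection(
    data_dir='dataset/my_voc',                  # dataset root (example path)
    file_list='dataset/my_voc/train_list.txt',  # "JPEGImages/1.jpg Annotations/1.xml" per line
    label_list='dataset/my_voc/labels.txt',     # one class name per line
    transforms=None,                            # usually a paddlex.det.transforms pipeline
    parallel_method='process',
    shuffle=True)

# The reader also assembles a pycocotools COCO object for evaluation:
# train_dataset.coco_gt holds the images/annotations/categories parsed above.
```
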
diff --git a/paddlex/cv/models/__init__.py b/paddlex/cv/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..3ba26a7308fcb37e5483bc8b79dc84ad3c408f55
--- /dev/null
+++ b/paddlex/cv/models/__init__.py
@@ -0,0 +1,40 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .classifier import BaseClassifier
+from .classifier import ResNet18
+from .classifier import ResNet34
+from .classifier import ResNet50
+from .classifier import ResNet101
+from .classifier import ResNet50_vd
+from .classifier import ResNet101_vd
+from .classifier import DarkNet53
+from .classifier import MobileNetV1
+from .classifier import MobileNetV2
+from .classifier import MobileNetV3_small
+from .classifier import MobileNetV3_large
+from .classifier import Xception41
+from .classifier import Xception65
+from .classifier import DenseNet121
+from .classifier import DenseNet161
+from .classifier import DenseNet201
+from .classifier import ShuffleNetV2
+from .base import BaseAPI
+from .yolo_v3 import YOLOv3
+from .faster_rcnn import FasterRCNN
+from .mask_rcnn import MaskRCNN
+from .unet import UNet
+from .deeplabv3p import DeepLabv3p
+from .load_model import load_model
+from .slim import prune
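
With these re-exports, user code can reach every model class through the single `paddlex.cv.models` namespace. A minimal sketch (the constructor arguments shown are examples, not the only options):

```python
# Names are taken from the re-exports above; load_model restores a saved model.
from paddlex.cv.models import ResNet50, YOLOv3, DeepLabv3p, load_model

clf = ResNet50(num_classes=10)    # image classifier
det = YOLOv3(num_classes=20)      # object detector
seg = DeepLabv3p(num_classes=2)   # semantic segmenter
# model = load_model('output/best_model')  # hypothetical path to a saved model
```
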
diff --git a/paddlex/cv/models/base.py b/paddlex/cv/models/base.py
new file mode 100644
index 0000000000000000000000000000000000000000..0acba25ec8fa40d456557545ecb3226f89b1d81c
--- /dev/null
+++ b/paddlex/cv/models/base.py
@@ -0,0 +1,509 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import paddle.fluid as fluid
+import os
+import numpy as np
+import time
+import math
+import yaml
+import copy
+import json
+import functools
+import paddlex.utils.logging as logging
+from paddlex.utils import seconds_to_hms
+import paddlex
+from collections import OrderedDict
+from os import path as osp
+from paddle.fluid.framework import Program
+from .utils.pretrain_weights import get_pretrain_weights
+
+
+def dict2str(dict_input):
+ out = ''
+ for k, v in dict_input.items():
+ try:
+ v = round(float(v), 6)
+ except:
+ pass
+ out = out + '{}={}, '.format(k, v)
+ return out.strip(', ')
+
+
+class BaseAPI:
+ def __init__(self, model_type):
+ self.model_type = model_type
+ # every existing CV model has this attribute, and it is also needed during eval
+ self.num_classes = None
+ self.labels = None
+ self.version = paddlex.__version__
+ if paddlex.env_info['place'] == 'cpu':
+ self.places = fluid.cpu_places()
+ else:
+ self.places = fluid.cuda_places()
+ self.exe = fluid.Executor(self.places[0])
+ self.train_prog = None
+ self.test_prog = None
+ self.parallel_train_prog = None
+ self.train_inputs = None
+ self.test_inputs = None
+ self.train_outputs = None
+ self.test_outputs = None
+ self.train_data_loader = None
+ self.eval_metrics = None
+ # models loaded from an inference model cannot be trained through the training interface
+ self.trainable = True
+ # whether to synchronize BatchNorm mean/variance across multiple cards
+ self.sync_bn = False
+ # current model status
+ self.status = 'Normal'
+
+ def _get_single_card_bs(self, batch_size):
+ if batch_size % len(self.places) == 0:
+ return int(batch_size // len(self.places))
+ else:
+ raise Exception("Please support correct batch_size, \
+ which can be divided by available cards({}) in {}".
+ format(paddlex.env_info['num'],
+ paddlex.env_info['place']))
+
+ def build_program(self):
+ # build the training network
+ self.train_inputs, self.train_outputs = self.build_net(mode='train')
+ self.train_prog = fluid.default_main_program()
+ startup_prog = fluid.default_startup_program()
+
+ # build the inference (test) network
+ self.test_prog = fluid.Program()
+ with fluid.program_guard(self.test_prog, startup_prog):
+ with fluid.unique_name.guard():
+ self.test_inputs, self.test_outputs = self.build_net(
+ mode='test')
+ self.test_prog = self.test_prog.clone(for_test=True)
+
+ def arrange_transforms(self, transforms, mode='train'):
+ # append the arrange operation to transforms
+ if self.model_type == 'classifier':
+ arrange_transform = paddlex.cls.transforms.ArrangeClassifier
+ elif self.model_type == 'segmenter':
+ arrange_transform = paddlex.seg.transforms.ArrangeSegmenter
+ elif self.model_type == 'detector':
+ arrange_name = 'Arrange{}'.format(self.__class__.__name__)
+ arrange_transform = getattr(paddlex.det.transforms, arrange_name)
+ else:
+ raise Exception("Unrecognized model type: {}".format(
+ self.model_type))
+ if type(transforms.transforms[-1]).__name__.startswith('Arrange'):
+ transforms.transforms[-1] = arrange_transform(mode=mode)
+ else:
+ transforms.transforms.append(arrange_transform(mode=mode))
+
+ def build_train_data_loader(self, dataset, batch_size):
+ # initialize the data_loader
+ if self.train_data_loader is None:
+ self.train_data_loader = fluid.io.DataLoader.from_generator(
+ feed_list=list(self.train_inputs.values()),
+ capacity=64,
+ use_double_buffer=True,
+ iterable=True)
+ batch_size_each_gpu = self._get_single_card_bs(batch_size)
+ generator = dataset.generator(
+ batch_size=batch_size_each_gpu, drop_last=True)
+ self.train_data_loader.set_sample_list_generator(
+ generator, places=self.places)
+
+ def export_quant_model(self,
+ dataset,
+ save_dir,
+ batch_size=1,
+ batch_num=10,
+ cache_dir="./temp"):
+ self.arrange_transforms(transforms=dataset.transforms, mode='quant')
+ dataset.num_samples = batch_size * batch_num
+ try:
+ from .slim.post_quantization import PaddleXPostTrainingQuantization
+ except:
+ raise Exception(
+ "Model Quantization is not available, try to upgrade your paddlepaddle>=1.7.0"
+ )
+ is_use_cache_file = True
+ if cache_dir is None:
+ is_use_cache_file = False
+ post_training_quantization = PaddleXPostTrainingQuantization(
+ executor=self.exe,
+ dataset=dataset,
+ program=self.test_prog,
+ inputs=self.test_inputs,
+ outputs=self.test_outputs,
+ batch_size=batch_size,
+ batch_nums=batch_num,
+ scope=None,
+ algo='KL',
+ quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
+ is_full_quantize=False,
+ is_use_cache_file=is_use_cache_file,
+ cache_dir=cache_dir)
+ post_training_quantization.quantize()
+ post_training_quantization.save_quantized_model(save_dir)
+ model_info = self.get_model_info()
+ model_info['status'] = 'Quant'
+
+ # save the descriptions of the model's input/output variables
+ model_info['_ModelInputsOutputs'] = dict()
+ model_info['_ModelInputsOutputs']['test_inputs'] = [
+ [k, v.name] for k, v in self.test_inputs.items()
+ ]
+ model_info['_ModelInputsOutputs']['test_outputs'] = [
+ [k, v.name] for k, v in self.test_outputs.items()
+ ]
+
+ with open(
+ osp.join(save_dir, 'model.yml'), encoding='utf-8',
+ mode='w') as f:
+ yaml.dump(model_info, f)
+
+ def net_initialize(self,
+ startup_prog=None,
+ pretrain_weights=None,
+ fuse_bn=False,
+ save_dir='.',
+ sensitivities_file=None,
+ eval_metric_loss=0.05):
+ pretrain_dir = osp.join(save_dir, 'pretrain')
+ if not os.path.isdir(pretrain_dir):
+ if os.path.exists(pretrain_dir):
+ os.remove(pretrain_dir)
+ os.makedirs(pretrain_dir)
+ if hasattr(self, 'backbone'):
+ backbone = self.backbone
+ else:
+ backbone = self.__class__.__name__
+ pretrain_weights = get_pretrain_weights(
+ pretrain_weights, self.model_type, backbone, pretrain_dir)
+ if startup_prog is None:
+ startup_prog = fluid.default_startup_program()
+ self.exe.run(startup_prog)
+ if pretrain_weights is not None:
+ logging.info(
+ "Load pretrain weights from {}.".format(pretrain_weights))
+ paddlex.utils.utils.load_pretrain_weights(
+ self.exe, self.train_prog, pretrain_weights, fuse_bn)
+ # perform pruning
+ if sensitivities_file is not None:
+ from .slim.prune_config import get_sensitivities
+ sensitivities_file = get_sensitivities(sensitivities_file, self,
+ save_dir)
+ from .slim.prune import get_params_ratios, prune_program
+ prune_params_ratios = get_params_ratios(
+ sensitivities_file, eval_metric_loss=eval_metric_loss)
+ prune_program(self, prune_params_ratios)
+ self.status = 'Prune'
+
+ def get_model_info(self):
+ info = dict()
+ info['version'] = paddlex.__version__
+ info['Model'] = self.__class__.__name__
+ info['_Attributes'] = {'model_type': self.model_type}
+ if 'self' in self.init_params:
+ del self.init_params['self']
+ if '__class__' in self.init_params:
+ del self.init_params['__class__']
+ info['_init_params'] = self.init_params
+
+ info['_Attributes']['num_classes'] = self.num_classes
+ info['_Attributes']['labels'] = self.labels
+ try:
+ primary_metric_key = list(self.eval_metrics.keys())[0]
+ primary_metric_value = float(self.eval_metrics[primary_metric_key])
+ info['_Attributes']['eval_metrics'] = {
+ primary_metric_key: primary_metric_value
+ }
+ except:
+ pass
+
+ if hasattr(self.test_transforms, 'to_rgb'):
+ if self.test_transforms.to_rgb:
+ info['TransformsMode'] = 'RGB'
+ else:
+ info['TransformsMode'] = 'BGR'
+
+ if hasattr(self, 'test_transforms'):
+ if self.test_transforms is not None:
+ info['Transforms'] = list()
+ for op in self.test_transforms.transforms:
+ name = op.__class__.__name__
+ attr = op.__dict__
+ info['Transforms'].append({name: attr})
+ return info
+
+ def save_model(self, save_dir):
+ if not osp.isdir(save_dir):
+ if osp.exists(save_dir):
+ os.remove(save_dir)
+ os.makedirs(save_dir)
+ fluid.save(self.train_prog, osp.join(save_dir, 'model'))
+ model_info = self.get_model_info()
+ model_info['status'] = self.status
+ with open(
+ osp.join(save_dir, 'model.yml'), encoding='utf-8',
+ mode='w') as f:
+ yaml.dump(model_info, f)
+ # save the evaluation results
+ if hasattr(self, 'eval_details'):
+ with open(osp.join(save_dir, 'eval_details.json'), 'w') as f:
+ json.dump(self.eval_details, f)
+
+ if self.status == 'Prune':
+ # save the pruned parameter shapes
+ shapes = {}
+ for block in self.train_prog.blocks:
+ for param in block.all_parameters():
+ pd_var = fluid.global_scope().find_var(param.name)
+ pd_param = pd_var.get_tensor()
+ shapes[param.name] = np.array(pd_param).shape
+ with open(
+ osp.join(save_dir, 'prune.yml'), encoding='utf-8',
+ mode='w') as f:
+ yaml.dump(shapes, f)
+
+ # marker file indicating the model was saved successfully
+ open(osp.join(save_dir, '.success'), 'w').close()
+ logging.info("Model saved in {}.".format(save_dir))
+
+ def export_inference_model(self, save_dir):
+ test_input_names = [
+ var.name for var in list(self.test_inputs.values())
+ ]
+ test_outputs = list(self.test_outputs.values())
+ if self.__class__.__name__ == 'MaskRCNN':
+ from paddlex.utils.save import save_mask_inference_model
+ save_mask_inference_model(
+ dirname=save_dir,
+ executor=self.exe,
+ params_filename='__params__',
+ feeded_var_names=test_input_names,
+ target_vars=test_outputs,
+ main_program=self.test_prog)
+ else:
+ fluid.io.save_inference_model(
+ dirname=save_dir,
+ executor=self.exe,
+ params_filename='__params__',
+ feeded_var_names=test_input_names,
+ target_vars=test_outputs,
+ main_program=self.test_prog)
+ model_info = self.get_model_info()
+ model_info['status'] = 'Infer'
+
+ # save the descriptions of the model's input/output variables
+ model_info['_ModelInputsOutputs'] = dict()
+ model_info['_ModelInputsOutputs']['test_inputs'] = [
+ [k, v.name] for k, v in self.test_inputs.items()
+ ]
+ model_info['_ModelInputsOutputs']['test_outputs'] = [
+ [k, v.name] for k, v in self.test_outputs.items()
+ ]
+
+ with open(
+ osp.join(save_dir, 'model.yml'), encoding='utf-8',
+ mode='w') as f:
+ yaml.dump(model_info, f)
+ # marker file indicating the model was saved successfully
+ open(osp.join(save_dir, '.success'), 'w').close()
+ logging.info(
+ "Model for inference deploy saved in {}.".format(save_dir))
+
+ def train_loop(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=10,
+ save_dir='output',
+ use_vdl=False):
+ if not osp.isdir(save_dir):
+ if osp.exists(save_dir):
+ os.remove(save_dir)
+ os.makedirs(save_dir)
+ if use_vdl:
+ from visualdl import LogWriter
+ vdl_logdir = osp.join(save_dir, 'vdl_log')
+ # append the arrange operation to transforms
+ self.arrange_transforms(
+ transforms=train_dataset.transforms, mode='train')
+ # build the train_data_loader
+ self.build_train_data_loader(
+ dataset=train_dataset, batch_size=train_batch_size)
+
+ if eval_dataset is not None:
+ self.eval_transforms = eval_dataset.transforms
+ self.test_transforms = copy.deepcopy(eval_dataset.transforms)
+
+ # fetch the learning rate as it changes during training
+ lr = self.optimizer._learning_rate
+ if isinstance(lr, fluid.framework.Variable):
+ self.train_outputs['lr'] = lr
+
+ # run training on multiple cards
+ if self.parallel_train_prog is None:
+ build_strategy = fluid.compiler.BuildStrategy()
+ build_strategy.fuse_all_optimizer_ops = False
+ if paddlex.env_info['place'] != 'cpu' and len(self.places) > 1:
+ build_strategy.sync_batch_norm = self.sync_bn
+ exec_strategy = fluid.ExecutionStrategy()
+ exec_strategy.num_iteration_per_drop_scope = 1
+ self.parallel_train_prog = fluid.CompiledProgram(
+ self.train_prog).with_data_parallel(
+ loss_name=self.train_outputs['loss'].name,
+ build_strategy=build_strategy,
+ exec_strategy=exec_strategy)
+
+ total_num_steps = math.floor(
+ train_dataset.num_samples / train_batch_size)
+ num_steps = 0
+ time_stat = list()
+ time_train_one_epoch = None
+ time_eval_one_epoch = None
+
+ total_num_steps_eval = 0
+ # total number of evaluations during training
+ total_eval_times = math.ceil(num_epochs / save_interval_epochs)
+ # detection currently only supports single-card evaluation; the evaluation batch size is the training batch size divided by the number of cards.
+ eval_batch_size = train_batch_size
+ if self.model_type == 'detector':
+ eval_batch_size = self._get_single_card_bs(train_batch_size)
+ if eval_dataset is not None:
+ total_num_steps_eval = math.ceil(
+ eval_dataset.num_samples / eval_batch_size)
+
+ if use_vdl:
+ # VisualDL component
+ log_writer = LogWriter(vdl_logdir, sync_cycle=20)
+ train_step_component = OrderedDict()
+ eval_component = OrderedDict()
+
+ best_accuracy_key = ""
+ best_accuracy = -1.0
+ best_model_epoch = 1
+ for i in range(num_epochs):
+ records = list()
+ step_start_time = time.time()
+ epoch_start_time = time.time()
+ for step, data in enumerate(self.train_data_loader()):
+ outputs = self.exe.run(
+ self.parallel_train_prog,
+ feed=data,
+ fetch_list=list(self.train_outputs.values()))
+ outputs_avg = np.mean(np.array(outputs), axis=1)
+ records.append(outputs_avg)
+
+ # estimate the remaining training time
+ current_time = time.time()
+ step_cost_time = current_time - step_start_time
+ step_start_time = current_time
+ if len(time_stat) < 20:
+ time_stat.append(step_cost_time)
+ else:
+ time_stat[num_steps % 20] = step_cost_time
+
+ # log loss information every log_interval_steps steps
+ num_steps += 1
+ if num_steps % log_interval_steps == 0:
+ step_metrics = OrderedDict(
+ zip(list(self.train_outputs.keys()), outputs_avg))
+
+ if use_vdl:
+ for k, v in step_metrics.items():
+ if k not in train_step_component.keys():
+ with log_writer.mode('Each_Step_while_Training'
+ ) as step_logger:
+ train_step_component[
+ k] = step_logger.scalar(
+ 'Training: {}'.format(k))
+ train_step_component[k].add_record(num_steps, v)
+
+ # estimate the remaining time
+ avg_step_time = np.mean(time_stat)
+ if time_train_one_epoch is not None:
+ eta = (num_epochs - i - 1) * time_train_one_epoch + (
+ total_num_steps - step - 1) * avg_step_time
+ else:
+ eta = ((num_epochs - i) * total_num_steps - step -
+ 1) * avg_step_time
+ if time_eval_one_epoch is not None:
+ eval_eta = (total_eval_times - i //
+ save_interval_epochs) * time_eval_one_epoch
+ else:
+ eval_eta = (
+ total_eval_times - i // save_interval_epochs
+ ) * total_num_steps_eval * avg_step_time
+ eta_str = seconds_to_hms(eta + eval_eta)
+
+ logging.info(
+ "[TRAIN] Epoch={}/{}, Step={}/{}, {}, time_each_step={}s, eta={}"
+ .format(i + 1, num_epochs, step + 1, total_num_steps,
+ dict2str(step_metrics), round(
+ avg_step_time, 2), eta_str))
+ train_metrics = OrderedDict(
+ zip(list(self.train_outputs.keys()), np.mean(records, axis=0)))
+ logging.info('[TRAIN] Epoch {} finished, {} .'.format(
+ i + 1, dict2str(train_metrics)))
+ time_train_one_epoch = time.time() - epoch_start_time
+ epoch_start_time = time.time()
+
+ # every save_interval_epochs epochs, evaluate on the validation set and save the model
+ eval_epoch_start_time = time.time()
+ if (i + 1) % save_interval_epochs == 0 or i == num_epochs - 1:
+ current_save_dir = osp.join(save_dir, "epoch_{}".format(i + 1))
+ if not osp.isdir(current_save_dir):
+ os.makedirs(current_save_dir)
+ if eval_dataset is not None:
+ self.eval_metrics, self.eval_details = self.evaluate(
+ eval_dataset=eval_dataset,
+ batch_size=eval_batch_size,
+ epoch_id=i + 1,
+ return_details=True)
+ logging.info('[EVAL] Finished, Epoch={}, {} .'.format(
+ i + 1, dict2str(self.eval_metrics)))
+ # save the best model
+ best_accuracy_key = list(self.eval_metrics.keys())[0]
+ current_accuracy = self.eval_metrics[best_accuracy_key]
+ if current_accuracy > best_accuracy:
+ best_accuracy = current_accuracy
+ best_model_epoch = i + 1
+ best_model_dir = osp.join(save_dir, "best_model")
+ self.save_model(save_dir=best_model_dir)
+ if use_vdl:
+ for k, v in self.eval_metrics.items():
+ if isinstance(v, list):
+ continue
+ if isinstance(v, np.ndarray):
+ if v.size > 1:
+ continue
+ if k not in eval_component:
+ with log_writer.mode('Each_Epoch_on_Eval_Data'
+ ) as eval_logger:
+ eval_component[k] = eval_logger.scalar(
+ 'Evaluation: {}'.format(k))
+ eval_component[k].add_record(i + 1, v)
+ self.save_model(save_dir=current_save_dir)
+ time_eval_one_epoch = time.time() - eval_epoch_start_time
+ eval_epoch_start_time = time.time()
+ logging.info(
+ 'The current best model on eval_dataset is from epoch_{}, {}={}'
+ .format(best_model_epoch, best_accuracy_key,
+ best_accuracy))
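
As a small illustration of the log formatting used throughout `train_loop`, `dict2str` renders a metrics dict into the `k=v` list that appears in the `[TRAIN]`/`[EVAL]` lines, rounding float values to six decimal places:

```python
# Hypothetical metrics dict passed through dict2str (defined at the top of base.py).
from collections import OrderedDict
from paddlex.cv.models.base import dict2str

metrics = OrderedDict([('loss', 0.123456789), ('acc1', 0.9), ('lr', 0.00125)])
print(dict2str(metrics))
# -> loss=0.123457, acc1=0.9, lr=0.00125
```
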
diff --git a/paddlex/cv/models/classifier.py b/paddlex/cv/models/classifier.py
new file mode 100644
index 0000000000000000000000000000000000000000..d37553bc8e817e077d97059fba9dd868f0255659
--- /dev/null
+++ b/paddlex/cv/models/classifier.py
@@ -0,0 +1,368 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import numpy as np
+import time
+import math
+import tqdm
+import paddle.fluid as fluid
+import paddlex.utils.logging as logging
+from paddlex.utils import seconds_to_hms
+import paddlex
+from collections import OrderedDict
+from .base import BaseAPI
+
+
+class BaseClassifier(BaseAPI):
+ """构建分类器,并实现其训练、评估、预测和模型导出。
+
+ Args:
+ model_name (str): 分类器的模型名字,取值范围为['ResNet18',
+ 'ResNet34', 'ResNet50', 'ResNet101',
+ 'ResNet50_vd', 'ResNet101_vd', 'DarkNet53',
+ 'MobileNetV1', 'MobileNetV2', 'Xception41',
+ 'Xception65', 'Xception71']。默认为'ResNet50'。
+ num_classes (int): 类别数。默认为1000。
+ """
+
+ def __init__(self, model_name='ResNet50', num_classes=1000):
+ self.init_params = locals()
+ super(BaseClassifier, self).__init__('classifier')
+ if not hasattr(paddlex.cv.nets, str.lower(model_name)):
+ raise Exception(
+ "ERROR: There's no model named {}.".format(model_name))
+ self.model_name = model_name
+ self.labels = None
+ self.num_classes = num_classes
+
+ def build_net(self, mode='train'):
+ image = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if mode != 'test':
+ label = fluid.data(dtype='int64', shape=[None, 1], name='label')
+ model = getattr(paddlex.cv.nets, str.lower(self.model_name))
+ net_out = model(image, num_classes=self.num_classes)
+ softmax_out = fluid.layers.softmax(net_out, use_cudnn=False)
+ inputs = OrderedDict([('image', image)])
+ outputs = OrderedDict([('predict', softmax_out)])
+ if mode != 'test':
+ cost = fluid.layers.cross_entropy(input=softmax_out, label=label)
+ avg_cost = fluid.layers.mean(cost)
+ acc1 = fluid.layers.accuracy(input=softmax_out, label=label, k=1)
+ k = min(5, self.num_classes)
+ acck = fluid.layers.accuracy(input=softmax_out, label=label, k=k)
+ if mode == 'train':
+ self.optimizer.minimize(avg_cost)
+ inputs = OrderedDict([('image', image), ('label', label)])
+ outputs = OrderedDict([('loss', avg_cost), ('acc1', acc1),
+ ('acc{}'.format(k), acck)])
+ if mode == 'eval':
+ del outputs['loss']
+ return inputs, outputs
+
+ def default_optimizer(self, learning_rate, lr_decay_epochs, lr_decay_gamma,
+ num_steps_each_epoch):
+ boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
+ values = [
+ learning_rate * (lr_decay_gamma**i)
+ for i in range(len(lr_decay_epochs) + 1)
+ ]
+ lr_decay = fluid.layers.piecewise_decay(
+ boundaries=boundaries, values=values)
+ optimizer = fluid.optimizer.Momentum(
+ lr_decay,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2Decay(1e-04))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=64,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=0.025,
+ lr_decay_epochs=[30, 60, 90],
+ lr_decay_gamma=0.1,
+ use_vdl=False,
+ sensitivities_file=None,
+ eval_metric_loss=0.05):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认值为64。
+ eval_dataset (paddlex.datasets: 验证数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代步数)。默认为2。
+ save_dir (str): 模型保存路径。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为'IMAGENET'。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
+ fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
+ learning_rate (float): 默认优化器的初始学习率。默认为0.025。
+ lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[30, 60, 90]。
+ lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
+ sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
+ 则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Raises:
+ ValueError: 模型从inference model进行加载。
+ """
+ if not self.trainable:
+ raise ValueError(
+ "Model is not trainable since it was loaded from a inference model."
+ )
+ self.labels = train_dataset.labels
+ if optimizer is None:
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate=learning_rate,
+ lr_decay_epochs=lr_decay_epochs,
+ lr_decay_gamma=lr_decay_gamma,
+ num_steps_each_epoch=num_steps_each_epoch)
+ self.optimizer = optimizer
+ # build the training, evaluation and inference networks
+ self.build_program()
+ # initialize the network weights
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+
+ # start training
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ return_details (bool): 是否返回详细信息。
+
+ Returns:
+ dict: 当return_details为False时,返回dict, 包含关键字:'acc1'、'acc5',
+ 分别表示最大值的accuracy、前5个最大值的accuracy。
+ tuple (metrics, eval_details): 当return_details为True时,增加返回dict,
+ 包含关键字:'true_labels'、'pred_scores',分别代表真实类别id、每个类别的预测得分。
+ """
+ self.arrange_transforms(
+ transforms=eval_dataset.transforms, mode='eval')
+ data_generator = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+ k = min(5, self.num_classes)
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ true_labels = list()
+ pred_scores = list()
+ if not hasattr(self, 'parallel_test_prog'):
+ self.parallel_test_prog = fluid.CompiledProgram(
+ self.test_prog).with_data_parallel(
+ share_vars_from=self.parallel_train_prog)
+ batch_size_each_gpu = self._get_single_card_bs(batch_size)
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(
+ enumerate(data_generator()), total=total_steps):
+ images = np.array([d[0] for d in data]).astype('float32')
+ labels = [d[1] for d in data]
+ num_samples = images.shape[0]
+ if num_samples < batch_size:
+ num_pad_samples = batch_size - num_samples
+ pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
+ images = np.concatenate([images, pad_images])
+ outputs = self.exe.run(
+ self.parallel_test_prog,
+ feed={'image': images},
+ fetch_list=list(self.test_outputs.values()))
+ outputs = [outputs[0][:num_samples]]
+ true_labels.extend(labels)
+ pred_scores.extend(outputs[0].tolist())
+ logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
+ epoch_id, step + 1, total_steps))
+
+ pred_top1_label = np.argsort(pred_scores)[:, -1]
+ pred_topk_label = np.argsort(pred_scores)[:, -k:]
+ acc1 = sum(pred_top1_label == true_labels) / len(true_labels)
+ acck = sum(
+ [np.isin(x, y)
+ for x, y in zip(true_labels, pred_topk_label)]) / len(true_labels)
+ metrics = OrderedDict([('acc1', acc1), ('acc{}'.format(k), acck)])
+ if return_details:
+ eval_details = {
+ 'true_labels': true_labels,
+ 'pred_scores': pred_scores
+ }
+ return metrics, eval_details
+ return metrics
+
+ def predict(self, img_file, transforms=None, topk=1):
+ """预测。
+
+ Args:
+ img_file (str): 预测图像路径。
+ transforms (paddlex.cls.transforms): 数据预处理操作。
+ topk (int): 预测时前k个最大值。
+
+ Returns:
+ list: 其中元素均为字典。字典的关键字为'category_id'、'category'、'score',
+ 分别对应预测类别id、预测类别标签、预测得分。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ true_topk = min(self.num_classes, topk)
+ if transforms is not None:
+ self.arrange_transforms(transforms=transforms, mode='test')
+ im = transforms(img_file)
+ else:
+ self.arrange_transforms(
+ transforms=self.test_transforms, mode='test')
+ im = self.test_transforms(img_file)
+ result = self.exe.run(
+ self.test_prog,
+ feed={'image': im},
+ fetch_list=list(self.test_outputs.values()))
+ pred_label = np.argsort(result[0][0])[::-1][:true_topk]
+ res = [{
+ 'category_id': l,
+ 'category': self.labels[l],
+ 'score': result[0][0][l]
+ } for l in pred_label]
+ return res
+
+
+class ResNet18(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet18, self).__init__(
+ model_name='ResNet18', num_classes=num_classes)
+
+
+class ResNet34(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet34, self).__init__(
+ model_name='ResNet34', num_classes=num_classes)
+
+
+class ResNet50(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet50, self).__init__(
+ model_name='ResNet50', num_classes=num_classes)
+
+
+class ResNet101(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet101, self).__init__(
+ model_name='ResNet101', num_classes=num_classes)
+
+
+class ResNet50_vd(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet50_vd, self).__init__(
+ model_name='ResNet50_vd', num_classes=num_classes)
+
+
+class ResNet101_vd(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ResNet101_vd, self).__init__(
+ model_name='ResNet101_vd', num_classes=num_classes)
+
+
+class DarkNet53(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(DarkNet53, self).__init__(
+ model_name='DarkNet53', num_classes=num_classes)
+
+
+class MobileNetV1(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(MobileNetV1, self).__init__(
+ model_name='MobileNetV1', num_classes=num_classes)
+
+
+class MobileNetV2(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(MobileNetV2, self).__init__(
+ model_name='MobileNetV2', num_classes=num_classes)
+
+
+class MobileNetV3_small(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(MobileNetV3_small, self).__init__(
+ model_name='MobileNetV3_small', num_classes=num_classes)
+
+
+class MobileNetV3_large(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(MobileNetV3_large, self).__init__(
+ model_name='MobileNetV3_large', num_classes=num_classes)
+
+
+class Xception65(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(Xception65, self).__init__(
+ model_name='Xception65', num_classes=num_classes)
+
+
+class Xception41(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(Xception41, self).__init__(
+ model_name='Xception41', num_classes=num_classes)
+
+
+class DenseNet121(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(DenseNet121, self).__init__(
+ model_name='DenseNet121', num_classes=num_classes)
+
+
+class DenseNet161(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(DenseNet161, self).__init__(
+ model_name='DenseNet161', num_classes=num_classes)
+
+
+class DenseNet201(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(DenseNet201, self).__init__(
+ model_name='DenseNet201', num_classes=num_classes)
+
+
+class ShuffleNetV2(BaseClassifier):
+ def __init__(self, num_classes=1000):
+ super(ShuffleNetV2, self).__init__(
+ model_name='ShuffleNetV2', num_classes=num_classes)
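
Putting the classifier API together, a fine-tuning run might be sketched as below. The dataset paths are hypothetical, and the `paddlex.cls.transforms` operators and `paddlex.datasets.ImageNet` reader are assumed from the rest of the package rather than defined in this file, so treat this as an outline rather than a verified recipe.

```python
# Sketch of a classification fine-tuning run under the assumptions stated above.
import paddlex as pdx
from paddlex.cls import transforms

train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()])

train_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/veg_cls',                    # hypothetical dataset root
    file_list='dataset/veg_cls/train_list.txt',
    label_list='dataset/veg_cls/labels.txt',
    transforms=train_transforms)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/veg_cls',
    file_list='dataset/veg_cls/val_list.txt',
    label_list='dataset/veg_cls/labels.txt',
    transforms=eval_transforms)

model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    lr_decay_epochs=[4, 6, 8],
    save_dir='output/mobilenetv2',
    use_vdl=True)

result = model.predict('dataset/veg_cls/some_image.jpg', topk=3)
print(result)  # [{'category_id': ..., 'category': ..., 'score': ...}, ...]
```
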
diff --git a/paddlex/cv/models/deeplabv3p.py b/paddlex/cv/models/deeplabv3p.py
new file mode 100644
index 0000000000000000000000000000000000000000..8f46baf31542ef58adc8d1c07eae52755dd09e04
--- /dev/null
+++ b/paddlex/cv/models/deeplabv3p.py
@@ -0,0 +1,403 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import os.path as osp
+import numpy as np
+import tqdm
+import math
+import cv2
+import paddle.fluid as fluid
+import paddlex.utils.logging as logging
+import paddlex
+from collections import OrderedDict
+from .base import BaseAPI
+from .utils.seg_eval import ConfusionMatrix
+from .utils.visualize import visualize_segmentation
+
+
+class DeepLabv3p(BaseAPI):
+ """实现DeepLabv3+网络的构建并进行训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 类别数。
+ backbone (str): DeepLabv3+的backbone网络,实现特征图的计算,取值范围为['Xception65', 'Xception41',
+ 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
+ 'MobileNetV2_x2.0']。默认'MobileNetV2_x1.0'。
+ output_stride (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。默认16。
+ aspp_with_sep_conv (bool): 在asspp模块是否采用separable convolutions。默认True。
+ decoder_use_sep_conv (bool): decoder模块是否采用separable convolutions。默认True。
+ encoder_with_aspp (bool): 是否在encoder阶段采用aspp模块。默认True。
+ enable_decoder (bool): 是否使用decoder模块。默认True。
+ use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
+ use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用,
+ 当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
+ class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
+ num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
+ 自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None时,各类的权重1,
+ 即平时使用的交叉熵损失函数。
+ ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。默认255。
+
+ Raises:
+ ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
+ ValueError: backbone取值不在['Xception65', 'Xception41', 'MobileNetV2_x0.25',
+ 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0']之内。
+ ValueError: class_weight为list, 但长度不等于num_class。
+ class_weight为str, 但class_weight.low()不等于dynamic。
+ TypeError: class_weight不为None时,其类型不是list或str。
+ """
+
+ def __init__(self,
+ num_classes=2,
+ backbone='MobileNetV2_x1.0',
+ output_stride=16,
+ aspp_with_sep_conv=True,
+ decoder_use_sep_conv=True,
+ encoder_with_aspp=True,
+ enable_decoder=True,
+ use_bce_loss=False,
+ use_dice_loss=False,
+ class_weight=None,
+ ignore_index=255):
+ self.init_params = locals()
+ super(DeepLabv3p, self).__init__('segmenter')
+ # dice loss and bce loss are only applicable to binary segmentation
+ if num_classes > 2 and (use_bce_loss or use_dice_loss):
+ raise ValueError(
+ "dice loss and bce loss are only applicable to binary segmentation"
+ )
+
+ self.output_stride = output_stride
+
+ if backbone not in [
+ 'Xception65', 'Xception41', 'MobileNetV2_x0.25',
+ 'MobileNetV2_x0.5', 'MobileNetV2_x1.0', 'MobileNetV2_x1.5',
+ 'MobileNetV2_x2.0'
+ ]:
+ raise ValueError(
+ "backbone: {} is set wrong. it should be one of "
+ "('Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5',"
+ " 'MobileNetV2_x1.0', 'MobileNetV2_x1.5', 'MobileNetV2_x2.0')".
+ format(backbone))
+
+ if class_weight is not None:
+ if isinstance(class_weight, list):
+ if len(class_weight) != num_classes:
+ raise ValueError(
+ "Length of class_weight should be equal to number of classes"
+ )
+ elif isinstance(class_weight, str):
+ if class_weight.lower() != 'dynamic':
+ raise ValueError(
+ "if class_weight is string, must be dynamic!")
+ else:
+ raise TypeError(
+ 'Expected class_weight to be a list or string but received {}'.
+ format(type(class_weight)))
+
+ self.backbone = backbone
+ self.num_classes = num_classes
+ self.use_bce_loss = use_bce_loss
+ self.use_dice_loss = use_dice_loss
+ self.class_weight = class_weight
+ self.ignore_index = ignore_index
+ self.aspp_with_sep_conv = aspp_with_sep_conv
+ self.decoder_use_sep_conv = decoder_use_sep_conv
+ self.encoder_with_aspp = encoder_with_aspp
+ self.enable_decoder = enable_decoder
+ self.labels = None
+ self.sync_bn = True
+
+ def _get_backbone(self, backbone):
+ def mobilenetv2(backbone):
+ # backbone: MobileNetV2 structure configuration
+ # output_stride: downsampling factor
+ # end_points: number of blocks in MobileNetV2
+ # decode_points: index of the MobileNetV2 block whose output is fed to the decoder
+ if '0.25' in backbone:
+ scale = 0.25
+ elif '0.5' in backbone:
+ scale = 0.5
+ elif '1.0' in backbone:
+ scale = 1.0
+ elif '1.5' in backbone:
+ scale = 1.5
+ elif '2.0' in backbone:
+ scale = 2.0
+ end_points = 18
+ decode_points = 4
+ return paddlex.cv.nets.MobileNetV2(
+ scale=scale,
+ output_stride=self.output_stride,
+ end_points=end_points,
+ decode_points=decode_points)
+
+ def xception(backbone):
+ # decode_points: index of the Xception block whose output is fed to the decoder
+ # end_points: number of blocks in Xception
+ if '65' in backbone:
+ decode_points = 2
+ end_points = 21
+ layers = 65
+ if '41' in backbone:
+ decode_points = 2
+ end_points = 13
+ layers = 41
+ if '71' in backbone:
+ decode_points = 3
+ end_points = 23
+ layers = 71
+ return paddlex.cv.nets.Xception(
+ layers=layers,
+ output_stride=self.output_stride,
+ end_points=end_points,
+ decode_points=decode_points)
+
+ if 'Xception' in backbone:
+ return xception(backbone)
+ elif 'MobileNetV2' in backbone:
+ return mobilenetv2(backbone)
+
+ def build_net(self, mode='train'):
+ model = paddlex.cv.nets.segmentation.DeepLabv3p(
+ self.num_classes,
+ mode=mode,
+ backbone=self._get_backbone(self.backbone),
+ output_stride=self.output_stride,
+ aspp_with_sep_conv=self.aspp_with_sep_conv,
+ decoder_use_sep_conv=self.decoder_use_sep_conv,
+ encoder_with_aspp=self.encoder_with_aspp,
+ enable_decoder=self.enable_decoder,
+ use_bce_loss=self.use_bce_loss,
+ use_dice_loss=self.use_dice_loss,
+ class_weight=self.class_weight,
+ ignore_index=self.ignore_index)
+ inputs = model.generate_inputs()
+ model_out = model.build_net(inputs)
+ outputs = OrderedDict()
+ if mode == 'train':
+ self.optimizer.minimize(model_out)
+ outputs['loss'] = model_out
+ elif mode == 'eval':
+ outputs['loss'] = model_out[0]
+ outputs['pred'] = model_out[1]
+ outputs['label'] = model_out[2]
+ outputs['mask'] = model_out[3]
+ else:
+ outputs['pred'] = model_out[0]
+ outputs['logit'] = model_out[1]
+ return inputs, outputs
+
+ def default_optimizer(self,
+ learning_rate,
+ num_epochs,
+ num_steps_each_epoch,
+ lr_decay_power=0.9):
+ decay_step = num_epochs * num_steps_each_epoch
+ lr_decay = fluid.layers.polynomial_decay(
+ learning_rate,
+ decay_step,
+ end_learning_rate=0,
+ power=lr_decay_power)
+ optimizer = fluid.optimizer.Momentum(
+ lr_decay,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2Decay(
+ regularization_coeff=4e-05))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=2,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=0.01,
+ lr_decay_power=0.9,
+ use_vdl=False,
+ sensitivities_file=None,
+ eval_metric_loss=0.05):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认为2。
+ eval_dataset (paddlex.datasets): 评估数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
+ save_dir (str): 模型保存路径。默认'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认'IMAGENET。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认的优化器:使用
+ fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
+ learning_rate (float): 默认优化器的初始学习率。默认0.01。
+ lr_decay_power (float): 默认优化器学习率衰减指数。默认0.9。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认False。
+ sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
+ 则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Raises:
+ ValueError: 模型从inference model进行加载。
+ """
+ if not self.trainable:
+ raise ValueError(
+ "Model is not trainable since it was loaded from a inference model."
+ )
+
+ self.labels = train_dataset.labels
+
+ if optimizer is None:
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate=learning_rate,
+ num_epochs=num_epochs,
+ num_steps_each_epoch=num_steps_each_epoch,
+ lr_decay_power=lr_decay_power)
+
+ self.optimizer = optimizer
+ # Build the training, evaluation and test networks
+ self.build_program()
+ # Initialize the network weights
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+ # Start training
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 评估数据读取器。
+ batch_size (int): 评估时的batch大小。默认1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ return_details (bool): 是否返回详细信息。默认False。
+
+ Returns:
+ dict: 当return_details为False时,返回dict。包含关键字:'miou'、'category_iou'、'macc'、
+ 'category_acc'和'kappa',分别表示平均iou、各类别iou、平均准确率、各类别准确率和kappa系数。
+ tuple (metrics, eval_details):当return_details为True时,增加返回dict (eval_details),
+ 包含关键字:'confusion_matrix',表示评估的混淆矩阵。
+ """
+ self.arrange_transforms(
+ transforms=eval_dataset.transforms, mode='eval')
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ conf_mat = ConfusionMatrix(self.num_classes, streaming=True)
+ data_generator = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+ if not hasattr(self, 'parallel_test_prog'):
+ self.parallel_test_prog = fluid.CompiledProgram(
+ self.test_prog).with_data_parallel(
+ share_vars_from=self.parallel_train_prog)
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(
+ enumerate(data_generator()), total=total_steps):
+ images = np.array([d[0] for d in data])
+ labels = np.array([d[1] for d in data])
+ num_samples = images.shape[0]
+ if num_samples < batch_size:
+ num_pad_samples = batch_size - num_samples
+ pad_images = np.tile(images[0:1], (num_pad_samples, 1, 1, 1))
+ images = np.concatenate([images, pad_images])
+ feed_data = {'image': images}
+ outputs = self.exe.run(
+ self.parallel_test_prog,
+ feed=feed_data,
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=True)
+ pred = outputs[0]
+ if num_samples < batch_size:
+ pred = pred[0:num_samples]
+
+ mask = labels != self.ignore_index
+ conf_mat.calculate(pred=pred, label=labels, ignore=mask)
+ _, iou = conf_mat.mean_iou()
+
+ logging.debug("[EVAL] Epoch={}, Step={}/{}, iou={}".format(
+ epoch_id, step + 1, total_steps, iou))
+
+ category_iou, miou = conf_mat.mean_iou()
+ category_acc, macc = conf_mat.accuracy()
+
+ metrics = OrderedDict(
+ zip(['miou', 'category_iou', 'macc', 'category_acc', 'kappa'],
+ [miou, category_iou, macc, category_acc,
+ conf_mat.kappa()]))
+ if return_details:
+ eval_details = {
+ 'confusion_matrix': conf_mat.confusion_matrix.tolist()
+ }
+ return metrics, eval_details
+ return metrics
+
+ def predict(self, im_file, transforms=None):
+ """预测。
+ Args:
+ img_file(str): 预测图像路径。
+ transforms(paddlex.cv.transforms): 数据预处理操作。
+
+ Returns:
+ dict: 包含关键字'label_map'和'score_map', 'label_map'存储预测结果灰度图,
+ 像素值表示对应的类别,'score_map'存储各类别的概率,shape=(h, w, num_classes)
+ """
+
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ if transforms is not None:
+ self.arrange_transforms(transforms=transforms, mode='test')
+ im, im_info = transforms(im_file)
+ else:
+ self.arrange_transforms(
+ transforms=self.test_transforms, mode='test')
+ im, im_info = self.test_transforms(im_file)
+ im = np.expand_dims(im, axis=0)
+ result = self.exe.run(
+ self.test_prog,
+ feed={'image': im},
+ fetch_list=list(self.test_outputs.values()))
+ pred = result[0]
+ pred = np.squeeze(pred).astype('uint8')
+ keys = list(im_info.keys())
+ for k in keys[::-1]:
+ if k == 'shape_before_resize':
+ h, w = im_info[k][0], im_info[k][1]
+ pred = cv2.resize(pred, (w, h), cv2.INTER_NEAREST)
+ elif k == 'shape_before_padding':
+ h, w = im_info[k][0], im_info[k][1]
+ pred = pred[0:h, 0:w]
+
+ return {'label_map': pred, 'score_map': result[1]}
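+
+# A minimal, hypothetical usage sketch of the segmenter above. The dataset paths, transform
+# names and import alias are assumptions for illustration, not part of this module:
+#
+#   import paddlex as pdx
+#   from paddlex.seg import transforms
+#
+#   train_transforms = transforms.Compose([transforms.Resize(target_size=512), transforms.Normalize()])
+#   train_dataset = pdx.datasets.SegDataset(
+#       data_dir='data/mydataset', file_list='data/mydataset/train_list.txt',
+#       label_list='data/mydataset/labels.txt', transforms=train_transforms)
+#
+#   model = pdx.seg.DeepLabv3p(num_classes=2, backbone='MobileNetV2_x1.0')
+#   model.train(num_epochs=40, train_dataset=train_dataset, train_batch_size=4,
+#               learning_rate=0.01, save_dir='output/deeplab')
+#   result = model.predict('data/mydataset/images/0001.jpg')  # {'label_map': ..., 'score_map': ...}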
diff --git a/paddlex/cv/models/faster_rcnn.py b/paddlex/cv/models/faster_rcnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..3d585987de23f6fcc097769f29690fe7f6de9cee
--- /dev/null
+++ b/paddlex/cv/models/faster_rcnn.py
@@ -0,0 +1,381 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import math
+import tqdm
+import numpy as np
+import paddle.fluid as fluid
+import paddlex.utils.logging as logging
+import paddlex
+import os.path as osp
+import copy
+from .base import BaseAPI
+from collections import OrderedDict
+from .utils.detection_eval import eval_results, bbox2out
+
+
+class FasterRCNN(BaseAPI):
+ """构建FasterRCNN,并实现其训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 包含了背景类的类别数。默认为81。
+ backbone (str): FasterRCNN的backbone网络,取值范围为['ResNet18', 'ResNet50',
+ 'ResNet50vd', 'ResNet101', 'ResNet101vd']。默认为'ResNet50'。
+ with_fpn (bool): 是否使用FPN结构。默认为True。
+ aspect_ratios (list): 生成anchor高宽比的可选值。默认为[0.5, 1.0, 2.0]。
+ anchor_sizes (list): 生成anchor大小的可选值。默认为[32, 64, 128, 256, 512]。
+ """
+
+ def __init__(self,
+ num_classes=81,
+ backbone='ResNet50',
+ with_fpn=True,
+ aspect_ratios=[0.5, 1.0, 2.0],
+ anchor_sizes=[32, 64, 128, 256, 512]):
+ self.init_params = locals()
+ super(FasterRCNN, self).__init__('detector')
+ backbones = [
+ 'ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd'
+ ]
+ assert backbone in backbones, "backbone should be one of {}".format(
+ backbones)
+ self.backbone = backbone
+ self.num_classes = num_classes
+ self.with_fpn = with_fpn
+ self.aspect_ratios = aspect_ratios
+ self.anchor_sizes = anchor_sizes
+ self.labels = None
+
+ def _get_backbone(self, backbone_name):
+ norm_type = None
+ if backbone_name == 'ResNet18':
+ layers = 18
+ variant = 'b'
+ elif backbone_name == 'ResNet50':
+ layers = 50
+ variant = 'b'
+ elif backbone_name == 'ResNet50vd':
+ layers = 50
+ variant = 'd'
+ norm_type = 'affine_channel'
+ elif backbone_name == 'ResNet101':
+ layers = 101
+ variant = 'b'
+ norm_type = 'affine_channel'
+ elif backbone_name == 'ResNet101vd':
+ layers = 101
+ variant = 'd'
+ norm_type = 'affine_channel'
+ if self.with_fpn:
+ backbone = paddlex.cv.nets.resnet.ResNet(
+ norm_type='bn' if norm_type is None else norm_type,
+ layers=layers,
+ variant=variant,
+ freeze_norm=True,
+ norm_decay=0.,
+ feature_maps=[2, 3, 4, 5],
+ freeze_at=2)
+ else:
+ backbone = paddlex.cv.nets.resnet.ResNet(
+ norm_type='affine_channel' if norm_type is None else norm_type,
+ layers=layers,
+ variant=variant,
+ freeze_norm=True,
+ norm_decay=0.,
+ feature_maps=4,
+ freeze_at=2)
+ return backbone
+
+ def build_net(self, mode='train'):
+ train_pre_nms_top_n = 2000 if self.with_fpn else 12000
+ test_pre_nms_top_n = 1000 if self.with_fpn else 6000
+ model = paddlex.cv.nets.detection.FasterRCNN(
+ backbone=self._get_backbone(self.backbone),
+ mode=mode,
+ num_classes=self.num_classes,
+ with_fpn=self.with_fpn,
+ aspect_ratios=self.aspect_ratios,
+ anchor_sizes=self.anchor_sizes,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ test_pre_nms_top_n=test_pre_nms_top_n)
+ inputs = model.generate_inputs()
+ if mode == 'train':
+ model_out = model.build_net(inputs)
+ loss = model_out['loss']
+ self.optimizer.minimize(loss)
+ outputs = OrderedDict([('loss', model_out['loss']),
+ ('loss_cls', model_out['loss_cls']),
+ ('loss_bbox', model_out['loss_bbox']),
+ ('loss_rpn_cls', model_out['loss_rpn_cls']),
+ ('loss_rpn_bbox',
+ model_out['loss_rpn_bbox'])])
+ else:
+ outputs = model.build_net(inputs)
+ return inputs, outputs
+
+ def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
+ lr_decay_epochs, lr_decay_gamma,
+ num_steps_each_epoch):
+ if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
+ raise Exception("warmup_steps should less than {}".format(
+ lr_decay_epochs[0] * num_steps_each_epoch))
+ boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
+ values = [(lr_decay_gamma**i) * learning_rate
+ for i in range(len(lr_decay_epochs) + 1)]
+ lr_decay = fluid.layers.piecewise_decay(
+ boundaries=boundaries, values=values)
+ lr_warmup = fluid.layers.linear_lr_warmup(
+ learning_rate=lr_decay,
+ warmup_steps=warmup_steps,
+ start_lr=warmup_start_lr,
+ end_lr=learning_rate)
+ optimizer = fluid.optimizer.Momentum(
+ learning_rate=lr_warmup,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2Decay(1e-04))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=2,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=0.0025,
+ warmup_steps=500,
+ warmup_start_lr=1.0 / 1200,
+ lr_decay_epochs=[8, 11],
+ lr_decay_gamma=0.1,
+ metric=None,
+ use_vdl=False):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与
+ 显卡数量之商为验证数据batch大小。默认为2。
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为20。
+ save_dir (str): 模型保存路径。默认值为'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
+ fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
+ learning_rate (float): 默认优化器的初始学习率。默认为0.0025。
+ warmup_steps (int): 默认优化器进行warmup过程的步数。默认为500。
+ warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为1.0/1200。
+ lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[8, 11]。
+ lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
+
+ Raises:
+ ValueError: 评估类型不在指定列表中。
+ ValueError: 模型从inference model进行加载。
+ """
+ if metric is None:
+ if isinstance(train_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(train_dataset, paddlex.datasets.VOCDetection):
+ metric = 'VOC'
+ else:
+ raise ValueError(
+ "train_dataset should be datasets.VOCDetection or datasets.COCODetection."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+ self.metric = metric
+ if not self.trainable:
+ raise ValueError(
+ "Model is not trainable since it was loaded from a inference model."
+ )
+ self.labels = copy.deepcopy(train_dataset.labels)
+ self.labels.insert(0, 'background')
+ # Build the training network
+ if optimizer is None:
+ # Build the default optimization strategy
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs,
+ lr_decay_gamma, num_steps_each_epoch)
+ self.optimizer = optimizer
+ # Build the training, evaluation and test networks
+ self.build_program()
+ fuse_bn = True
+ if self.with_fpn and self.backbone in ['ResNet18', 'ResNet50']:
+ fuse_bn = False
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ fuse_bn=fuse_bn,
+ save_dir=save_dir)
+ # Start training
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ metric=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
+ 根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
+ 如为COCODetection,则metric为'COCO'。
+ return_details (bool): 是否返回详细信息。默认值为False。
+
+ Returns:
+ tuple (metrics, eval_details) /dict (metrics): 当return_details为True时,返回(metrics, eval_details),
+ 当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
+ 分别表示平均准确率平均值在各个阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
+ eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
+ 预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
+ """
+ self.arrange_transforms(
+ transforms=eval_dataset.transforms, mode='eval')
+ if metric is None:
+ if hasattr(self, 'metric') and self.metric is not None:
+ metric = self.metric
+ else:
+ if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
+ metric = 'VOC'
+ else:
+ raise Exception(
+ "eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+
+ dataset = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ results = list()
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(enumerate(dataset()), total=total_steps):
+ images = np.array([d[0] for d in data]).astype('float32')
+ im_infos = np.array([d[1] for d in data]).astype('float32')
+ im_shapes = np.array([d[3] for d in data]).astype('float32')
+ feed_data = {
+ 'image': images,
+ 'im_info': im_infos,
+ 'im_shape': im_shapes,
+ }
+ outputs = self.exe.run(
+ self.test_prog,
+ feed=[feed_data],
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ 'bbox': (np.array(outputs[0]),
+ outputs[0].recursive_sequence_lengths())
+ }
+ res_im_id = [d[2] for d in data]
+ res['im_info'] = (im_infos, [])
+ res['im_shape'] = (im_shapes, [])
+ res['im_id'] = (np.array(res_im_id), [])
+ if metric == 'VOC':
+ res_gt_box = []
+ res_gt_label = []
+ res_is_difficult = []
+ for d in data:
+ res_gt_box.extend(d[4])
+ res_gt_label.extend(d[5])
+ res_is_difficult.extend(d[6])
+ res_gt_box_lod = [d[4].shape[0] for d in data]
+ res_gt_label_lod = [d[5].shape[0] for d in data]
+ res_is_difficult_lod = [d[6].shape[0] for d in data]
+ res['gt_box'] = (np.array(res_gt_box), [res_gt_box_lod])
+ res['gt_label'] = (np.array(res_gt_label), [res_gt_label_lod])
+ res['is_difficult'] = (np.array(res_is_difficult),
+ [res_is_difficult_lod])
+ results.append(res)
+ logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
+ epoch_id, step + 1, total_steps))
+ box_ap_stats, eval_details = eval_results(
+ results, metric, eval_dataset.coco_gt, with_background=True)
+ metrics = OrderedDict(
+ zip(['bbox_mmap' if metric == 'COCO' else 'bbox_map'],
+ box_ap_stats))
+ if return_details:
+ return metrics, eval_details
+ return metrics
+
+ def predict(self, img_file, transforms=None):
+ """预测。
+
+ Args:
+ img_file (str): 预测图像路径。
+ transforms (paddlex.det.transforms): 数据预处理操作。
+
+ Returns:
+ list: 预测结果列表,每个预测结果由预测框类别标签、
+ 预测框类别名称、预测框坐标、预测框得分组成。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ if transforms is not None:
+ self.arrange_transforms(transforms=transforms, mode='test')
+ im, im_resize_info, im_shape = transforms(img_file)
+ else:
+ self.arrange_transforms(
+ transforms=self.test_transforms, mode='test')
+ im, im_resize_info, im_shape = self.test_transforms(img_file)
+ im = np.expand_dims(im, axis=0)
+ im_resize_info = np.expand_dims(im_resize_info, axis=0)
+ im_shape = np.expand_dims(im_shape, axis=0)
+ outputs = self.exe.run(
+ self.test_prog,
+ feed={
+ 'image': im,
+ 'im_info': im_resize_info,
+ 'im_shape': im_shape
+ },
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ k: (np.array(v), v.recursive_sequence_lengths())
+ for k, v in zip(list(self.test_outputs.keys()), outputs)
+ }
+ res['im_id'] = (np.array([[0]]).astype('int32'), [])
+ clsid2catid = dict({i: i for i in range(self.num_classes)})
+ xywh_results = bbox2out([res], clsid2catid)
+ results = list()
+ for xywh_res in xywh_results:
+ del xywh_res['image_id']
+ xywh_res['category'] = self.labels[xywh_res['category_id']]
+ results.append(xywh_res)
+ return results
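+
+# A minimal, hypothetical usage sketch of the detector above. The dataset layout, transform
+# names and import alias are assumptions for illustration, not part of this module:
+#
+#   import paddlex as pdx
+#   from paddlex.det import transforms
+#
+#   train_transforms = transforms.Compose([transforms.Normalize(),
+#                                          transforms.ResizeByShort(short_size=800, max_size=1333)])
+#   train_dataset = pdx.datasets.VOCDetection(
+#       data_dir='data/voc', file_list='data/voc/train_list.txt',
+#       label_list='data/voc/labels.txt', transforms=train_transforms)
+#
+#   model = pdx.det.FasterRCNN(num_classes=len(train_dataset.labels) + 1, backbone='ResNet50')
+#   model.train(num_epochs=12, train_dataset=train_dataset, train_batch_size=2,
+#               learning_rate=0.0025, lr_decay_epochs=[8, 11], save_dir='output/faster_rcnn')
+#   print(model.predict('data/voc/images/0001.jpg'))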
diff --git a/paddlex/cv/models/load_model.py b/paddlex/cv/models/load_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e06ea5d7075472b7c77586d4172a0bb6808ddd6
--- /dev/null
+++ b/paddlex/cv/models/load_model.py
@@ -0,0 +1,111 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import yaml
+import os.path as osp
+import six
+import copy
+from collections import OrderedDict
+import paddle.fluid as fluid
+from paddle.fluid.framework import Parameter
+import paddlex
+import paddlex.utils.logging as logging
+
+
+def load_model(model_dir):
+ if not osp.exists(osp.join(model_dir, "model.yml")):
+ raise Exception("There's not model.yml in {}".format(model_dir))
+ with open(osp.join(model_dir, "model.yml")) as f:
+ info = yaml.load(f.read(), Loader=yaml.Loader)
+ status = info['status']
+
+ if not hasattr(paddlex.cv.models, info['Model']):
+ raise Exception("There's no attribute {} in paddlex.cv.models".format(
+ info['Model']))
+
+ if info['_Attributes']['model_type'] == 'classifier':
+ model = paddlex.cv.models.BaseClassifier(**info['_init_params'])
+ else:
+ model = getattr(paddlex.cv.models,
+ info['Model'])(**info['_init_params'])
+ if status == "Normal" or \
+ status == "Prune":
+ startup_prog = fluid.Program()
+ model.test_prog = fluid.Program()
+ with fluid.program_guard(model.test_prog, startup_prog):
+ with fluid.unique_name.guard():
+ model.test_inputs, model.test_outputs = model.build_net(
+ mode='test')
+ model.test_prog = model.test_prog.clone(for_test=True)
+ model.exe.run(startup_prog)
+ if status == "Prune":
+ from .slim.prune import update_program
+ model.test_prog = update_program(model.test_prog, model_dir,
+ model.places[0])
+ import pickle
+ with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
+ load_dict = pickle.load(f)
+ fluid.io.set_program_state(model.test_prog, load_dict)
+
+ elif status == "Infer" or \
+ status == "Quant":
+ [prog, input_names, outputs] = fluid.io.load_inference_model(
+ model_dir, model.exe, params_filename='__params__')
+ model.test_prog = prog
+ test_outputs_info = info['_ModelInputsOutputs']['test_outputs']
+ model.test_inputs = OrderedDict()
+ model.test_outputs = OrderedDict()
+ for name in input_names:
+ model.test_inputs[name] = model.test_prog.global_block().var(name)
+ for i, out in enumerate(outputs):
+ var_desc = test_outputs_info[i]
+ model.test_outputs[var_desc[0]] = out
+ if 'Transforms' in info:
+ transforms_mode = info.get('TransformsMode', 'RGB')
+ if transforms_mode == 'RGB':
+ to_rgb = True
+ else:
+ to_rgb = False
+ model.test_transforms = build_transforms(model.model_type,
+ info['Transforms'], to_rgb)
+ model.eval_transforms = copy.deepcopy(model.test_transforms)
+
+ if '_Attributes' in info:
+ for k, v in info['_Attributes'].items():
+ if k in model.__dict__:
+ model.__dict__[k] = v
+
+ logging.info("Model[{}] loaded.".format(info['Model']))
+ return model
+
+
+def build_transforms(model_type, transforms_info, to_rgb=True):
+ if model_type == "classifier":
+ import paddlex.cv.transforms.cls_transforms as T
+ elif model_type == "detector":
+ import paddlex.cv.transforms.det_transforms as T
+ elif model_type == "segmenter":
+ import paddlex.cv.transforms.seg_transforms as T
+ transforms = list()
+ for op_info in transforms_info:
+ op_name = list(op_info.keys())[0]
+ op_attr = op_info[op_name]
+ if not hasattr(T, op_name):
+ raise Exception(
+ "There's no operator named '{}' in transforms of {}".format(
+ op_name, model_type))
+ transforms.append(getattr(T, op_name)(**op_attr))
+ eval_transforms = T.Compose(transforms)
+ eval_transforms.to_rgb = to_rgb
+ return eval_transforms
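+
+# A minimal, hypothetical usage sketch of load_model. The model directory is an assumption
+# for illustration; it should contain the model.yml and weights saved by a trained PaddleX model:
+#
+#   import paddlex as pdx
+#
+#   model = pdx.load_model('output/faster_rcnn/best_model')
+#   result = model.predict('test.jpg')  # uses the transforms restored from model.yml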
diff --git a/paddlex/cv/models/mask_rcnn.py b/paddlex/cv/models/mask_rcnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..b30dd1e00a2856c79ac179c01967d1cddf053122
--- /dev/null
+++ b/paddlex/cv/models/mask_rcnn.py
@@ -0,0 +1,358 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import math
+import tqdm
+import numpy as np
+import paddle.fluid as fluid
+import paddlex.utils.logging as logging
+import paddlex
+import copy
+import os.path as osp
+from collections import OrderedDict
+from .faster_rcnn import FasterRCNN
+from .utils.detection_eval import eval_results, bbox2out, mask2out
+
+
+class MaskRCNN(FasterRCNN):
+ """构建MaskRCNN,并实现其训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 包含了背景类的类别数。默认为81。
+ backbone (str): MaskRCNN的backbone网络,取值范围为['ResNet18', 'ResNet50',
+ 'ResNet50vd', 'ResNet101', 'ResNet101vd']。默认为'ResNet50'。
+ with_fpn (bool): 是否使用FPN结构。默认为True。
+ aspect_ratios (list): 生成anchor高宽比的可选值。默认为[0.5, 1.0, 2.0]。
+ anchor_sizes (list): 生成anchor大小的可选值。默认为[32, 64, 128, 256, 512]。
+ """
+
+ def __init__(self,
+ num_classes=81,
+ backbone='ResNet50',
+ with_fpn=True,
+ aspect_ratios=[0.5, 1.0, 2.0],
+ anchor_sizes=[32, 64, 128, 256, 512]):
+ self.init_params = locals()
+ backbones = [
+ 'ResNet18', 'ResNet50', 'ResNet50vd', 'ResNet101', 'ResNet101vd'
+ ]
+ assert backbone in backbones, "backbone should be one of {}".format(
+ backbones)
+ super(FasterRCNN, self).__init__('detector')
+ self.backbone = backbone
+ self.num_classes = num_classes
+ self.with_fpn = with_fpn
+ self.aspect_ratios = aspect_ratios
+ self.anchor_sizes = anchor_sizes
+ self.labels = None
+ if with_fpn:
+ self.mask_head_resolution = 28
+ else:
+ self.mask_head_resolution = 14
+
+ def build_net(self, mode='train'):
+ train_pre_nms_top_n = 2000 if self.with_fpn else 12000
+ test_pre_nms_top_n = 1000 if self.with_fpn else 6000
+ num_convs = 4 if self.with_fpn else 0
+ model = paddlex.cv.nets.detection.MaskRCNN(
+ backbone=self._get_backbone(self.backbone),
+ num_classes=self.num_classes,
+ mode=mode,
+ with_fpn=self.with_fpn,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ num_convs=num_convs,
+ mask_head_resolution=self.mask_head_resolution)
+ inputs = model.generate_inputs()
+ if mode == 'train':
+ model_out = model.build_net(inputs)
+ loss = model_out['loss']
+ self.optimizer.minimize(loss)
+ outputs = OrderedDict([('loss', model_out['loss']),
+ ('loss_cls', model_out['loss_cls']),
+ ('loss_bbox', model_out['loss_bbox']),
+ ('loss_mask', model_out['loss_mask']),
+ ('loss_rpn_cls', model_out['loss_rpn_cls']),
+ ('loss_rpn_bbox',
+ model_out['loss_rpn_bbox'])])
+ else:
+ outputs = model.build_net(inputs)
+ return inputs, outputs
+
+ def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
+ lr_decay_epochs, lr_decay_gamma,
+ num_steps_each_epoch):
+ if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
+ raise Exception("warmup_step should less than {}".format(
+ lr_decay_epochs[0] * num_steps_each_epoch))
+ boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
+ values = [(lr_decay_gamma**i) * learning_rate
+ for i in range(len(lr_decay_epochs) + 1)]
+ lr_decay = fluid.layers.piecewise_decay(
+ boundaries=boundaries, values=values)
+ lr_warmup = fluid.layers.linear_lr_warmup(
+ learning_rate=lr_decay,
+ warmup_steps=warmup_steps,
+ start_lr=warmup_start_lr,
+ end_lr=learning_rate)
+ optimizer = fluid.optimizer.Momentum(
+ learning_rate=lr_warmup,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2Decay(1e-04))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=1,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=1.0 / 800,
+ warmup_steps=500,
+ warmup_start_lr=1.0 / 2400,
+ lr_decay_epochs=[8, 11],
+ lr_decay_gamma=0.1,
+ metric=None,
+ use_vdl=False):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练或验证数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与
+ 显卡数量之商为验证数据batch大小。默认值为1。
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为20。
+ save_dir (str): 模型保存路径。默认值为'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
+ fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
+ learning_rate (float): 默认优化器的学习率。默认为1.0/800。
+ warmup_steps (int): 默认优化器进行warmup过程的步数。默认为500。
+ warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为1.0/2400。
+ lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[8, 11]。
+ lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
+
+ Raises:
+ ValueError: 评估类型不在指定列表中。
+ ValueError: 模型从inference model进行加载。
+ """
+ if metric is None:
+ if isinstance(train_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ else:
+ raise ValueError(
+ "train_dataset should be datasets.COCODetection.")
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+ self.metric = metric
+ if not self.trainable:
+ raise ValueError(
+ "Model is not trainable since it was loaded from an inference model."
+ )
+ self.labels = copy.deepcopy(train_dataset.labels)
+ self.labels.insert(0, 'background')
+ # Build the training network
+ if optimizer is None:
+ # Build the default optimization strategy
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate=learning_rate,
+ warmup_steps=warmup_steps,
+ warmup_start_lr=warmup_start_lr,
+ lr_decay_epochs=lr_decay_epochs,
+ lr_decay_gamma=lr_decay_gamma,
+ num_steps_each_epoch=num_steps_each_epoch)
+ self.optimizer = optimizer
+ # Build the training, evaluation and test networks
+ self.build_program()
+ fuse_bn = True
+ if self.with_fpn and self.backbone in ['ResNet18', 'ResNet50']:
+ fuse_bn = False
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ fuse_bn=fuse_bn,
+ save_dir=save_dir)
+ # Start training
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ metric=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
+ 根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
+ 如为COCODetection,则metric为'COCO'。
+ return_details (bool): 是否返回详细信息。默认值为False。
+
+ Returns:
+ tuple (metrics, eval_details) /dict (metrics): 当return_details为True时,返回(metrics, eval_details),
+ 当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'和'segm_mmap'
+ 或者’bbox_map‘和'segm_map',分别表示预测框和分割区域平均准确率平均值在
+ 各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。eval_details为dict,
+ 包含关键字:'bbox',对应元素预测框结果列表,每个预测结果由图像id、预测框类别id、
+ 预测框坐标、预测框得分;'mask',对应元素预测区域结果列表,每个预测结果由图像id、
+ 预测区域类别id、预测区域坐标、预测区域得分;’gt‘:真实标注框和标注区域相关信息。
+ """
+ self.arrange_transforms(
+ transforms=eval_dataset.transforms, mode='eval')
+ if metric is None:
+ if hasattr(self, 'metric') and self.metric is not None:
+ metric = self.metric
+ else:
+ if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ else:
+ raise Exception(
+ "eval_dataset should be datasets.COCODetection.")
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+ data_generator = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ results = list()
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(
+ enumerate(data_generator()), total=total_steps):
+ images = np.array([d[0] for d in data]).astype('float32')
+ im_infos = np.array([d[1] for d in data]).astype('float32')
+ im_shapes = np.array([d[3] for d in data]).astype('float32')
+ feed_data = {
+ 'image': images,
+ 'im_info': im_infos,
+ 'im_shape': im_shapes,
+ }
+ outputs = self.exe.run(
+ self.test_prog,
+ feed=[feed_data],
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ 'bbox': (np.array(outputs[0]),
+ outputs[0].recursive_sequence_lengths()),
+ 'mask': (np.array(outputs[1]),
+ outputs[1].recursive_sequence_lengths())
+ }
+ res_im_id = [d[2] for d in data]
+ res['im_info'] = (im_infos, [])
+ res['im_shape'] = (im_shapes, [])
+ res['im_id'] = (np.array(res_im_id), [])
+ results.append(res)
+ logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
+ epoch_id, step + 1, total_steps))
+
+ ap_stats, eval_details = eval_results(
+ results,
+ 'COCO',
+ eval_dataset.coco_gt,
+ with_background=True,
+ resolution=self.mask_head_resolution)
+ if metric == 'VOC':
+ if isinstance(ap_stats[0], np.ndarray) and isinstance(
+ ap_stats[1], np.ndarray):
+ metrics = OrderedDict(
+ zip(['bbox_map', 'segm_map'],
+ [ap_stats[0][1], ap_stats[1][1]]))
+ else:
+ metrics = OrderedDict(
+ zip(['bbox_map', 'segm_map'], [0.0, 0.0]))
+ elif metric == 'COCO':
+ if isinstance(ap_stats[0], np.ndarray) and isinstance(
+ ap_stats[1], np.ndarray):
+ metrics = OrderedDict(
+ zip(['bbox_mmap', 'segm_mmap'],
+ [ap_stats[0][0], ap_stats[1][0]]))
+ else:
+ metrics = OrderedDict(
+ zip(['bbox_mmap', 'segm_mmap'], [0.0, 0.0]))
+ if return_details:
+ return metrics, eval_details
+ return metrics
+
+ def predict(self, img_file, transforms=None):
+ """预测。
+
+ Args:
+ img_file (str): 预测图像路径。
+ transforms (paddlex.det.transforms): 数据预处理操作。
+
+ Returns:
+ dict: 预测结果列表,每个预测结果由预测框类别标签、预测框类别名称、预测框坐标、预测框内的二值图、
+ 预测框得分组成。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ if transforms is not None:
+ self.arrange_transforms(transforms=transforms, mode='test')
+ im, im_resize_info, im_shape = transforms(img_file)
+ else:
+ self.arrange_transforms(
+ transforms=self.test_transforms, mode='test')
+ im, im_resize_info, im_shape = self.test_transforms(img_file)
+ im = np.expand_dims(im, axis=0)
+ im_resize_info = np.expand_dims(im_resize_info, axis=0)
+ im_shape = np.expand_dims(im_shape, axis=0)
+ outputs = self.exe.run(
+ self.test_prog,
+ feed={
+ 'image': im,
+ 'im_info': im_resize_info,
+ 'im_shape': im_shape
+ },
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ k: (np.array(v), v.recursive_sequence_lengths())
+ for k, v in zip(list(self.test_outputs.keys()), outputs)
+ }
+ res['im_id'] = (np.array([[0]]).astype('int32'), [])
+ res['im_shape'] = (np.array(im_shape), [])
+ clsid2catid = dict({i: i for i in range(self.num_classes)})
+ xywh_results = bbox2out([res], clsid2catid)
+ segm_results = mask2out([res], clsid2catid, self.mask_head_resolution)
+ results = list()
+ import pycocotools.mask as mask_util
+ for index, xywh_res in enumerate(xywh_results):
+ del xywh_res['image_id']
+ xywh_res['mask'] = mask_util.decode(
+ segm_results[index]['segmentation'])
+ xywh_res['category'] = self.labels[xywh_res['category_id']]
+ results.append(xywh_res)
+ return results
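+
+# A minimal, hypothetical usage sketch of the instance segmenter above. The dataset paths,
+# transform names and import alias are assumptions for illustration, not part of this module:
+#
+#   import paddlex as pdx
+#   from paddlex.det import transforms
+#
+#   train_transforms = transforms.Compose([transforms.Normalize(),
+#                                          transforms.ResizeByShort(short_size=800, max_size=1333)])
+#   train_dataset = pdx.datasets.CocoDetection(
+#       data_dir='data/coco/train2017',
+#       ann_file='data/coco/annotations/instances_train2017.json',
+#       transforms=train_transforms)
+#
+#   model = pdx.det.MaskRCNN(num_classes=len(train_dataset.labels) + 1, backbone='ResNet50')
+#   model.train(num_epochs=12, train_dataset=train_dataset, train_batch_size=1,
+#               learning_rate=1.0 / 800, save_dir='output/mask_rcnn')
+#   for pred in model.predict('data/coco/val2017/0001.jpg'):
+#       print(pred['category'], pred['score'])  # each pred also carries 'bbox' and 'mask'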
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/README.md" b/paddlex/cv/models/slim/__init__.py
similarity index 100%
rename from "DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/1_Windows/README.md"
rename to paddlex/cv/models/slim/__init__.py
diff --git a/paddlex/cv/models/slim/post_quantization.py b/paddlex/cv/models/slim/post_quantization.py
new file mode 100644
index 0000000000000000000000000000000000000000..ead88401af76997c8b470011da41a75167b6a782
--- /dev/null
+++ b/paddlex/cv/models/slim/post_quantization.py
@@ -0,0 +1,223 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle.fluid.contrib.slim.quantization.quantization_pass import QuantizationTransformPass
+from paddle.fluid.contrib.slim.quantization.quantization_pass import AddQuantDequantPass
+from paddle.fluid.contrib.slim.quantization.quantization_pass import _op_real_in_out_name
+from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
+import paddlex.utils.logging as logging
+import paddle.fluid as fluid
+import os
+
+
+class PaddleXPostTrainingQuantization(PostTrainingQuantization):
+ def __init__(self,
+ executor,
+ dataset,
+ program,
+ inputs,
+ outputs,
+ batch_size=10,
+ batch_nums=None,
+ scope=None,
+ algo="KL",
+ quantizable_op_type=["conv2d", "depthwise_conv2d", "mul"],
+ is_full_quantize=False,
+ is_use_cache_file=False,
+ cache_dir="./temp_post_training"):
+ '''
+ The class utilizes the post training quantization method to quantize the
+ fp32 model. It uses calibration data to calculate the scale factors of
+ quantized variables, and inserts fake quant/dequant ops to obtain the
+ quantized model.
+
+ Args:
+ executor(fluid.Executor): The executor to load, run and save the
+ quantized model.
+ dataset(Python Iterator): The data Reader.
+ program(fluid.Program): The paddle program, save the parameters for model.
+ inputs(dict): The inputs of the program.
+ outputs(dict): The outputs of the program.
+ batch_size(int, optional): The batch size of DataLoader. Default is 10.
+ batch_nums(int, optional): If batch_nums is not None, the number of
+ calibrate data is batch_size*batch_nums. If batch_nums is None, use
+ all data provided by sample_generator as calibrate data.
+ scope(fluid.Scope, optional): The scope of the program, use it to load
+ and save variables. If scope=None, get scope by global_scope().
+ algo(str, optional): If algo='KL', use the KL-divergence method to
+ get a more precise scale factor. If algo='direct', use the
+ abs_max method to get the scale factor. Default is 'KL'.
+ quantizable_op_type(list[str], optional): List the type of ops
+ that will be quantized. Default is ["conv2d", "depthwise_conv2d",
+ "mul"].
+ is_full_quantize(bool, optional): If is_full_quantize is set to True,
+ apply quantization to all supported quantizable op types. If it is set
+ to False, only apply quantization to the op types listed in
+ the input quantizable_op_type.
+ is_use_cache_file(bool, optional): If is_use_cache_file is set to False,
+ all temp data is kept in memory. If it is set to True,
+ temp data is saved to disk. When the fp32 model is complex or
+ the amount of calibration data is large, is_use_cache_file should be set
+ to True. Default is False.
+ cache_dir(str, optional): When is_use_cache_file is True, set cache_dir as
+ the directory for saving temp data. Default is ./temp_post_training.
+ Returns:
+ None
+ '''
+ self._executor = executor
+ self._dataset = dataset
+ self._batch_size = batch_size
+ self._batch_nums = batch_nums
+ self._scope = fluid.global_scope() if scope is None else scope
+ self._algo = algo
+ self._is_use_cache_file = is_use_cache_file
+ self._cache_dir = cache_dir
+ if self._is_use_cache_file and not os.path.exists(self._cache_dir):
+ os.mkdir(self._cache_dir)
+
+ supported_quantizable_op_type = \
+ QuantizationTransformPass._supported_quantizable_op_type + \
+ AddQuantDequantPass._supported_quantizable_op_type
+ if is_full_quantize:
+ self._quantizable_op_type = supported_quantizable_op_type
+ else:
+ self._quantizable_op_type = quantizable_op_type
+ for op_type in self._quantizable_op_type:
+ assert op_type in supported_quantizable_op_type + \
+ AddQuantDequantPass._activation_type, \
+ op_type + " is not supported for quantization."
+
+ self._place = self._executor.place
+ self._program = program
+ self._feed_list = list(inputs.values())
+ self._fetch_list = list(outputs.values())
+ self._data_loader = None
+
+ self._op_real_in_out_name = _op_real_in_out_name
+ self._bit_length = 8
+ self._quantized_weight_var_name = set()
+ self._quantized_act_var_name = set()
+ self._sampling_data = {}
+ self._quantized_var_scale_factor = {}
+
+ def quantize(self):
+ '''
+ Quantize the fp32 model. Use calibration data to calculate the scale factors of
+ quantized variables, and insert fake quant/dequant ops to obtain the
+ quantized model.
+
+ Args:
+ None
+ Returns:
+ the program of quantized model.
+ '''
+ self._preprocess()
+
+ batch_id = 0
+ for data in self._data_loader():
+ self._executor.run(
+ program=self._program,
+ feed=data,
+ fetch_list=self._fetch_list,
+ return_numpy=False)
+ self._sample_data(batch_id)
+
+ if batch_id % 5 == 0:
+ logging.info("run batch: {}".format(batch_id))
+ batch_id += 1
+ if self._batch_nums and batch_id >= self._batch_nums:
+ break
+ logging.info("all run batch: ".format(batch_id))
+ logging.info("calculate scale factor ...")
+ self._calculate_scale_factor()
+ logging.info("update the program ...")
+ self._update_program()
+
+ self._save_output_scale()
+ return self._program
+
+ def save_quantized_model(self, save_model_path):
+ '''
+ Save the quantized model to the disk.
+
+ Args:
+ save_model_path(str): The path to save the quantized model
+ Returns:
+ None
+ '''
+ feed_vars_names = [var.name for var in self._feed_list]
+ fluid.io.save_inference_model(
+ dirname=save_model_path,
+ feeded_var_names=feed_vars_names,
+ target_vars=self._fetch_list,
+ executor=self._executor,
+ params_filename='__params__',
+ main_program=self._program)
+
+ def _preprocess(self):
+ '''
+ Load model and set data loader, collect the variable names for sampling,
+ and set activation variables to be persistable.
+ '''
+ feed_vars = [fluid.framework._get_var(var.name, self._program) \
+ for var in self._feed_list]
+
+ self._data_loader = fluid.io.DataLoader.from_generator(
+ feed_list=feed_vars, capacity=3 * self._batch_size, iterable=True)
+ self._data_loader.set_sample_list_generator(
+ self._dataset.generator(self._batch_size, drop_last=True),
+ places=self._place)
+
+ # collect the variable names for sampling
+ persistable_var_names = []
+ for var in self._program.list_vars():
+ if var.persistable:
+ persistable_var_names.append(var.name)
+
+ for op in self._program.global_block().ops:
+ op_type = op.type
+ if op_type in self._quantizable_op_type:
+ if op_type in ("conv2d", "depthwise_conv2d"):
+ self._quantized_act_var_name.add(op.input("Input")[0])
+ self._quantized_weight_var_name.add(op.input("Filter")[0])
+ self._quantized_act_var_name.add(op.output("Output")[0])
+ elif op_type == "mul":
+ if self._is_input_all_not_persistable(
+ op, persistable_var_names):
+ op._set_attr("skip_quant", True)
+ logging.warning(
+ "Skip quant a mul op for two input variables are not persistable"
+ )
+ else:
+ self._quantized_act_var_name.add(op.input("X")[0])
+ self._quantized_weight_var_name.add(op.input("Y")[0])
+ self._quantized_act_var_name.add(op.output("Out")[0])
+ else:
+ # process other quantizable op type, the input must all not persistable
+ if self._is_input_all_not_persistable(
+ op, persistable_var_names):
+ input_output_name_list = self._op_real_in_out_name[
+ op_type]
+ for input_name in input_output_name_list[0]:
+ for var_name in op.input(input_name):
+ self._quantized_act_var_name.add(var_name)
+ for output_name in input_output_name_list[1]:
+ for var_name in op.output(output_name):
+ self._quantized_act_var_name.add(var_name)
+
+ # set activation variables to be persistable, so can obtain
+ # the tensor data in sample_data
+ for var in self._program.list_vars():
+ if var.name in self._quantized_act_var_name:
+ var.persistable = True
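+
+# A minimal, hypothetical usage sketch of the post-training quantizer above. The loaded model,
+# dataset and save path are assumptions for illustration, not part of this module:
+#
+#   import paddlex as pdx
+#   from paddlex.cv.models.slim.post_quantization import PaddleXPostTrainingQuantization
+#
+#   model = pdx.load_model('output/mobilenetv2/best_model')
+#   quant_dataset = pdx.datasets.ImageNet(
+#       data_dir='data/mydataset', file_list='data/mydataset/quant_list.txt',
+#       label_list='data/mydataset/labels.txt', transforms=model.eval_transforms)
+#
+#   quantizer = PaddleXPostTrainingQuantization(
+#       executor=model.exe, dataset=quant_dataset, program=model.test_prog,
+#       inputs=model.test_inputs, outputs=model.test_outputs,
+#       batch_size=8, batch_nums=10, algo='KL')
+#   quantizer.quantize()
+#   quantizer.save_quantized_model('output/mobilenetv2_quant')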
diff --git a/paddlex/cv/models/slim/prune.py b/paddlex/cv/models/slim/prune.py
new file mode 100644
index 0000000000000000000000000000000000000000..1a57c1d7427c945fe090e29b9071b2e3c93088d2
--- /dev/null
+++ b/paddlex/cv/models/slim/prune.py
@@ -0,0 +1,311 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import yaml
+import time
+import pickle
+import os
+import os.path as osp
+from functools import reduce
+import paddle.fluid as fluid
+from multiprocessing import Process, Queue
+import paddleslim
+from paddleslim.prune import Pruner, load_sensitivities
+from paddleslim.core import GraphWrapper
+from .prune_config import get_prune_params
+import paddlex.utils.logging as logging
+from paddlex.utils import seconds_to_hms
+
+
+def sensitivity(program,
+ place,
+ param_names,
+ eval_func,
+ sensitivities_file=None,
+ pruned_ratios=None):
+ scope = fluid.global_scope()
+ graph = GraphWrapper(program)
+ sensitivities = load_sensitivities(sensitivities_file)
+
+ if pruned_ratios is None:
+ pruned_ratios = np.arange(0.1, 1, step=0.1)
+
+ total_evaluate_iters = 1
+ for name in param_names:
+ if name not in sensitivities:
+ sensitivities[name] = {}
+ total_evaluate_iters += len(list(pruned_ratios))
+ else:
+ total_evaluate_iters += (
+ len(list(pruned_ratios)) - len(sensitivities[name]))
+ eta = '-'
+ start_time = time.time()
+ progress = 1.0 / total_evaluate_iters
+ progress = "%.2f%%" % (progress * 100)
+ logging.info(
+ "Total evaluate iters={}, current={}, progress={}, eta={}".format(
+ total_evaluate_iters, 1, progress, eta),
+ use_color=True)
+ baseline = eval_func(graph.program)
+ cost = time.time() - start_time
+ eta = cost * (total_evaluate_iters - 1)
+ current_iter = 1
+ for name in sensitivities:
+ for ratio in pruned_ratios:
+ if ratio in sensitivities[name]:
+ logging.debug('{}, {} has computed.'.format(name, ratio))
+ continue
+
+ progress = float(current_iter) / total_evaluate_iters
+ progress = "%.2f%%" % (progress * 100)
+ logging.info(
+ "Total evaluate iters={}, current={}, progress={}, eta={}".
+ format(
+ total_evaluate_iters, current_iter, progress,
+ seconds_to_hms(
+ int(cost * (total_evaluate_iters - current_iter)))),
+ use_color=True)
+ current_iter += 1
+
+ pruner = Pruner()
+ logging.info("sensitive - param: {}; ratios: {}".format(
+ name, ratio))
+ pruned_program, param_backup, _ = pruner.prune(
+ program=graph.program,
+ scope=scope,
+ params=[name],
+ ratios=[ratio],
+ place=place,
+ lazy=True,
+ only_graph=False,
+ param_backup=True)
+ pruned_metric = eval_func(pruned_program)
+ loss = (baseline - pruned_metric) / baseline
+ logging.info("pruned param: {}; {}; loss={}".format(
+ name, ratio, loss))
+
+ sensitivities[name][ratio] = loss
+
+ with open(sensitivities_file, 'wb') as f:
+ pickle.dump(sensitivities, f)
+
+ for param_name in param_backup.keys():
+ param_t = scope.find_var(param_name).get_tensor()
+ param_t.set(param_backup[param_name], place)
+ return sensitivities
+
+
+def channel_prune(program, prune_names, prune_ratios, place, only_graph=False):
+ """通道裁剪。
+
+ Args:
+ program (paddle.fluid.Program): 需要裁剪的Program,Program的具体介绍可参见
+ https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
+ prune_names (list): 由裁剪参数名组成的参数列表。
+ prune_ratios (list): 由裁剪率组成的参数列表,与prune_names中的参数列表意义对应。
+ place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
+ only_graph (bool): 是否只修改网络图,当为False时代表同时修改网络图和
+ scope(全局作用域)中的参数。默认为False。
+
+ Returns:
+ paddle.fluid.Program: 裁剪后的Program。
+ """
+ scope = fluid.global_scope()
+ pruner = Pruner()
+ program, _, _ = pruner.prune(
+ program,
+ scope,
+ params=prune_names,
+ ratios=prune_ratios,
+ place=place,
+ lazy=False,
+ only_graph=only_graph,
+ param_backup=False,
+ param_shape_backup=False)
+ return program
+
+
+def prune_program(model, prune_params_ratios=None):
+ """根据裁剪参数和裁剪率裁剪Program。
+
+ 1. 裁剪训练Program和测试Program。
+ 2. 使用裁剪后的Program更新模型中的train_prog和test_prog。
+ 【注意】Program的具体介绍可参见
+ https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
+
+ Args:
+ model (paddlex.cv.models): paddlex中的模型。
+ prune_params_ratios (dict): 由裁剪参数名和裁剪率组成的字典,当为None时
+ 使用默认裁剪参数名和裁剪率。默认为None。
+ """
+ place = model.places[0]
+ train_prog = model.train_prog
+ eval_prog = model.test_prog
+ valid_prune_names = get_prune_params(model)
+ assert set(list(prune_params_ratios.keys())) & set(valid_prune_names), \
+ "All params in 'prune_params_ratios' can't be pruned!"
+ prune_names = list(
+ set(list(prune_params_ratios.keys())) & set(valid_prune_names))
+ prune_ratios = [
+ prune_params_ratios[prune_name] for prune_name in prune_names
+ ]
+ model.train_prog = channel_prune(train_prog, prune_names, prune_ratios,
+ place)
+ model.test_prog = channel_prune(
+ eval_prog, prune_names, prune_ratios, place, only_graph=True)
+
+
+def update_program(program, model_dir, place):
+ """根据裁剪信息更新Program和参数。
+
+ Args:
+ program (paddle.fluid.Program): 需要更新的Program,Program的具体介绍可参见
+ https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
+ model_dir (str): 模型存储路径。
+ place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
+
+ Returns:
+ paddle.fluid.Program: 更新后的Program。
+ """
+ graph = GraphWrapper(program)
+ with open(osp.join(model_dir, "prune.yml")) as f:
+ shapes = yaml.load(f.read(), Loader=yaml.Loader)
+ for param, shape in shapes.items():
+ graph.var(param).set_shape(shape)
+ for block in program.blocks:
+ for param in block.all_parameters():
+ if param.name in shapes:
+ param_tensor = fluid.global_scope().find_var(
+ param.name).get_tensor()
+ param_tensor.set(
+ np.zeros(list(shapes[param.name])).astype('float32'),
+ place)
+ graph.update_groups_of_conv()
+ graph.infer_shape()
+ return program
+
+
+def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8):
+ """计算模型中可裁剪卷积Kernel的敏感度。
+
+ 1. 获取模型中可裁剪卷积Kernel的名称。
+ 2. 计算每个可裁剪卷积Kernel不同裁剪率下的敏感度。
+ 【注意】卷积的敏感度是指在不同裁剪率下评估数据集预测精度的损失,
+ 通过得到的敏感度,可以决定最终模型需要裁剪的参数列表和各裁剪参数对应的裁剪率。
+
+ Args:
+ model (paddlex.cv.models): paddlex中的模型。
+ save_file (str): 计算的得到的sensetives文件存储路径。
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为8。
+
+ Returns:
+ dict: 由参数名和不同裁剪率下敏感度组成的字典。存储的信息如下:
+ .. code-block:: python
+
+ {"weight_0":
+ {0.1: 0.22,
+ 0.2: 0.33
+ },
+ "weight_1":
+ {0.1: 0.21,
+ 0.2: 0.4
+ }
+ }
+
+ where ``weight_0`` is a convolution kernel name; ``sensitivities['weight_0']`` is a dict whose keys are pruning ratios and whose values are sensitivities.
+ """
+ prune_names = get_prune_params(model)
+
+ def eval_for_prune(program):
+ eval_metrics = model.evaluate(
+ eval_dataset=eval_dataset,
+ batch_size=batch_size,
+ return_details=False)
+ primary_key = list(eval_metrics.keys())[0]
+ return eval_metrics[primary_key]
+
+ sensitivitives = sensitivity(
+ model.test_prog,
+ model.places[0],
+ prune_names,
+ eval_for_prune,
+ sensitivities_file=save_file,
+ pruned_ratios=list(np.arange(0.1, 1, 0.1)))
+ return sensitivitives
+
+
+def get_params_ratios(sensitivities_file, eval_metric_loss=0.05):
+ """根据设定的精度损失容忍度metric_loss_thresh和计算保存的模型参数敏感度信息文件sensetive_file,
+ 获取裁剪的参数配置。
+
+ 【注意】metric_loss_thresh并不确保最终裁剪后的模型在fine-tune后的模型效果,仅为预估值。
+
+ Args:
+ sensitivities_file (str): 敏感度文件存储路径。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Returns:
+ dict: 由参数名和裁剪率组成的字典。存储的信息如下:
+ .. code-block:: python
+
+ {"weight_0": 0.1,
+ "weight_1": 0.2
+ }
+
+ where each key is a convolution kernel name and each value is its pruning ratio.
+ """
+ if not osp.exists(sensitivities_file):
+ raise Exception('The sensitivities file does not exist!')
+ sensitivitives = paddleslim.prune.load_sensitivities(sensitivities_file)
+ params_ratios = paddleslim.prune.get_ratios_by_loss(
+ sensitivitives, eval_metric_loss)
+ return params_ratios
+
+
+def cal_model_size(program, place, sensitivities_file, eval_metric_loss=0.05):
+ """在可容忍的精度损失下,计算裁剪后模型大小相对于当前模型大小的比例。
+
+ Args:
+ program (paddle.fluid.Program): 需要裁剪的Program,Program的具体介绍可参见
+ https://paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/program.html#program。
+ place (paddle.fluid.CUDAPlace/paddle.fluid.CPUPlace): 运行设备。
+ sensitivities_file (str): 敏感度文件存储路径。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Returns:
+ float: 裁剪后模型大小相对于当前模型大小的比例。
+ """
+ prune_params_ratios = get_params_ratios(sensitivities_file,
+ eval_metric_loss)
+ prune_program = channel_prune(
+ program,
+ list(prune_params_ratios.keys()),
+ list(prune_params_ratios.values()),
+ place,
+ only_graph=True)
+ origin_size = 0
+ new_size = 0
+ for var in program.list_vars():
+ name = var.name
+ shape = var.shape
+ for prune_block in prune_program.blocks:
+ if prune_block.has_var(name):
+ prune_var = prune_block.var(name)
+ prune_shape = prune_var.shape
+ break
+ origin_size += reduce(lambda x, y: x * y, shape)
+ new_size += reduce(lambda x, y: x * y, prune_shape)
+ return (new_size * 1.0) / origin_size
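+
+# A minimal, hypothetical sketch of the sensitivity-analysis workflow built from the helpers
+# above. The model directory, dataset and file paths are assumptions for illustration:
+#
+#   import paddlex as pdx
+#   from paddlex.cv.models.slim.prune import cal_params_sensitivities, cal_model_size
+#
+#   model = pdx.load_model('output/mobilenetv2/best_model')
+#   eval_dataset = pdx.datasets.ImageNet(
+#       data_dir='data/mydataset', file_list='data/mydataset/val_list.txt',
+#       label_list='data/mydataset/labels.txt', transforms=model.eval_transforms)
+#
+#   # 1. Compute per-kernel sensitivities and save them to disk.
+#   cal_params_sensitivities(model, 'output/mobilenetv2.sensitivities', eval_dataset, batch_size=8)
+#   # 2. Estimate how small the model becomes if a 5% accuracy loss is tolerated.
+#   ratio = cal_model_size(model.test_prog, model.places[0],
+#                          'output/mobilenetv2.sensitivities', eval_metric_loss=0.05)
+#   print('pruned model size ratio:', ratio)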
diff --git a/paddlex/cv/models/slim/prune_config.py b/paddlex/cv/models/slim/prune_config.py
new file mode 100644
index 0000000000000000000000000000000000000000..7eab6b7defce11f874e95b910f2287d6a17faec7
--- /dev/null
+++ b/paddlex/cv/models/slim/prune_config.py
@@ -0,0 +1,231 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import os.path as osp
+import paddle.fluid as fluid
+import paddlehub as hub
+import paddlex
+
+sensitivities_data = {
+ 'ResNet18':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities',
+ 'ResNet34':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet34.sensitivities',
+ 'ResNet50':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet50.sensitivities',
+ 'ResNet101':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet101.sensitivities',
+ 'ResNet50_vd':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet50vd.sensitivities',
+ 'ResNet101_vd':
+ 'https://bj.bcebos.com/paddlex/slim_prune/resnet101vd.sensitivities',
+ 'DarkNet53':
+ 'https://bj.bcebos.com/paddlex/slim_prune/darknet53.sensitivities',
+ 'MobileNetV1':
+ 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv1.sensitivities',
+ 'MobileNetV2':
+ 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv2.sensitivities',
+ 'MobileNetV3_large':
+ 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities',
+ 'MobileNetV3_small':
+ 'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities',
+ 'DenseNet121':
+ 'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities',
+ 'DenseNet161':
+ 'https://bj.bcebos.com/paddlex/slim_prune/densenet161.sensitivities',
+ 'DenseNet201':
+ 'https://bj.bcebos.com/paddlex/slim_prune/densenet201.sensitivities',
+ 'Xception41':
+ 'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities',
+ 'Xception65':
+ 'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities',
+ 'YOLOv3_MobileNetV1':
+ 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities',
+ 'YOLOv3_MobileNetV3_large':
+ 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv3.sensitivities',
+ 'YOLOv3_DarkNet53':
+ 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_darknet53.sensitivities',
+ 'YOLOv3_ResNet34':
+ 'https://bj.bcebos.com/paddlex/slim_prune/yolov3_resnet34.sensitivities',
+ 'UNet':
+ 'https://bj.bcebos.com/paddlex/slim_prune/unet.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x0.25':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.25_no_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x0.5':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.5_no_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x1.0':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.0_no_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x1.5':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.5_no_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x2.0':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x2.0_no_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x0.25_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.25_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x0.5_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x0.5_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x1.0_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.0_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x1.5_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x1.5_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_MobileNetV2_x2.0_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_mobilenetv2_x2.0_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_Xception65_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_xception65_with_aspp_decoder.sensitivities',
+ 'DeepLabv3p_Xception41_aspp_decoder':
+ 'https://bj.bcebos.com/paddlex/slim_prune/deeplab_xception41_with_aspp_decoder.sensitivities'
+}
+
+
+def get_sensitivities(flag, model, save_dir):
+ model_name = model.__class__.__name__
+ model_type = model_name
+ if hasattr(model, 'backbone'):
+ model_type = model_name + '_' + model.backbone
+ if model_type.startswith('DeepLabv3p_Xception'):
+ model_type = model_type + '_' + 'aspp' + '_' + 'decoder'
+ elif hasattr(model, 'encoder_with_aspp') or hasattr(
+ model, 'enable_decoder'):
+ model_type = model_type + '_' + 'aspp' + '_' + 'decoder'
+ if osp.isfile(flag):
+ return flag
+ elif flag == 'DEFAULT':
+        assert model_type in sensitivities_data, "There is no sensitivities file for {}; you may need to compute it yourself.".format(
+            model_type)
+ url = sensitivities_data[model_type]
+ fname = osp.split(url)[-1]
+ try:
+ hub.download(fname, save_path=save_dir)
+ except Exception as e:
+ if isinstance(e, hub.ResourceNotFoundError):
+ raise Exception(
+ "Resource for model {}(key='{}') not found".format(
+ model_type, fname))
+ elif isinstance(e, hub.ServerConnectionError):
+                raise Exception(
+                    "Cannot get resource for model {} (key='{}'), please check your internet connection"
+                    .format(model_type, fname))
+            else:
+                raise Exception(
+                    "Unexpected error, please make sure paddlehub >= 1.6.2: {}".
+                    format(str(e)))
+ return osp.join(save_dir, fname)
+ else:
+ raise Exception(
+ "sensitivities need to be defined as directory path or `DEFAULT`(download sensitivities automatically)."
+ )
+
+
+def get_prune_params(model):
+ prune_names = []
+ model_type = model.__class__.__name__
+ if model_type == 'BaseClassifier':
+ model_type = model.model_name
+ if hasattr(model, 'backbone'):
+ backbone = model.backbone
+ model_type += ('_' + backbone)
+ program = model.test_prog
+ if model_type.startswith('ResNet') or \
+ model_type.startswith('DenseNet') or \
+ model_type.startswith('DarkNet'):
+ for block in program.blocks:
+ for param in block.all_parameters():
+ pd_var = fluid.global_scope().find_var(param.name)
+ pd_param = pd_var.get_tensor()
+ if len(np.array(pd_param).shape) == 4:
+ prune_names.append(param.name)
+ elif model_type == "MobileNetV1":
+ prune_names.append("conv1_weights")
+ for param in program.global_block().all_parameters():
+ if "_sep_weights" in param.name:
+ prune_names.append(param.name)
+ elif model_type == "MobileNetV2":
+ for param in program.global_block().all_parameters():
+ if 'weight' not in param.name \
+ or 'dwise' in param.name \
+ or 'fc' in param.name :
+ continue
+ prune_names.append(param.name)
+ elif model_type.startswith("MobileNetV3"):
+ if model_type == 'MobileNetV3_small':
+ expand_prune_id = [3, 4]
+ else:
+ expand_prune_id = [2, 3, 4, 8, 9, 11]
+ for param in program.global_block().all_parameters():
+ if ('expand_weights' in param.name and \
+ int(param.name.split('_')[0][4:]) in expand_prune_id)\
+ or 'linear_weights' in param.name \
+ or 'se_1_weights' in param.name:
+ prune_names.append(param.name)
+ elif model_type.startswith('Xception') or \
+ model_type.startswith('DeepLabv3p_Xception'):
+ params_not_prune = [
+ 'weights',
+ 'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'.
+ format(model_type[-2:]), 'encoder/concat/weights',
+ 'decoder/concat/weights'
+ ]
+ for param in program.global_block().all_parameters():
+ if 'weight' not in param.name \
+ or 'dwise' in param.name \
+ or 'depthwise' in param.name \
+ or 'logit' in param.name:
+ continue
+ if param.name in params_not_prune:
+ continue
+ prune_names.append(param.name)
+ elif model_type.startswith('YOLOv3'):
+ for block in program.blocks:
+ for param in block.all_parameters():
+ if 'weights' in param.name and 'yolo_block' in param.name:
+ prune_names.append(param.name)
+ elif model_type.startswith('UNet'):
+ for param in program.global_block().all_parameters():
+ if 'weight' not in param.name:
+ continue
+ if 'logit' in param.name:
+ continue
+ prune_names.append(param.name)
+ params_not_prune = [
+ 'encode/block4/down/conv1/weights',
+ 'encode/block3/down/conv1/weights',
+ 'encode/block2/down/conv1/weights', 'encode/block1/conv1/weights'
+ ]
+ for i in params_not_prune:
+ if i in prune_names:
+ prune_names.remove(i)
+
+ elif model_type.startswith('DeepLabv3p'):
+ for param in program.global_block().all_parameters():
+ if 'weight' not in param.name:
+ continue
+ if 'dwise' in param.name or 'depthwise' in param.name or 'logit' in param.name:
+ continue
+ prune_names.append(param.name)
+ params_not_prune = [
+ 'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'.
+ format(model_type[-2:]), 'encoder/concat/weights',
+ 'decoder/concat/weights'
+ ]
+        if model.encoder_with_aspp:
+ params_not_prune.append(
+ 'xception_{}/exit_flow/block2/separable_conv3/pointwise/weights'
+ .format(model_type[-2:]))
+ params_not_prune.append('conv8_1_linear_weights')
+ for i in params_not_prune:
+ if i in prune_names:
+ prune_names.remove(i)
+ else:
+        raise Exception('The {} is not implemented yet!'.format(model_type))
+ return prune_names
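+
+
+# Minimal sketch (illustrative only, not called anywhere in this patch): how
+# the two helpers above could be combined to gather everything channel pruning
+# needs for a model. `model` and `save_dir` are assumed caller-provided values.
+def prepare_prune_inputs(model, save_dir, sensitivities='DEFAULT'):
+    sensitivities_file = get_sensitivities(sensitivities, model, save_dir)
+    prune_names = get_prune_params(model)
+    return sensitivities_file, prune_names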
diff --git a/paddlex/cv/models/slim/visualize.py b/paddlex/cv/models/slim/visualize.py
new file mode 100644
index 0000000000000000000000000000000000000000..5fcbad9865c3a356ef514098fa236112a1cb3169
--- /dev/null
+++ b/paddlex/cv/models/slim/visualize.py
@@ -0,0 +1,64 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os.path as osp
+import tqdm
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+from .prune import cal_model_size
+from paddleslim.prune import load_sensitivities
+
+
+def visualize(model, sensitivities_file, save_dir='./'):
+ """将模型裁剪率和每个参数裁剪后精度损失的关系可视化。
+ 可视化结果纵轴为eval_metric_loss参数值,横轴为对应的模型被裁剪的比例
+
+ Args:
+ model (paddlex.cv.models): paddlex中的模型。
+ sensitivities_file (str): 敏感度文件存储路径。
+ """
+ program = model.test_prog
+ place = model.places[0]
+ fig = plt.figure()
+ plt.xlabel("model prune ratio")
+ plt.ylabel("evaluation loss")
+ title_name = osp.split(sensitivities_file)[-1].split('.')[0]
+ plt.title(title_name)
+ plt.grid(linestyle='--', linewidth=0.5)
+ x = list()
+ y = list()
+ for loss_thresh in tqdm.tqdm(list(np.arange(0.05, 1, 0.05))):
+ prune_ratio = 1 - cal_model_size(
+ program, place, sensitivities_file, eval_metric_loss=loss_thresh)
+ x.append(prune_ratio)
+ y.append(loss_thresh)
+ plt.plot(x, y, color='green', linewidth=0.5, marker='o', markersize=3)
+ my_x_ticks = np.arange(
+ min(np.array(x)) - 0.01,
+ max(np.array(x)) + 0.01, 0.05)
+ my_y_ticks = np.arange(0.05, 1, 0.05)
+ plt.xticks(my_x_ticks, fontsize=3)
+ plt.yticks(my_y_ticks, fontsize=3)
+ for a, b in zip(x, y):
+ plt.text(
+ a,
+ b, (float('%0.4f' % a), float('%0.3f' % b)),
+ ha='center',
+ va='bottom',
+ fontsize=3)
+    plt.savefig(osp.join(save_dir, 'sensitivities.png'), dpi=800)
+ plt.close()
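+
+
+# Illustrative usage only. paddlex.load_model and the two paths below are
+# assumptions about the surrounding project, not something this file defines.
+if __name__ == '__main__':
+    import paddlex as pdx
+    model = pdx.load_model('output/unet/best_model')
+    visualize(model, 'unet.sensitivities', save_dir='./')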
diff --git a/paddlex/cv/models/unet.py b/paddlex/cv/models/unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..a327b1e14091f106f58e042bf9c1ada8aa97f722
--- /dev/null
+++ b/paddlex/cv/models/unet.py
@@ -0,0 +1,149 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import paddlex
+from collections import OrderedDict
+from .deeplabv3p import DeepLabv3p
+
+
+class UNet(DeepLabv3p):
+ """实现UNet网络的构建并进行训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 类别数。
+ upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线行差值进行上菜样,
+ 当输入其他选项时则利用反卷积进行上菜样,默认为'bilinear'。
+ use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。默认False。
+ use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
+ 当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。默认False。
+ class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
+ num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
+ 自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
+ 即平时使用的交叉熵损失函数。
+ ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。默认255。
+
+ Raises:
+ ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
+ ValueError: class_weight为list, 但长度不等于num_class。
+ class_weight为str, 但class_weight.low()不等于dynamic。
+ TypeError: class_weight不为None时,其类型不是list或str。
+ """
+
+ def __init__(self,
+ num_classes=2,
+ upsample_mode='bilinear',
+ use_bce_loss=False,
+ use_dice_loss=False,
+ class_weight=None,
+ ignore_index=255):
+ self.init_params = locals()
+ super(DeepLabv3p, self).__init__('segmenter')
+        # dice_loss and bce_loss are only applicable to binary segmentation
+        if num_classes > 2 and (use_bce_loss or use_dice_loss):
+            raise ValueError(
+                "dice loss and bce loss are only applicable to binary classification"
+            )
+
+ if class_weight is not None:
+ if isinstance(class_weight, list):
+ if len(class_weight) != num_classes:
+ raise ValueError(
+ "Length of class_weight should be equal to number of classes"
+ )
+ elif isinstance(class_weight, str):
+ if class_weight.lower() != 'dynamic':
+ raise ValueError(
+ "if class_weight is string, must be dynamic!")
+ else:
+ raise TypeError(
+                    'Expected class_weight to be a list or string but received {}'.
+ format(type(class_weight)))
+ self.num_classes = num_classes
+ self.upsample_mode = upsample_mode
+ self.use_bce_loss = use_bce_loss
+ self.use_dice_loss = use_dice_loss
+ self.class_weight = class_weight
+ self.ignore_index = ignore_index
+ self.labels = None
+
+ def build_net(self, mode='train'):
+ model = paddlex.cv.nets.segmentation.UNet(
+ self.num_classes,
+ mode=mode,
+ upsample_mode=self.upsample_mode,
+ use_bce_loss=self.use_bce_loss,
+ use_dice_loss=self.use_dice_loss,
+ class_weight=self.class_weight,
+ ignore_index=self.ignore_index)
+ inputs = model.generate_inputs()
+ model_out = model.build_net(inputs)
+ outputs = OrderedDict()
+ if mode == 'train':
+ self.optimizer.minimize(model_out)
+ outputs['loss'] = model_out
+ elif mode == 'eval':
+ outputs['loss'] = model_out[0]
+ outputs['pred'] = model_out[1]
+ outputs['label'] = model_out[2]
+ outputs['mask'] = model_out[3]
+ else:
+ outputs['pred'] = model_out[0]
+ outputs['logit'] = model_out[1]
+ return inputs, outputs
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=2,
+ eval_dataset=None,
+ save_interval_epochs=1,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='COCO',
+ optimizer=None,
+ learning_rate=0.01,
+ lr_decay_power=0.9,
+ use_vdl=False,
+ sensitivities_file=None,
+ eval_metric_loss=0.05):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。同时作为验证数据batch大小。默认2。
+ eval_dataset (paddlex.datasets): 评估数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为1。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为2。
+ save_dir (str): 模型保存路径。默认'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'COCO',
+ 则自动下载在COCO图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为'COCO'。
+ optimizer (paddle.fluid.optimizer): 优化器。当改参数为None时,使用默认的优化器:使用
+ fluid.optimizer.Momentum优化方法,polynomial的学习率衰减策略。
+ learning_rate (float): 默认优化器的初始学习率。默认0.01。
+ lr_decay_power (float): 默认优化器学习率多项式衰减系数。默认0.9。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认False。
+ sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
+ 则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Raises:
+ ValueError: 模型从inference model进行加载。
+ """
+ return super(UNet, self).train(
+ num_epochs, train_dataset, train_batch_size, eval_dataset,
+ save_interval_epochs, log_interval_steps, save_dir,
+ pretrain_weights, optimizer, learning_rate, lr_decay_power,
+ use_vdl, sensitivities_file, eval_metric_loss)
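+
+
+# Minimal training sketch (illustrative only). The dataset paths and the
+# paddlex.seg transforms/SegDataset APIs referenced here are assumptions about
+# the rest of the package, not defined in this file.
+if __name__ == '__main__':
+    import paddlex as pdx
+    from paddlex.seg import transforms
+    train_transforms = transforms.Compose(
+        [transforms.Resize(512), transforms.Normalize()])
+    train_dataset = pdx.datasets.SegDataset(
+        data_dir='data/my_seg_dataset',
+        file_list='data/my_seg_dataset/train_list.txt',
+        label_list='data/my_seg_dataset/labels.txt',
+        transforms=train_transforms)
+    model = UNet(num_classes=2)
+    model.train(
+        num_epochs=10, train_dataset=train_dataset, save_dir='output/unet')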
diff --git "a/DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/README.md" b/paddlex/cv/models/utils/__init__.py
similarity index 100%
rename from "DataAnnotation/\346\240\207\346\263\250\345\267\245\345\205\267\345\256\211\350\243\205\345\222\214\344\275\277\347\224\250/2_Ubuntu/README.md"
rename to paddlex/cv/models/utils/__init__.py
diff --git a/paddlex/cv/models/utils/detection_eval.py b/paddlex/cv/models/utils/detection_eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..b9dcdaa029265483c2b9fb919426686c36a411f5
--- /dev/null
+++ b/paddlex/cv/models/utils/detection_eval.py
@@ -0,0 +1,768 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+
+import numpy as np
+import json
+import os
+import sys
+import cv2
+import copy
+import paddlex.utils.logging as logging
+
+# fix linspace problem for pycocotools while numpy > 1.17.2
+backup_linspace = np.linspace
+
+
+def fixed_linspace(start,
+ stop,
+ num=50,
+ endpoint=True,
+ retstep=False,
+ dtype=None,
+ axis=0):
+ num = int(num)
+ return backup_linspace(start, stop, num, endpoint, retstep, dtype, axis)
+
+
+def eval_results(results,
+ metric,
+ coco_gt,
+ with_background=True,
+ resolution=None,
+ is_bbox_normalized=False,
+ map_type='11point'):
+ """Evaluation for evaluation program results"""
+ box_ap_stats = []
+ coco_gt_data = copy.deepcopy(coco_gt)
+ eval_details = {'gt': copy.deepcopy(coco_gt.dataset)}
+ if metric == 'COCO':
+ np.linspace = fixed_linspace
+ if 'proposal' in results[0]:
+ proposal_eval(results, coco_gt_data)
+ if 'bbox' in results[0]:
+ box_ap_stats, xywh_results = coco_bbox_eval(
+ results,
+ coco_gt_data,
+ with_background,
+ is_bbox_normalized=is_bbox_normalized)
+
+        if 'mask' in results[0]:
+            mask_ap_stats, segm_results = mask_eval(results, coco_gt_data,
+                                                    resolution)
+            ap_stats = [box_ap_stats, mask_ap_stats]
+            eval_details['bbox'] = xywh_results
+            eval_details['mask'] = segm_results
+            # restore the original np.linspace before returning
+            np.linspace = backup_linspace
+            return ap_stats, eval_details
+        np.linspace = backup_linspace
+ else:
+ if 'accum_map' in results[-1]:
+ res = np.mean(results[-1]['accum_map'][0])
+ logging.debug('mAP: {:.2f}'.format(res * 100.))
+ box_ap_stats.append(res * 100.)
+ elif 'bbox' in results[0]:
+ box_ap, xywh_results = voc_bbox_eval(
+ results,
+ coco_gt_data,
+ with_background,
+ is_bbox_normalized=is_bbox_normalized,
+ map_type=map_type)
+ box_ap_stats.append(box_ap)
+ eval_details['bbox'] = xywh_results
+ return box_ap_stats, eval_details
+
+
+def proposal_eval(results, coco_gt, outfile=None, max_dets=(100, 300, 1000)):
+    assert 'proposal' in results[0]
+
+    xywh_results = proposal2out(results)
+    assert len(
+        xywh_results) > 0, "The number of valid proposal detected is zero.\n \
+        Please use reasonable model and check input data."
+
+    # dump the proposals to a json file only when an output path is given
+    if outfile is not None:
+        assert outfile.endswith('.json')
+        with open(outfile, 'w') as f:
+            json.dump(xywh_results, f)
+
+ cocoapi_eval(xywh_results, 'proposal', coco_gt=coco_gt, max_dets=max_dets)
+ # flush coco evaluation result
+ sys.stdout.flush()
+
+
+def coco_bbox_eval(results,
+ coco_gt,
+ with_background=True,
+ is_bbox_normalized=False):
+ assert 'bbox' in results[0]
+ from pycocotools.coco import COCO
+
+ cat_ids = coco_gt.getCatIds()
+
+ # when with_background = True, mapping category to classid, like:
+ # background:0, first_class:1, second_class:2, ...
+ clsid2catid = dict(
+ {i + int(with_background): catid
+ for i, catid in enumerate(cat_ids)})
+
+ xywh_results = bbox2out(
+ results, clsid2catid, is_bbox_normalized=is_bbox_normalized)
+
+ results = copy.deepcopy(xywh_results)
+ if len(xywh_results) == 0:
+ logging.warning(
+ "The number of valid bbox detected is zero.\n Please use reasonable model and check input data.\n stop eval!"
+ )
+ return [0.0], results
+
+ map_stats = cocoapi_eval(xywh_results, 'bbox', coco_gt=coco_gt)
+ # flush coco evaluation result
+ sys.stdout.flush()
+ return map_stats, results
+
+
+def loadRes(coco_obj, anns):
+ """
+ Load result file and return a result api object.
+ :param resFile (str) : file name of result file
+ :return: res (obj) : result api object
+ """
+ from pycocotools.coco import COCO
+ import pycocotools.mask as maskUtils
+ import time
+ res = COCO()
+ res.dataset['images'] = [img for img in coco_obj.dataset['images']]
+
+ tic = time.time()
+    assert type(anns) == list, 'results is not an array of objects'
+ annsImgIds = [ann['image_id'] for ann in anns]
+ assert set(annsImgIds) == (set(annsImgIds) & set(coco_obj.getImgIds())), \
+ 'Results do not correspond to current coco set'
+ if 'caption' in anns[0]:
+ imgIds = set([img['id'] for img in res.dataset['images']]) & set(
+ [ann['image_id'] for ann in anns])
+ res.dataset['images'] = [
+ img for img in res.dataset['images'] if img['id'] in imgIds
+ ]
+ for id, ann in enumerate(anns):
+ ann['id'] = id + 1
+ elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
+ res.dataset['categories'] = copy.deepcopy(
+ coco_obj.dataset['categories'])
+ for id, ann in enumerate(anns):
+ bb = ann['bbox']
+ x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]]
+ if not 'segmentation' in ann:
+ ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
+ ann['area'] = bb[2] * bb[3]
+ ann['id'] = id + 1
+ ann['iscrowd'] = 0
+ elif 'segmentation' in anns[0]:
+ res.dataset['categories'] = copy.deepcopy(
+ coco_obj.dataset['categories'])
+ for id, ann in enumerate(anns):
+ # now only support compressed RLE format as segmentation results
+ ann['area'] = maskUtils.area(ann['segmentation'])
+ if not 'bbox' in ann:
+ ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
+ ann['id'] = id + 1
+ ann['iscrowd'] = 0
+ elif 'keypoints' in anns[0]:
+ res.dataset['categories'] = copy.deepcopy(
+ coco_obj.dataset['categories'])
+ for id, ann in enumerate(anns):
+ s = ann['keypoints']
+ x = s[0::3]
+ y = s[1::3]
+ x0, x1, y0, y1 = np.min(x), np.max(x), np.min(y), np.max(y)
+ ann['area'] = (x1 - x0) * (y1 - y0)
+ ann['id'] = id + 1
+ ann['bbox'] = [x0, y0, x1 - x0, y1 - y0]
+
+ res.dataset['annotations'] = anns
+ res.createIndex()
+ return res
+
+
+def mask_eval(results, coco_gt, resolution, thresh_binarize=0.5):
+ assert 'mask' in results[0]
+ from pycocotools.coco import COCO
+
+ clsid2catid = {i + 1: v for i, v in enumerate(coco_gt.getCatIds())}
+
+ segm_results = mask2out(results, clsid2catid, resolution, thresh_binarize)
+ results = copy.deepcopy(segm_results)
+ if len(segm_results) == 0:
+ logging.warning(
+ "The number of valid mask detected is zero.\n Please use reasonable model and check input data."
+ )
+ return None, results
+
+ map_stats = cocoapi_eval(segm_results, 'segm', coco_gt=coco_gt)
+ return map_stats, results
+
+
+def cocoapi_eval(anns,
+ style,
+ coco_gt=None,
+ anno_file=None,
+ max_dets=(100, 300, 1000)):
+ """
+ Args:
+ anns: Evaluation result.
+ style: COCOeval style, can be `bbox` , `segm` and `proposal`.
+ coco_gt: Whether to load COCOAPI through anno_file,
+ eg: coco_gt = COCO(anno_file)
+ anno_file: COCO annotations file.
+ max_dets: COCO evaluation maxDets.
+ """
+    assert coco_gt is not None or anno_file is not None
+ from pycocotools.coco import COCO
+ from pycocotools.cocoeval import COCOeval
+
+    if coco_gt is None:
+ coco_gt = COCO(anno_file)
+ logging.debug("Start evaluate...")
+ coco_dt = loadRes(coco_gt, anns)
+ if style == 'proposal':
+ coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
+ coco_eval.params.useCats = 0
+ coco_eval.params.maxDets = list(max_dets)
+ else:
+ coco_eval = COCOeval(coco_gt, coco_dt, style)
+ coco_eval.evaluate()
+ coco_eval.accumulate()
+ coco_eval.summarize()
+ return coco_eval.stats
+
+
+def proposal2out(results, is_bbox_normalized=False):
+ xywh_res = []
+ for t in results:
+ bboxes = t['proposal'][0]
+ lengths = t['proposal'][1][0]
+ im_ids = np.array(t['im_id'][0]).flatten()
+ assert len(lengths) == im_ids.size
+        if bboxes is None or bboxes.shape == (1, 1):
+ continue
+
+ k = 0
+ for i in range(len(lengths)):
+ num = lengths[i]
+ im_id = int(im_ids[i])
+ for j in range(num):
+ dt = bboxes[k]
+ xmin, ymin, xmax, ymax = dt.tolist()
+
+ if is_bbox_normalized:
+ xmin, ymin, xmax, ymax = \
+ clip_bbox([xmin, ymin, xmax, ymax])
+ w = xmax - xmin
+ h = ymax - ymin
+ else:
+ w = xmax - xmin + 1
+ h = ymax - ymin + 1
+
+ bbox = [xmin, ymin, w, h]
+ coco_res = {
+ 'image_id': im_id,
+ 'category_id': 1,
+ 'bbox': bbox,
+ 'score': 1.0
+ }
+ xywh_res.append(coco_res)
+ k += 1
+ return xywh_res
+
+
+def bbox2out(results, clsid2catid, is_bbox_normalized=False):
+ """
+ Args:
+ results: request a dict, should include: `bbox`, `im_id`,
+ if is_bbox_normalized=True, also need `im_shape`.
+ clsid2catid: class id to category id map of COCO2017 dataset.
+ is_bbox_normalized: whether or not bbox is normalized.
+ """
+ xywh_res = []
+ for t in results:
+ bboxes = t['bbox'][0]
+ lengths = t['bbox'][1][0]
+ im_ids = np.array(t['im_id'][0]).flatten()
+        if bboxes is None or bboxes.shape == (1, 1):
+ continue
+
+ k = 0
+ for i in range(len(lengths)):
+ num = lengths[i]
+ im_id = int(im_ids[i])
+ for j in range(num):
+ dt = bboxes[k]
+ clsid, score, xmin, ymin, xmax, ymax = dt.tolist()
+ catid = (clsid2catid[int(clsid)])
+
+ if is_bbox_normalized:
+ xmin, ymin, xmax, ymax = \
+ clip_bbox([xmin, ymin, xmax, ymax])
+ w = xmax - xmin
+ h = ymax - ymin
+ im_shape = t['im_shape'][0][i].tolist()
+ im_height, im_width = int(im_shape[0]), int(im_shape[1])
+ xmin *= im_width
+ ymin *= im_height
+ w *= im_width
+ h *= im_height
+ else:
+ w = xmax - xmin + 1
+ h = ymax - ymin + 1
+
+ bbox = [xmin, ymin, w, h]
+ coco_res = {
+ 'image_id': im_id,
+ 'category_id': catid,
+ 'bbox': bbox,
+ 'score': score
+ }
+ xywh_res.append(coco_res)
+ k += 1
+ return xywh_res
+
+
+def mask2out(results, clsid2catid, resolution, thresh_binarize=0.5):
+ import pycocotools.mask as mask_util
+ scale = (resolution + 2.0) / resolution
+
+ segm_res = []
+
+ # for each batch
+ for t in results:
+ bboxes = t['bbox'][0]
+
+ lengths = t['bbox'][1][0]
+ im_ids = np.array(t['im_id'][0])
+        if bboxes is None or bboxes.shape == (1, 1):
+ continue
+ if len(bboxes.tolist()) == 0:
+ continue
+
+ masks = t['mask'][0]
+
+ s = 0
+ # for each sample
+ for i in range(len(lengths)):
+ num = lengths[i]
+ im_id = int(im_ids[i][0])
+ im_shape = t['im_shape'][0][i]
+
+ bbox = bboxes[s:s + num][:, 2:]
+ clsid_scores = bboxes[s:s + num][:, 0:2]
+ mask = masks[s:s + num]
+ s += num
+
+ im_h = int(im_shape[0])
+ im_w = int(im_shape[1])
+
+ expand_bbox = expand_boxes(bbox, scale)
+ expand_bbox = expand_bbox.astype(np.int32)
+
+ padded_mask = np.zeros((resolution + 2, resolution + 2),
+ dtype=np.float32)
+
+ for j in range(num):
+ xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
+ clsid, score = clsid_scores[j].tolist()
+ clsid = int(clsid)
+ padded_mask[1:-1, 1:-1] = mask[j, clsid, :, :]
+
+ catid = clsid2catid[clsid]
+
+ w = xmax - xmin + 1
+ h = ymax - ymin + 1
+ w = np.maximum(w, 1)
+ h = np.maximum(h, 1)
+
+ resized_mask = cv2.resize(padded_mask, (w, h))
+ resized_mask = np.array(
+ resized_mask > thresh_binarize, dtype=np.uint8)
+ im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
+
+ x0 = min(max(xmin, 0), im_w)
+ x1 = min(max(xmax + 1, 0), im_w)
+ y0 = min(max(ymin, 0), im_h)
+ y1 = min(max(ymax + 1, 0), im_h)
+
+ im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
+ x0 - xmin):(x1 - xmin)]
+ segm = mask_util.encode(
+ np.array(im_mask[:, :, np.newaxis], order='F'))[0]
+ catid = clsid2catid[clsid]
+ segm['counts'] = segm['counts'].decode('utf8')
+ coco_res = {
+ 'image_id': im_id,
+ 'category_id': catid,
+ 'segmentation': segm,
+ 'score': score
+ }
+ segm_res.append(coco_res)
+ return segm_res
+
+
+def expand_boxes(boxes, scale):
+ """
+ Expand an array of boxes by a given scale.
+ """
+ w_half = (boxes[:, 2] - boxes[:, 0]) * .5
+ h_half = (boxes[:, 3] - boxes[:, 1]) * .5
+ x_c = (boxes[:, 2] + boxes[:, 0]) * .5
+ y_c = (boxes[:, 3] + boxes[:, 1]) * .5
+
+ w_half *= scale
+ h_half *= scale
+
+ boxes_exp = np.zeros(boxes.shape)
+ boxes_exp[:, 0] = x_c - w_half
+ boxes_exp[:, 2] = x_c + w_half
+ boxes_exp[:, 1] = y_c - h_half
+ boxes_exp[:, 3] = y_c + h_half
+
+ return boxes_exp
+
+
+def voc_bbox_eval(results,
+ coco_gt,
+ with_background=False,
+ overlap_thresh=0.5,
+ map_type='11point',
+ is_bbox_normalized=False,
+ evaluate_difficult=False):
+ """
+ Bounding box evaluation for VOC dataset
+
+ Args:
+        results (list): prediction bounding box results.
+        coco_gt: COCO ground truth annotations of the evaluation dataset.
+        with_background (bool): whether class id 0 is reserved for background.
+        overlap_thresh (float): the positive threshold of
+            bbox overlap
+        map_type (string): method for mAP calculation,
+            can only be '11point' or 'integral'
+        is_bbox_normalized (bool): whether bbox is normalized
+            to range [0, 1].
+        evaluate_difficult (bool): whether to evaluate
+            difficult gt bbox.
+ """
+ assert 'bbox' in results[0]
+
+ logging.debug("Start evaluate...")
+ from pycocotools.coco import COCO
+
+ cat_ids = coco_gt.getCatIds()
+
+ # when with_background = True, mapping category to classid, like:
+ # background:0, first_class:1, second_class:2, ...
+ clsid2catid = dict(
+ {i + int(with_background): catid
+ for i, catid in enumerate(cat_ids)})
+ class_num = len(clsid2catid) + int(with_background)
+ detection_map = DetectionMAP(
+ class_num=class_num,
+ overlap_thresh=overlap_thresh,
+ map_type=map_type,
+ is_bbox_normalized=is_bbox_normalized,
+ evaluate_difficult=evaluate_difficult)
+
+ xywh_res = []
+ det_nums = 0
+ gt_nums = 0
+ for t in results:
+ bboxes = t['bbox'][0]
+ bbox_lengths = t['bbox'][1][0]
+ im_ids = np.array(t['im_id'][0]).flatten()
+        if bboxes is None or bboxes.shape == (1, 1):
+ continue
+
+ gt_boxes = t['gt_box'][0]
+ gt_labels = t['gt_label'][0]
+ difficults = t['is_difficult'][0] if not evaluate_difficult \
+ else None
+
+ if len(t['gt_box'][1]) == 0:
+ # gt_box, gt_label, difficult read as zero padded Tensor
+ bbox_idx = 0
+ for i in range(len(gt_boxes)):
+ gt_box = gt_boxes[i]
+ gt_label = gt_labels[i]
+ difficult = None if difficults is None \
+ else difficults[i]
+ bbox_num = bbox_lengths[i]
+ bbox = bboxes[bbox_idx:bbox_idx + bbox_num]
+ gt_box, gt_label, difficult = prune_zero_padding(
+ gt_box, gt_label, difficult)
+ detection_map.update(bbox, gt_box, gt_label, difficult)
+ bbox_idx += bbox_num
+ det_nums += bbox_num
+ gt_nums += gt_box.shape[0]
+
+ im_id = int(im_ids[i])
+ for b in bbox:
+ clsid, score, xmin, ymin, xmax, ymax = b.tolist()
+ w = xmax - xmin + 1
+ h = ymax - ymin + 1
+ bbox = [xmin, ymin, w, h]
+ coco_res = {
+ 'image_id': im_id,
+ 'category_id': clsid2catid[clsid],
+ 'bbox': bbox,
+ 'score': score
+ }
+ xywh_res.append(coco_res)
+ else:
+ # gt_box, gt_label, difficult read as LoDTensor
+ gt_box_lengths = t['gt_box'][1][0]
+ bbox_idx = 0
+ gt_box_idx = 0
+ for i in range(len(bbox_lengths)):
+ bbox_num = bbox_lengths[i]
+ gt_box_num = gt_box_lengths[i]
+ bbox = bboxes[bbox_idx:bbox_idx + bbox_num]
+ gt_box = gt_boxes[gt_box_idx:gt_box_idx + gt_box_num]
+ gt_label = gt_labels[gt_box_idx:gt_box_idx + gt_box_num]
+ difficult = None if difficults is None else \
+ difficults[gt_box_idx: gt_box_idx + gt_box_num]
+ detection_map.update(bbox, gt_box, gt_label, difficult)
+ bbox_idx += bbox_num
+ gt_box_idx += gt_box_num
+
+ im_id = int(im_ids[i])
+ for b in bbox:
+ clsid, score, xmin, ymin, xmax, ymax = b.tolist()
+ w = xmax - xmin + 1
+ h = ymax - ymin + 1
+ bbox = [xmin, ymin, w, h]
+ coco_res = {
+ 'image_id': im_id,
+ 'category_id': clsid2catid[clsid],
+ 'bbox': bbox,
+ 'score': score
+ }
+ xywh_res.append(coco_res)
+
+ logging.debug("Accumulating evaluatation results...")
+ detection_map.accumulate()
+ map_stat = 100. * detection_map.get_map()
+ logging.debug("mAP({:.2f}, {}) = {:.2f}".format(overlap_thresh, map_type,
+ map_stat))
+ return map_stat, xywh_res
+
+
+def prune_zero_padding(gt_box, gt_label, difficult=None):
+ valid_cnt = 0
+ for i in range(len(gt_box)):
+ if gt_box[i, 0] == 0 and gt_box[i, 1] == 0 and \
+ gt_box[i, 2] == 0 and gt_box[i, 3] == 0:
+ break
+ valid_cnt += 1
+ return (gt_box[:valid_cnt], gt_label[:valid_cnt],
+ difficult[:valid_cnt] if difficult is not None else None)
+
+
+def bbox_area(bbox, is_bbox_normalized):
+ """
+ Calculate area of a bounding box
+ """
+ norm = 1. - float(is_bbox_normalized)
+ width = bbox[2] - bbox[0] + norm
+ height = bbox[3] - bbox[1] + norm
+ return width * height
+
+
+def jaccard_overlap(pred, gt, is_bbox_normalized=False):
+ """
+ Calculate jaccard overlap ratio between two bounding box
+ """
+ if pred[0] >= gt[2] or pred[2] <= gt[0] or \
+ pred[1] >= gt[3] or pred[3] <= gt[1]:
+ return 0.
+ inter_xmin = max(pred[0], gt[0])
+ inter_ymin = max(pred[1], gt[1])
+ inter_xmax = min(pred[2], gt[2])
+ inter_ymax = min(pred[3], gt[3])
+ inter_size = bbox_area([inter_xmin, inter_ymin, inter_xmax, inter_ymax],
+ is_bbox_normalized)
+ pred_size = bbox_area(pred, is_bbox_normalized)
+ gt_size = bbox_area(gt, is_bbox_normalized)
+ overlap = float(inter_size) / (pred_size + gt_size - inter_size)
+ return overlap
+
+
+class DetectionMAP(object):
+ """
+ Calculate detection mean average precision.
+ Currently support two types: 11point and integral
+
+ Args:
+ class_num (int): the class number.
+ overlap_thresh (float): The threshold of overlap
+ ratio between prediction bounding box and
+ ground truth bounding box for deciding
+ true/false positive. Default 0.5.
+ map_type (str): calculation method of mean average
+ precision, currently support '11point' and
+ 'integral'. Default '11point'.
+        is_bbox_normalized (bool): whether bounding boxes
+            are normalized to range [0, 1]. Default False.
+ evaluate_difficult (bool): whether to evaluate
+ difficult bounding boxes. Default False.
+ """
+
+ def __init__(self,
+ class_num,
+ overlap_thresh=0.5,
+ map_type='11point',
+ is_bbox_normalized=False,
+ evaluate_difficult=False):
+ self.class_num = class_num
+ self.overlap_thresh = overlap_thresh
+ assert map_type in ['11point', 'integral'], \
+ "map_type currently only support '11point' "\
+ "and 'integral'"
+ self.map_type = map_type
+ self.is_bbox_normalized = is_bbox_normalized
+ self.evaluate_difficult = evaluate_difficult
+ self.reset()
+
+ def update(self, bbox, gt_box, gt_label, difficult=None):
+ """
+        Update metric statistics from the given prediction and ground
+        truth information.
+ """
+ if difficult is None:
+ difficult = np.zeros_like(gt_label)
+
+ # record class gt count
+ for gtl, diff in zip(gt_label, difficult):
+ if self.evaluate_difficult or int(diff) == 0:
+ self.class_gt_counts[int(np.array(gtl))] += 1
+
+ # record class score positive
+ visited = [False] * len(gt_label)
+ for b in bbox:
+ label, score, xmin, ymin, xmax, ymax = b.tolist()
+ pred = [xmin, ymin, xmax, ymax]
+ max_idx = -1
+ max_overlap = -1.0
+ for i, gl in enumerate(gt_label):
+ if int(gl) == int(label):
+ overlap = jaccard_overlap(pred, gt_box[i],
+ self.is_bbox_normalized)
+ if overlap > max_overlap:
+ max_overlap = overlap
+ max_idx = i
+
+ if max_overlap > self.overlap_thresh:
+ if self.evaluate_difficult or \
+ int(np.array(difficult[max_idx])) == 0:
+ if not visited[max_idx]:
+ self.class_score_poss[int(label)].append([score, 1.0])
+ visited[max_idx] = True
+ else:
+ self.class_score_poss[int(label)].append([score, 0.0])
+ else:
+ self.class_score_poss[int(label)].append([score, 0.0])
+
+ def reset(self):
+ """
+        Reset metric statistics
+ """
+ self.class_score_poss = [[] for _ in range(self.class_num)]
+ self.class_gt_counts = [0] * self.class_num
+ self.mAP = None
+ self.APs = [None] * self.class_num
+
+ def accumulate(self):
+ """
+ Accumulate metric results and calculate mAP
+ """
+ mAP = 0.
+ valid_cnt = 0
+ for id, (score_pos, count) in enumerate(
+ zip(self.class_score_poss, self.class_gt_counts)):
+ if count == 0: continue
+ if len(score_pos) == 0:
+ valid_cnt += 1
+ continue
+
+ accum_tp_list, accum_fp_list = \
+ self._get_tp_fp_accum(score_pos)
+ precision = []
+ recall = []
+ for ac_tp, ac_fp in zip(accum_tp_list, accum_fp_list):
+ precision.append(float(ac_tp) / (ac_tp + ac_fp))
+ recall.append(float(ac_tp) / count)
+
+ if self.map_type == '11point':
+ max_precisions = [0.] * 11
+ start_idx = len(precision) - 1
+ for j in range(10, -1, -1):
+ for i in range(start_idx, -1, -1):
+ if recall[i] < float(j) / 10.:
+ start_idx = i
+ if j > 0:
+ max_precisions[j - 1] = max_precisions[j]
+ break
+ else:
+ if max_precisions[j] < precision[i]:
+ max_precisions[j] = precision[i]
+ mAP += sum(max_precisions) / 11.
+ self.APs[id] = sum(max_precisions) / 11.
+ valid_cnt += 1
+ elif self.map_type == 'integral':
+ import math
+ ap = 0.
+ prev_recall = 0.
+ for i in range(len(precision)):
+ recall_gap = math.fabs(recall[i] - prev_recall)
+ if recall_gap > 1e-6:
+ ap += precision[i] * recall_gap
+ prev_recall = recall[i]
+ mAP += ap
+                self.APs[id] = ap
+ valid_cnt += 1
+ else:
+ raise Exception("Unspported mAP type {}".format(self.map_type))
+
+ self.mAP = mAP / float(valid_cnt) if valid_cnt > 0 else mAP
+
+ def get_map(self):
+ """
+ Get mAP result
+ """
+ if self.mAP is None:
+ raise Exception("mAP is not calculated.")
+ return self.mAP
+
+ def _get_tp_fp_accum(self, score_pos_list):
+ """
+ Calculate accumulating true/false positive results from
+ [score, pos] records
+ """
+ sorted_list = sorted(score_pos_list, key=lambda s: s[0], reverse=True)
+ accum_tp = 0
+ accum_fp = 0
+ accum_tp_list = []
+ accum_fp_list = []
+ for (score, pos) in sorted_list:
+ accum_tp += int(pos)
+ accum_tp_list.append(accum_tp)
+ accum_fp += 1 - int(pos)
+ accum_fp_list.append(accum_fp)
+ return accum_tp_list, accum_fp_list
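+
+
+# Self-contained walk-through of the DetectionMAP workflow described above,
+# using a couple of made-up boxes (the values are arbitrary and only meant to
+# exercise update/accumulate/get_map):
+if __name__ == '__main__':
+    pred_boxes = np.array([[0., 0.9, 10., 10., 50., 50.],
+                           [1., 0.4, 20., 20., 40., 60.]])  # [class, score, x1, y1, x2, y2]
+    gt_boxes = np.array([[12., 11., 48., 52.]])
+    gt_labels = np.array([0])
+    detection_map = DetectionMAP(class_num=2, overlap_thresh=0.5, map_type='11point')
+    detection_map.update(pred_boxes, gt_boxes, gt_labels)
+    detection_map.accumulate()
+    print('toy mAP: {:.4f}'.format(detection_map.get_map()))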
diff --git a/paddlex/cv/models/utils/pretrain_weights.py b/paddlex/cv/models/utils/pretrain_weights.py
new file mode 100644
index 0000000000000000000000000000000000000000..5ed806d8b7bdb63c260f4faa896affc5284a1391
--- /dev/null
+++ b/paddlex/cv/models/utils/pretrain_weights.py
@@ -0,0 +1,113 @@
+import paddlex
+import paddlehub as hub
+import os
+import os.path as osp
+
+image_pretrain = {
+ 'ResNet18':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar',
+ 'ResNet34':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar',
+ 'ResNet50':
+ 'http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar',
+ 'ResNet101':
+ 'http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar',
+ 'ResNet50_vd':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar',
+ 'ResNet101_vd':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar',
+ 'MobileNetV1':
+ 'http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar',
+ 'MobileNetV2_x1.0':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar',
+ 'MobileNetV2_x0.5':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar',
+ 'MobileNetV2_x2.0':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar',
+ 'MobileNetV2_x0.25':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar',
+ 'MobileNetV2_x1.5':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar',
+ 'MobileNetV3_small':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar',
+ 'MobileNetV3_large':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar',
+ 'DarkNet53':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar',
+ 'DenseNet121':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar',
+ 'DenseNet161':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar',
+ 'DenseNet201':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar',
+ 'DetResNet50':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_cos_pretrained.tar',
+ 'SegXception41':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar',
+ 'SegXception65':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar',
+ 'ShuffleNetV2':
+ 'https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar',
+}
+
+coco_pretrain = {
+ 'UNet': 'https://paddleseg.bj.bcebos.com/models/unet_coco_v3.tgz'
+}
+
+
+def get_pretrain_weights(flag, model_type, backbone, save_dir):
+ if flag is None:
+ return None
+ elif osp.isdir(flag):
+ return flag
+ elif flag == 'IMAGENET':
+ new_save_dir = save_dir
+ if hasattr(paddlex, 'pretrain_dir'):
+ new_save_dir = paddlex.pretrain_dir
+ if backbone.startswith('Xception'):
+ backbone = 'Seg{}'.format(backbone)
+ elif backbone == 'MobileNetV2':
+ backbone = 'MobileNetV2_x1.0'
+ if model_type == 'detector':
+ if backbone == 'ResNet50':
+ backbone = 'DetResNet50'
+        assert backbone in image_pretrain, "There are no ImageNet pretrained weights for {}, you may try COCO.".format(
+            backbone)
+ try:
+ hub.download(backbone, save_path=new_save_dir)
+ except Exception as e:
+            if isinstance(e, hub.ResourceNotFoundError):
+                raise Exception(
+                    "Resource for backbone {} not found".format(backbone))
+            elif isinstance(e, hub.ServerConnectionError):
+                raise Exception(
+                    "Cannot get resource for backbone {}, please check your internet connection"
+                    .format(backbone))
+            else:
+                raise Exception(
+                    "Unexpected error, please make sure paddlehub >= 1.6.2")
+ return osp.join(new_save_dir, backbone)
+ elif flag == 'COCO':
+ new_save_dir = save_dir
+ if hasattr(paddlex, 'pretrain_dir'):
+ new_save_dir = paddlex.pretrain_dir
+        assert backbone in coco_pretrain, "There are no COCO pretrained weights for {}, you may try ImageNet.".format(
+            backbone)
+ try:
+ hub.download(backbone, save_path=new_save_dir)
+ except Exception as e:
+            if isinstance(e, hub.ResourceNotFoundError):
+                raise Exception(
+                    "Resource for backbone {} not found".format(backbone))
+            elif isinstance(e, hub.ServerConnectionError):
+                raise Exception(
+                    "Cannot get resource for backbone {}, please check your internet connection"
+                    .format(backbone))
+            else:
+                raise Exception(
+                    "Unexpected error, please make sure paddlehub >= 1.6.2")
+ return osp.join(new_save_dir, backbone)
+ else:
+ raise Exception(
+ "pretrain_weights need to be defined as directory path or `IMAGENET` or 'COCO' (download pretrain weights automatically)."
+ )
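+
+
+# Illustrative call (the model type string, backbone name and save directory
+# below are placeholders; downloading requires network access):
+if __name__ == '__main__':
+    weights_dir = get_pretrain_weights('IMAGENET', 'classifier',
+                                       'MobileNetV2', './pretrain')
+    print('pretrained weights unpacked to:', weights_dir)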
diff --git a/paddlex/cv/models/utils/seg_eval.py b/paddlex/cv/models/utils/seg_eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..745f75a48064e3b90902e0a0d48764db7deeba17
--- /dev/null
+++ b/paddlex/cv/models/utils/seg_eval.py
@@ -0,0 +1,144 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import numpy as np
+from scipy.sparse import csr_matrix
+
+
+class ConfusionMatrix(object):
+ """
+ Confusion Matrix for segmentation evaluation
+ """
+
+ def __init__(self, num_classes=2, streaming=False):
+ self.confusion_matrix = np.zeros([num_classes, num_classes],
+ dtype='int64')
+ self.num_classes = num_classes
+ self.streaming = streaming
+
+ def calculate(self, pred, label, ignore=None):
+        # If not in streaming mode, clear the matrix every time calculate() is called
+ if not self.streaming:
+ self.zero_matrix()
+
+ label = np.transpose(label, (0, 2, 3, 1))
+ ignore = np.transpose(ignore, (0, 2, 3, 1))
+ mask = np.array(ignore) == 1
+
+ label = np.asarray(label)[mask]
+ pred = np.asarray(pred)[mask]
+ one = np.ones_like(pred)
+        # Accumulate ([row=label, col=pred], 1) into sparse matrix
+ spm = csr_matrix((one, (label, pred)),
+ shape=(self.num_classes, self.num_classes))
+ spm = spm.todense()
+ self.confusion_matrix += spm
+
+ def zero_matrix(self):
+ """ Clear confusion matrix """
+ self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
+ dtype='int64')
+
+ def mean_iou(self):
+ iou_list = []
+ avg_iou = 0
+        # TODO: use numpy sum axis api to simplify
+ vji = np.zeros(self.num_classes, dtype=int)
+ vij = np.zeros(self.num_classes, dtype=int)
+ for j in range(self.num_classes):
+ v_j = 0
+ for i in range(self.num_classes):
+ v_j += self.confusion_matrix[j][i]
+ vji[j] = v_j
+
+ for i in range(self.num_classes):
+ v_i = 0
+ for j in range(self.num_classes):
+ v_i += self.confusion_matrix[j][i]
+ vij[i] = v_i
+
+ for c in range(self.num_classes):
+ total = vji[c] + vij[c] - self.confusion_matrix[c][c]
+ if total == 0:
+ iou = 0
+ else:
+ iou = float(self.confusion_matrix[c][c]) / total
+ avg_iou += iou
+ iou_list.append(iou)
+ avg_iou = float(avg_iou) / float(self.num_classes)
+ return np.array(iou_list), avg_iou
+
+ def accuracy(self):
+ total = self.confusion_matrix.sum()
+ total_right = 0
+ for c in range(self.num_classes):
+ total_right += self.confusion_matrix[c][c]
+ if total == 0:
+ avg_acc = 0
+ else:
+ avg_acc = float(total_right) / total
+
+ vij = np.zeros(self.num_classes, dtype=int)
+ for i in range(self.num_classes):
+ v_i = 0
+ for j in range(self.num_classes):
+ v_i += self.confusion_matrix[j][i]
+ vij[i] = v_i
+
+ acc_list = []
+ for c in range(self.num_classes):
+ if vij[c] == 0:
+ acc = 0
+ else:
+ acc = self.confusion_matrix[c][c] / float(vij[c])
+ acc_list.append(acc)
+ return np.array(acc_list), avg_acc
+
+ def kappa(self):
+ vji = np.zeros(self.num_classes)
+ vij = np.zeros(self.num_classes)
+ for j in range(self.num_classes):
+ v_j = 0
+ for i in range(self.num_classes):
+ v_j += self.confusion_matrix[j][i]
+ vji[j] = v_j
+
+ for i in range(self.num_classes):
+ v_i = 0
+ for j in range(self.num_classes):
+ v_i += self.confusion_matrix[j][i]
+ vij[i] = v_i
+
+ total = self.confusion_matrix.sum()
+
+ # avoid spillovers
+ # TODO: is it reasonable to hard code 10000.0?
+ total = float(total) / 10000.0
+ vji = vji / 10000.0
+ vij = vij / 10000.0
+
+ tp = 0
+ tc = 0
+ for c in range(self.num_classes):
+ tp += vji[c] * vij[c]
+ tc += self.confusion_matrix[c][c]
+
+ tc = tc / 10000.0
+ pe = tp / (total * total)
+ po = tc / total
+
+ kappa = (po - pe) / (1 - pe)
+ return kappa
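+
+
+# Tiny self-contained example of the ConfusionMatrix above. The toy arrays
+# mimic a single 2x2 prediction: `pred` is passed in NHWC layout (it is masked
+# directly), while `label`/`ignore` are NCHW and transposed inside calculate().
+if __name__ == '__main__':
+    cm = ConfusionMatrix(num_classes=2, streaming=False)
+    pred = np.array([0, 1, 1, 0]).reshape(1, 2, 2, 1)
+    label = np.array([0, 1, 0, 0]).reshape(1, 1, 2, 2)
+    ignore = np.ones_like(label)  # 1 means the pixel is counted
+    cm.calculate(pred, label, ignore)
+    ious, miou = cm.mean_iou()
+    accs, macc = cm.accuracy()
+    print('mIoU: {:.4f}, accuracy: {:.4f}, kappa: {:.4f}'.format(
+        miou, macc, cm.kappa()))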
diff --git a/paddlex/cv/models/utils/visualize.py b/paddlex/cv/models/utils/visualize.py
new file mode 100644
index 0000000000000000000000000000000000000000..de96090fa1f30aff4b04b34721c073c128602a25
--- /dev/null
+++ b/paddlex/cv/models/utils/visualize.py
@@ -0,0 +1,162 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import os
+import cv2
+import numpy as np
+from PIL import Image, ImageDraw
+
+
+def visualize_detection(image, result, threshold=0.5, save_dir=None):
+ """
+ Visualize bbox and mask results
+ """
+
+ image_name = os.path.split(image)[-1]
+ image = Image.open(image).convert('RGB')
+    image = draw_bbox_mask(image, result, threshold=threshold)
+ if save_dir is not None:
+ if not os.path.exists(save_dir):
+ os.makedirs(save_dir)
+ out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
+ image.save(out_path, quality=95)
+ else:
+ return image
+
+
+def visualize_segmentation(image, result, weight=0.6, save_dir=None):
+ """
+    Convert a segmentation result to a color image and blend it with the original image.
+    Args:
+        image: path of the original image
+        result: prediction result of the image
+        weight: weight of the original image in the blended output; the result weight is (1 - weight)
+        save_dir: directory for saving the visualized image
+ """
+ label_map = result['label_map']
+ color_map = get_color_map_list(256)
+ color_map = np.array(color_map).astype("uint8")
+ # Use OpenCV LUT for color mapping
+ c1 = cv2.LUT(label_map, color_map[:, 0])
+ c2 = cv2.LUT(label_map, color_map[:, 1])
+ c3 = cv2.LUT(label_map, color_map[:, 2])
+ pseudo_img = np.dstack((c1, c2, c3))
+
+ im = cv2.imread(image)
+ vis_result = cv2.addWeighted(im, weight, pseudo_img, 1 - weight, 0)
+
+ if save_dir is not None:
+ if not os.path.exists(save_dir):
+ os.makedirs(save_dir)
+ image_name = os.path.split(image)[-1]
+ out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
+ cv2.imwrite(out_path, vis_result)
+ else:
+ return vis_result
+
+
+def get_color_map_list(num_classes):
+ """ Returns the color map for visualizing the segmentation mask,
+ which can support arbitrary number of classes.
+ Args:
+ num_classes: Number of classes
+ Returns:
+ The color map
+ """
+ color_map = num_classes * [0, 0, 0]
+ for i in range(0, num_classes):
+ j = 0
+ lab = i
+ while lab:
+ color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+ color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+ color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+ j += 1
+ lab >>= 3
+ color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+ return color_map
+
+
+def expand_boxes(boxes, scale):
+    """
+    Expand an array of boxes by a given scale.
+    """
+ w_half = (boxes[:, 2] - boxes[:, 0]) * .5
+ h_half = (boxes[:, 3] - boxes[:, 1]) * .5
+ x_c = (boxes[:, 2] + boxes[:, 0]) * .5
+ y_c = (boxes[:, 3] + boxes[:, 1]) * .5
+
+ w_half *= scale
+ h_half *= scale
+
+ boxes_exp = np.zeros(boxes.shape)
+ boxes_exp[:, 0] = x_c - w_half
+ boxes_exp[:, 2] = x_c + w_half
+ boxes_exp[:, 1] = y_c - h_half
+ boxes_exp[:, 3] = y_c + h_half
+
+ return boxes_exp
+
+
+def clip_bbox(bbox):
+ xmin = max(min(bbox[0], 1.), 0.)
+ ymin = max(min(bbox[1], 1.), 0.)
+ xmax = max(min(bbox[2], 1.), 0.)
+ ymax = max(min(bbox[3], 1.), 0.)
+ return xmin, ymin, xmax, ymax
+
+
+def draw_bbox_mask(image, results, threshold=0.5, alpha=0.7):
+ labels = list()
+ for dt in np.array(results):
+ if dt['category'] not in labels:
+ labels.append(dt['category'])
+ color_map = get_color_map_list(len(labels))
+
+ for dt in np.array(results):
+ cname, bbox, score = dt['category'], dt['bbox'], dt['score']
+ if score < threshold:
+ continue
+
+ xmin, ymin, w, h = bbox
+ xmax = xmin + w
+ ymax = ymin + h
+
+ color = tuple(color_map[labels.index(cname)])
+
+ # draw bbox
+ draw = ImageDraw.Draw(image)
+ draw.line([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+ (xmin, ymin)],
+ width=2,
+ fill=color)
+
+ # draw label
+ text = "{} {:.2f}".format(cname, score)
+ tw, th = draw.textsize(text)
+ draw.rectangle([(xmin + 1, ymin - th), (xmin + tw + 1, ymin)],
+ fill=color)
+ draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
+
+ # draw mask
+ if 'mask' in dt:
+ mask = dt['mask']
+ color_mask = np.array(color_map[labels.index(
+ dt['category'])]).astype('float32')
+ img_array = np.array(image).astype('float32')
+ idx = np.nonzero(mask)
+ img_array[idx[0], idx[1], :] *= 1.0 - alpha
+ img_array[idx[0], idx[1], :] += alpha * color_mask
+ image = Image.fromarray(img_array.astype('uint8'))
+ return image
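+
+
+# Quick sanity check of the color map helper above (purely illustrative; the
+# first few entries follow the familiar PASCAL VOC palette):
+if __name__ == '__main__':
+    print(get_color_map_list(4))
+    # [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0]]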
diff --git a/paddlex/cv/models/yolo_v3.py b/paddlex/cv/models/yolo_v3.py
new file mode 100644
index 0000000000000000000000000000000000000000..80205238ff181be75f76fdbf32b8a1e99c497c9c
--- /dev/null
+++ b/paddlex/cv/models/yolo_v3.py
@@ -0,0 +1,371 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+import math
+import tqdm
+import os.path as osp
+import numpy as np
+import paddle.fluid as fluid
+import paddlex.utils.logging as logging
+import paddlex
+from .base import BaseAPI
+from collections import OrderedDict
+from .utils.detection_eval import eval_results, bbox2out
+import copy
+
+
+class YOLOv3(BaseAPI):
+ """构建YOLOv3,并实现其训练、评估、预测和模型导出。
+
+ Args:
+ num_classes (int): 类别数。默认为80。
+ backbone (str): YOLOv3的backbone网络,取值范围为['DarkNet53',
+ 'ResNet34', 'MobileNetV1', 'MobileNetV3_large']。默认为'MobileNetV1'。
+ anchors (list|tuple): anchor框的宽度和高度,为None时表示使用默认值
+ [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]]。
+ anchor_masks (list|tuple): 在计算YOLOv3损失时,使用anchor的mask索引,为None时表示使用默认值
+ [[6, 7, 8], [3, 4, 5], [0, 1, 2]]。
+ ignore_threshold (float): 在计算YOLOv3损失时,IoU大于`ignore_threshold`的预测框的置信度被忽略。默认为0.7。
+ nms_score_threshold (float): 检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略。默认为0.01。
+ nms_topk (int): 进行NMS时,根据置信度保留的最大检测框数。默认为1000。
+ nms_keep_topk (int): 进行NMS后,每个图像要保留的总检测框数。默认为100。
+ nms_iou_threshold (float): 进行NMS时,用于剔除检测框IOU的阈值。默认为0.45。
+ label_smooth (bool): 是否使用label smooth。默认值为False。
+ train_random_shapes (list|tuple): 训练时从列表中随机选择图像大小。默认值为[320, 352, 384, 416, 448, 480, 512, 544, 576, 608]。
+ """
+
+ def __init__(self,
+ num_classes=80,
+ backbone='MobileNetV1',
+ anchors=None,
+ anchor_masks=None,
+ ignore_threshold=0.7,
+ nms_score_threshold=0.01,
+ nms_topk=1000,
+ nms_keep_topk=100,
+ nms_iou_threshold=0.45,
+ label_smooth=False,
+ train_random_shapes=[
+ 320, 352, 384, 416, 448, 480, 512, 544, 576, 608
+ ]):
+ self.init_params = locals()
+ super(YOLOv3, self).__init__('detector')
+ backbones = [
+ 'DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large'
+ ]
+ assert backbone in backbones, "backbone should be one of {}".format(
+ backbones)
+ self.backbone = backbone
+ self.num_classes = num_classes
+ self.anchors = anchors
+ self.anchor_masks = anchor_masks
+ self.ignore_threshold = ignore_threshold
+ self.nms_score_threshold = nms_score_threshold
+ self.nms_topk = nms_topk
+ self.nms_keep_topk = nms_keep_topk
+ self.nms_iou_threshold = nms_iou_threshold
+ self.label_smooth = label_smooth
+ self.sync_bn = True
+ self.train_random_shapes = train_random_shapes
+
+ def _get_backbone(self, backbone_name):
+ if backbone_name == 'DarkNet53':
+ backbone = paddlex.cv.nets.DarkNet(norm_type='sync_bn')
+ elif backbone_name == 'ResNet34':
+ backbone = paddlex.cv.nets.ResNet(
+ norm_type='sync_bn',
+ layers=34,
+ freeze_norm=False,
+ norm_decay=0.,
+ feature_maps=[3, 4, 5],
+ freeze_at=0)
+ elif backbone_name == 'MobileNetV1':
+ backbone = paddlex.cv.nets.MobileNetV1(norm_type='sync_bn')
+ elif backbone_name.startswith('MobileNetV3'):
+ models_name = backbone_name.split('_')[1]
+ backbone = paddlex.cv.nets.MobileNetV3(
+ norm_type='sync_bn', models_name=models_name)
+ return backbone
+
+ def build_net(self, mode='train'):
+ model = paddlex.cv.nets.detection.YOLOv3(
+ backbone=self._get_backbone(self.backbone),
+ num_classes=self.num_classes,
+ mode=mode,
+ anchors=self.anchors,
+ anchor_masks=self.anchor_masks,
+ ignore_threshold=self.ignore_threshold,
+ label_smooth=self.label_smooth,
+ nms_score_threshold=self.nms_score_threshold,
+ nms_topk=self.nms_topk,
+ nms_keep_topk=self.nms_keep_topk,
+ nms_iou_threshold=self.nms_iou_threshold,
+ train_random_shapes=self.train_random_shapes)
+ inputs = model.generate_inputs()
+ model_out = model.build_net(inputs)
+ outputs = OrderedDict([('bbox', model_out)])
+ if mode == 'train':
+ self.optimizer.minimize(model_out)
+ outputs = OrderedDict([('loss', model_out)])
+ return inputs, outputs
+
+ def default_optimizer(self, learning_rate, warmup_steps, warmup_start_lr,
+ lr_decay_epochs, lr_decay_gamma,
+ num_steps_each_epoch):
+ if warmup_steps > lr_decay_epochs[0] * num_steps_each_epoch:
+ raise Exception("warmup_steps should less than {}".format(
+ lr_decay_epochs[0] * num_steps_each_epoch))
+ boundaries = [b * num_steps_each_epoch for b in lr_decay_epochs]
+ values = [(lr_decay_gamma**i) * learning_rate
+ for i in range(len(lr_decay_epochs) + 1)]
+ lr_decay = fluid.layers.piecewise_decay(
+ boundaries=boundaries, values=values)
+ lr_warmup = fluid.layers.linear_lr_warmup(
+ learning_rate=lr_decay,
+ warmup_steps=warmup_steps,
+ start_lr=warmup_start_lr,
+ end_lr=learning_rate)
+ optimizer = fluid.optimizer.Momentum(
+ learning_rate=lr_warmup,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2DecayRegularizer(5e-04))
+ return optimizer
+
+ def train(self,
+ num_epochs,
+ train_dataset,
+ train_batch_size=8,
+ eval_dataset=None,
+ save_interval_epochs=20,
+ log_interval_steps=2,
+ save_dir='output',
+ pretrain_weights='IMAGENET',
+ optimizer=None,
+ learning_rate=1.0 / 8000,
+ warmup_steps=1000,
+ warmup_start_lr=0.0,
+ lr_decay_epochs=[213, 240],
+ lr_decay_gamma=0.1,
+ metric=None,
+ use_vdl=False,
+ sensitivities_file=None,
+ eval_metric_loss=0.05):
+ """训练。
+
+ Args:
+ num_epochs (int): 训练迭代轮数。
+ train_dataset (paddlex.datasets): 训练数据读取器。
+ train_batch_size (int): 训练数据batch大小。目前检测仅支持单卡评估,训练数据batch大小与显卡
+ 数量之商为验证数据batch大小。默认值为8。
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ save_interval_epochs (int): 模型保存间隔(单位:迭代轮数)。默认为20。
+ log_interval_steps (int): 训练日志输出间隔(单位:迭代次数)。默认为10。
+ save_dir (str): 模型保存路径。默认值为'output'。
+ pretrain_weights (str): 若指定为路径时,则加载路径下预训练模型;若为字符串'IMAGENET',
+ 则自动下载在ImageNet图片数据上预训练的模型权重;若为None,则不使用预训练模型。默认为None。
+ optimizer (paddle.fluid.optimizer): 优化器。当该参数为None时,使用默认优化器:
+ fluid.layers.piecewise_decay衰减策略,fluid.optimizer.Momentum优化方法。
+ learning_rate (float): 默认优化器的学习率。默认为1.0/8000。
+ warmup_steps (int): 默认优化器进行warmup过程的步数。默认为1000。
+ warmup_start_lr (int): 默认优化器warmup的起始学习率。默认为0.0。
+ lr_decay_epochs (list): 默认优化器的学习率衰减轮数。默认为[213, 240]。
+ lr_decay_gamma (float): 默认优化器的学习率衰减率。默认为0.1。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认值为None。
+ use_vdl (bool): 是否使用VisualDL进行可视化。默认值为False。
+ sensitivities_file (str): 若指定为路径时,则加载路径下敏感度信息进行裁剪;若为字符串'DEFAULT',
+ 则自动下载在ImageNet图片数据上获得的敏感度信息进行裁剪;若为None,则不进行裁剪。默认为None。
+ eval_metric_loss (float): 可容忍的精度损失。默认为0.05。
+
+ Raises:
+ ValueError: 评估类型不在指定列表中。
+ ValueError: 模型从inference model进行加载。
+ """
+ if not self.trainable:
+ raise ValueError(
+ "Model is not trainable since it was loaded from a inference model."
+ )
+ if metric is None:
+ if isinstance(train_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(train_dataset, paddlex.datasets.VOCDetection):
+ metric = 'VOC'
+ else:
+ raise ValueError(
+ "train_dataset should be datasets.VOCDetection or datasets.COCODetection."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+ self.metric = metric
+
+ self.labels = train_dataset.labels
+        # Build the training network
+        if optimizer is None:
+            # Build the default optimization strategy
+ num_steps_each_epoch = train_dataset.num_samples // train_batch_size
+ optimizer = self.default_optimizer(
+ learning_rate=learning_rate,
+ warmup_steps=warmup_steps,
+ warmup_start_lr=warmup_start_lr,
+ lr_decay_epochs=lr_decay_epochs,
+ lr_decay_gamma=lr_decay_gamma,
+ num_steps_each_epoch=num_steps_each_epoch)
+ self.optimizer = optimizer
+        # Build the training, evaluation, and test networks
+        self.build_program()
+        # Initialize the network weights
+ self.net_initialize(
+ startup_prog=fluid.default_startup_program(),
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+        # Run the training loop
+ self.train_loop(
+ num_epochs=num_epochs,
+ train_dataset=train_dataset,
+ train_batch_size=train_batch_size,
+ eval_dataset=eval_dataset,
+ save_interval_epochs=save_interval_epochs,
+ log_interval_steps=log_interval_steps,
+ save_dir=save_dir,
+ use_vdl=use_vdl)
+
+ def evaluate(self,
+ eval_dataset,
+ batch_size=1,
+ epoch_id=None,
+ metric=None,
+ return_details=False):
+ """评估。
+
+ Args:
+ eval_dataset (paddlex.datasets): 验证数据读取器。
+ batch_size (int): 验证数据批大小。默认为1。
+ epoch_id (int): 当前评估模型所在的训练轮数。
+ metric (bool): 训练过程中评估的方式,取值范围为['COCO', 'VOC']。默认为None,
+ 根据用户传入的Dataset自动选择,如为VOCDetection,则metric为'VOC';
+ 如为COCODetection,则metric为'COCO'。
+ return_details (bool): 是否返回详细信息。
+
+ Returns:
+ tuple (metrics, eval_details) | dict (metrics): 当return_details为True时,返回(metrics, eval_details),
+ 当return_details为False时,返回metrics。metrics为dict,包含关键字:'bbox_mmap'或者’bbox_map‘,
+ 分别表示平均准确率平均值在各个IoU阈值下的结果取平均值的结果(mmAP)、平均准确率平均值(mAP)。
+ eval_details为dict,包含关键字:'bbox',对应元素预测结果列表,每个预测结果由图像id、
+ 预测框类别id、预测框坐标、预测框得分;’gt‘:真实标注框相关信息。
+ """
+ self.arrange_transforms(
+ transforms=eval_dataset.transforms, mode='eval')
+ if metric is None:
+ if hasattr(self, 'metric') and self.metric is not None:
+ metric = self.metric
+ else:
+ if isinstance(eval_dataset, paddlex.datasets.CocoDetection):
+ metric = 'COCO'
+ elif isinstance(eval_dataset, paddlex.datasets.VOCDetection):
+ metric = 'VOC'
+ else:
+ raise Exception(
+ "eval_dataset should be datasets.VOCDetection or datasets.COCODetection."
+ )
+ assert metric in ['COCO', 'VOC'], "Metric only support 'VOC' or 'COCO'"
+
+ total_steps = math.ceil(eval_dataset.num_samples * 1.0 / batch_size)
+ results = list()
+
+ data_generator = eval_dataset.generator(
+ batch_size=batch_size, drop_last=False)
+ logging.info(
+ "Start to evaluating(total_samples={}, total_steps={})...".format(
+ eval_dataset.num_samples, total_steps))
+ for step, data in tqdm.tqdm(
+ enumerate(data_generator()), total=total_steps):
+ images = np.array([d[0] for d in data])
+ im_sizes = np.array([d[1] for d in data])
+ feed_data = {'image': images, 'im_size': im_sizes}
+ outputs = self.exe.run(
+ self.test_prog,
+ feed=[feed_data],
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ 'bbox': (np.array(outputs[0]),
+ outputs[0].recursive_sequence_lengths())
+ }
+ res_id = [np.array([d[2]]) for d in data]
+ res['im_id'] = (res_id, [])
+ if metric == 'VOC':
+ res_gt_box = [d[3].reshape(-1, 4) for d in data]
+ res_gt_label = [d[4].reshape(-1, 1) for d in data]
+ res_is_difficult = [d[5].reshape(-1, 1) for d in data]
+ res_id = [np.array([d[2]]) for d in data]
+ res['gt_box'] = (res_gt_box, [])
+ res['gt_label'] = (res_gt_label, [])
+ res['is_difficult'] = (res_is_difficult, [])
+ results.append(res)
+ logging.debug("[EVAL] Epoch={}, Step={}/{}".format(
+ epoch_id, step + 1, total_steps))
+ box_ap_stats, eval_details = eval_results(
+ results, metric, eval_dataset.coco_gt, with_background=False)
+ evaluate_metrics = OrderedDict(
+ zip(['bbox_mmap' if metric == 'COCO' else 'bbox_map'],
+ box_ap_stats))
+ if return_details:
+ return evaluate_metrics, eval_details
+ return evaluate_metrics
+
+ def predict(self, img_file, transforms=None):
+ """预测。
+
+ Args:
+ img_file (str): 预测图像路径。
+ transforms (paddlex.det.transforms): 数据预处理操作。
+
+ Returns:
+ list: 预测结果列表,每个预测结果由预测框类别标签、
+ 预测框类别名称、预测框坐标、预测框得分组成。
+ """
+ if transforms is None and not hasattr(self, 'test_transforms'):
+ raise Exception("transforms need to be defined, now is None.")
+ if transforms is not None:
+ self.arrange_transforms(transforms=transforms, mode='test')
+ im, im_size = transforms(img_file)
+ else:
+ self.arrange_transforms(
+ transforms=self.test_transforms, mode='test')
+ im, im_size = self.test_transforms(img_file)
+ im = np.expand_dims(im, axis=0)
+ im_size = np.expand_dims(im_size, axis=0)
+ outputs = self.exe.run(
+ self.test_prog,
+ feed={
+ 'image': im,
+ 'im_size': im_size
+ },
+ fetch_list=list(self.test_outputs.values()),
+ return_numpy=False)
+ res = {
+ k: (np.array(v), v.recursive_sequence_lengths())
+ for k, v in zip(list(self.test_outputs.keys()), outputs)
+ }
+ res['im_id'] = (np.array([[0]]).astype('int32'), [])
+ clsid2catid = dict({i: i for i in range(self.num_classes)})
+ xywh_results = bbox2out([res], clsid2catid)
+ results = list()
+ for xywh_res in xywh_results:
+ del xywh_res['image_id']
+ xywh_res['category'] = self.labels[xywh_res['category_id']]
+ results.append(xywh_res)
+ return results
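
The trainer above wires the pieces together: `train` builds the default piecewise-decay-with-warmup optimizer when none is given, `evaluate` picks the VOC or COCO metric from the dataset type, and `predict` runs the test program on a single image. The sketch below shows the intended call sequence; the `paddlex.det.YOLOv3` import path, the transform classes, the dataset constructor arguments and all file paths are illustrative assumptions, not something confirmed by this diff.

```python
# Usage sketch only: import paths, transform names, dataset arguments and
# file locations are assumptions for illustration.
import paddlex
from paddlex.det import transforms

train_transforms = transforms.Compose(
    [transforms.Resize(target_size=608), transforms.Normalize()])
eval_transforms = transforms.Compose(
    [transforms.Resize(target_size=608), transforms.Normalize()])

train_dataset = paddlex.datasets.VOCDetection(
    data_dir='dataset',
    file_list='dataset/train_list.txt',
    label_list='dataset/labels.txt',
    transforms=train_transforms)
eval_dataset = paddlex.datasets.VOCDetection(
    data_dir='dataset',
    file_list='dataset/val_list.txt',
    label_list='dataset/labels.txt',
    transforms=eval_transforms)

model = paddlex.det.YOLOv3(
    num_classes=len(train_dataset.labels), backbone='DarkNet53')
model.train(
    num_epochs=270,
    train_dataset=train_dataset,
    train_batch_size=8,
    eval_dataset=eval_dataset,
    learning_rate=1.0 / 8000,
    lr_decay_epochs=[213, 240],
    save_dir='output/yolov3')

metrics = model.evaluate(eval_dataset, batch_size=1)   # e.g. {'bbox_map': ...}
pred = model.predict('dataset/JPEGImages/0001.jpg')    # list of box dicts
```
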
diff --git a/paddlex/cv/nets/__init__.py b/paddlex/cv/nets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..29203dfb0fc13738941050d1dab94d274aeced73
--- /dev/null
+++ b/paddlex/cv/nets/__init__.py
@@ -0,0 +1,115 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .resnet import ResNet
+from .darknet import DarkNet
+from .detection import FasterRCNN
+from .mobilenet_v1 import MobileNetV1
+from .mobilenet_v2 import MobileNetV2
+from .mobilenet_v3 import MobileNetV3
+from .segmentation import UNet
+from .segmentation import DeepLabv3p
+from .xception import Xception
+from .densenet import DenseNet
+from .shufflenet_v2 import ShuffleNetV2
+
+
+def resnet18(input, num_classes=1000):
+ model = ResNet(layers=18, num_classes=num_classes)
+ return model(input)
+
+
+def resnet34(input, num_classes=1000):
+ model = ResNet(layers=34, num_classes=num_classes)
+ return model(input)
+
+
+def resnet50(input, num_classes=1000):
+ model = ResNet(layers=50, num_classes=num_classes)
+ return model(input)
+
+
+def resnet101(input, num_classes=1000):
+ model = ResNet(layers=101, num_classes=num_classes)
+ return model(input)
+
+
+def resnet50_vd(input, num_classes=1000):
+ model = ResNet(layers=50, num_classes=num_classes, variant='d')
+ return model(input)
+
+
+def resnet101_vd(input, num_classes=1000):
+ model = ResNet(layers=101, num_classes=num_classes, variant='d')
+ return model(input)
+
+
+def darknet53(input, num_classes=1000):
+ model = DarkNet(depth=53, num_classes=num_classes, bn_act='relu')
+ return model(input)
+
+
+def mobilenetv1(input, num_classes=1000):
+ model = MobileNetV1(num_classes=num_classes)
+ return model(input)
+
+
+def mobilenetv2(input, num_classes=1000):
+ model = MobileNetV2(num_classes=num_classes)
+ return model(input)
+
+
+def mobilenetv3_small(input, num_classes=1000):
+ model = MobileNetV3(num_classes=num_classes, model_name='small')
+ return model(input)
+
+
+def mobilenetv3_large(input, num_classes=1000):
+ model = MobileNetV3(num_classes=num_classes, model_name='large')
+ return model(input)
+
+
+def xception65(input, num_classes=1000):
+ model = Xception(layers=65, num_classes=num_classes)
+ return model(input)
+
+
+def xception71(input, num_classes=1000):
+ model = Xception(layers=71, num_classes=num_classes)
+ return model(input)
+
+
+def xception41(input, num_classes=1000):
+ model = Xception(layers=41, num_classes=num_classes)
+ return model(input)
+
+
+def densenet121(input, num_classes=1000):
+ model = DenseNet(layers=121, num_classes=num_classes)
+ return model(input)
+
+
+def densenet161(input, num_classes=1000):
+ model = DenseNet(layers=161, num_classes=num_classes)
+ return model(input)
+
+
+def densenet201(input, num_classes=1000):
+ model = DenseNet(layers=201, num_classes=num_classes)
+ return model(input)
+
+
+def shufflenetv2(input, num_classes=1000):
+ model = ShuffleNetV2(num_classes=num_classes)
+ return model(input)
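
The factory functions in this `__init__.py` return the classification output of the wrapped network when called on an input variable. A minimal static-graph sketch, assuming `ResNet.__call__` behaves like the DarkNet/DenseNet `__call__` methods in this diff and returns the fully connected logits when `num_classes` is set:

```python
# Minimal sketch under the fluid (Paddle 1.x) static-graph API; the behaviour
# of resnet50 (returning [N, 1000] logits) is an assumption based on the other
# backbones in this diff.
import paddle.fluid as fluid
from paddlex.cv.nets import resnet50

main_prog, startup_prog = fluid.Program(), fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
    logits = resnet50(image, num_classes=1000)  # expected shape: [N, 1000]
```
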
diff --git a/paddlex/cv/nets/backbone_utils.py b/paddlex/cv/nets/backbone_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..454be850a0c54d1d0bca63655eccaee662967e61
--- /dev/null
+++ b/paddlex/cv/nets/backbone_utils.py
@@ -0,0 +1,73 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+class NameAdapter(object):
+ """Fix the backbones variable names for pretrained weight"""
+
+ def __init__(self, model):
+ super(NameAdapter, self).__init__()
+ self.model = model
+
+ @property
+ def model_type(self):
+ return getattr(self.model, '_model_type', '')
+
+ @property
+ def variant(self):
+ return getattr(self.model, 'variant', '')
+
+ def fix_conv_norm_name(self, name):
+ if name == "conv1":
+ bn_name = "bn_" + name
+ else:
+ bn_name = "bn" + name[3:]
+ # the naming rule is same as pretrained weight
+ if self.model_type == 'SEResNeXt':
+ bn_name = name + "_bn"
+ return bn_name
+
+ def fix_shortcut_name(self, name):
+ if self.model_type == 'SEResNeXt':
+ name = 'conv' + name + '_prj'
+ return name
+
+ def fix_bottleneck_name(self, name):
+ if self.model_type == 'SEResNeXt':
+ conv_name1 = 'conv' + name + '_x1'
+ conv_name2 = 'conv' + name + '_x2'
+ conv_name3 = 'conv' + name + '_x3'
+ shortcut_name = name
+ else:
+ conv_name1 = name + "_branch2a"
+ conv_name2 = name + "_branch2b"
+ conv_name3 = name + "_branch2c"
+ shortcut_name = name + "_branch1"
+ return conv_name1, conv_name2, conv_name3, shortcut_name
+
+ def fix_layer_warp_name(self, stage_num, count, i):
+ name = 'res' + str(stage_num)
+ if count > 10 and stage_num == 4:
+ if i == 0:
+ conv_name = name + "a"
+ else:
+ conv_name = name + "b" + str(i)
+ else:
+ conv_name = name + chr(ord("a") + i)
+ if self.model_type == 'SEResNeXt':
+ conv_name = str(stage_num + 2) + '_' + str(i + 1)
+ return conv_name
+
+ def fix_c1_stage_name(self):
+ return "res_conv1" if self.model_type == 'ResNeXt' else "conv1"
diff --git a/paddlex/cv/nets/darknet.py b/paddlex/cv/nets/darknet.py
new file mode 100644
index 0000000000000000000000000000000000000000..7ba516836ce6a87fbe277f6e1ec824d906443990
--- /dev/null
+++ b/paddlex/cv/nets/darknet.py
@@ -0,0 +1,187 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import six
+import math
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.regularizer import L2Decay
+
+
+class DarkNet(object):
+ """
+ DarkNet, see https://pjreddie.com/darknet/yolo/
+ Args:
+ depth (int): network depth, currently only darknet 53 is supported
+ norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
+ norm_decay (float): weight decay for normalization layer weights
+ """
+
+ def __init__(self,
+ depth=53,
+ num_classes=None,
+ norm_type='bn',
+ norm_decay=0.,
+ bn_act='leaky',
+ weight_prefix_name=''):
+ assert depth in [53], "unsupported depth value"
+ self.depth = depth
+ self.num_classes = num_classes
+ self.norm_type = norm_type
+ self.norm_decay = norm_decay
+ self.depth_cfg = {53: ([1, 2, 8, 8, 4], self.basicblock)}
+ self.bn_act = bn_act
+ self.prefix_name = weight_prefix_name
+
+ def _conv_norm(self,
+ input,
+ ch_out,
+ filter_size,
+ stride,
+ padding,
+ act='leaky',
+ name=None):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=ch_out,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ act=None,
+ param_attr=ParamAttr(name=name + ".conv.weights"),
+ bias_attr=False)
+
+ bn_name = name + ".bn"
+
+ bn_param_attr = ParamAttr(
+ regularizer=L2Decay(float(self.norm_decay)),
+ name=bn_name + '.scale')
+ bn_bias_attr = ParamAttr(
+ regularizer=L2Decay(float(self.norm_decay)),
+ name=bn_name + '.offset')
+
+ out = fluid.layers.batch_norm(
+ input=conv,
+ param_attr=bn_param_attr,
+ bias_attr=bn_bias_attr,
+ moving_mean_name=bn_name + '.mean',
+ moving_variance_name=bn_name + '.var')
+
+        # leaky relu here uses an `alpha` of 0.1, which cannot be set via the
+        # `act` param of fluid.layers.batch_norm above.
+ if act == 'leaky':
+ out = fluid.layers.leaky_relu(x=out, alpha=0.1)
+ if act == 'relu':
+ out = fluid.layers.relu(x=out)
+ return out
+
+ def _downsample(self,
+ input,
+ ch_out,
+ filter_size=3,
+ stride=2,
+ padding=1,
+ name=None):
+ return self._conv_norm(
+ input,
+ ch_out=ch_out,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ act=self.bn_act,
+ name=name)
+
+ def basicblock(self, input, ch_out, name=None):
+ conv1 = self._conv_norm(
+ input,
+ ch_out=ch_out,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=self.bn_act,
+ name=name + ".0")
+ conv2 = self._conv_norm(
+ conv1,
+ ch_out=ch_out * 2,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ act=self.bn_act,
+ name=name + ".1")
+ out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
+ return out
+
+ def layer_warp(self, block_func, input, ch_out, count, name=None):
+ out = block_func(input, ch_out=ch_out, name='{}.0'.format(name))
+ for j in six.moves.xrange(1, count):
+ out = block_func(out, ch_out=ch_out, name='{}.{}'.format(name, j))
+ return out
+
+ def __call__(self, input):
+ """
+        Get the DarkNet backbone, i.e. the output of each of its 5 stages.
+
+ Args:
+ input (Variable): input variable.
+
+ Returns:
+ The last variables of each stage.
+ """
+ stages, block_func = self.depth_cfg[self.depth]
+ stages = stages[0:5]
+ conv = self._conv_norm(
+ input=input,
+ ch_out=32,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ act=self.bn_act,
+ name=self.prefix_name + "yolo_input")
+ downsample_ = self._downsample(
+ input=conv,
+ ch_out=conv.shape[1] * 2,
+ name=self.prefix_name + "yolo_input.downsample")
+ blocks = []
+ for i, stage in enumerate(stages):
+ block = self.layer_warp(
+ block_func=block_func,
+ input=downsample_,
+ ch_out=32 * 2**i,
+ count=stage,
+ name=self.prefix_name + "stage.{}".format(i))
+ blocks.append(block)
+ if i < len(stages) - 1: # do not downsaple in the last stage
+ downsample_ = self._downsample(
+ input=block,
+ ch_out=block.shape[1] * 2,
+ name=self.prefix_name + "stage.{}.downsample".format(i))
+ if self.num_classes is not None:
+ pool = fluid.layers.pool2d(
+ input=blocks[-1], pool_type='avg', global_pooling=True)
+ stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
+ out = fluid.layers.fc(
+ input=pool,
+ size=self.num_classes,
+ param_attr=ParamAttr(
+ initializer=fluid.initializer.Uniform(-stdv, stdv),
+ name='fc_weights'),
+ bias_attr=ParamAttr(name='fc_offset'))
+ return out
+
+ return blocks
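
When `num_classes` is None, `DarkNet.__call__` returns the list of per-stage outputs, which is how it is consumed as a detection backbone. A static-graph sketch of that path:

```python
# Sketch only: builds the DarkNet-53 graph and inspects the stage outputs.
import paddle.fluid as fluid
from paddlex.cv.nets.darknet import DarkNet

with fluid.program_guard(fluid.Program(), fluid.Program()):
    image = fluid.data(name='image', shape=[None, 3, 608, 608], dtype='float32')
    backbone = DarkNet(depth=53, norm_type='bn')  # num_classes=None -> backbone mode
    blocks = backbone(image)  # list with the last variable of each of the 5 stages
    # The three deepest stages (blocks[-3:]) are the features a YOLOv3 head
    # would typically consume.
```
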
diff --git a/paddlex/cv/nets/densenet.py b/paddlex/cv/nets/densenet.py
new file mode 100644
index 0000000000000000000000000000000000000000..a7238b2cd8775f20210d04d41f6caa1343c68092
--- /dev/null
+++ b/paddlex/cv/nets/densenet.py
@@ -0,0 +1,176 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.fluid as fluid
+import math
+from paddle.fluid.param_attr import ParamAttr
+
+__all__ = ["DenseNet"]
+
+
+class DenseNet(object):
+ def __init__(self, layers=121, bn_size=4, dropout=0, num_classes=None):
+        supported_layers = [121, 161, 169, 201, 264]
+        assert layers in supported_layers, \
+            "supported layers are {} but input layer is {}".format(
+                supported_layers, layers)
+ self.layers = layers
+ self.bn_size = bn_size
+ self.dropout = dropout
+ self.num_classes = num_classes
+
+ def __call__(self, input):
+ layers = self.layers
+ densenet_spec = {
+ 121: (64, 32, [6, 12, 24, 16]),
+ 161: (96, 48, [6, 12, 36, 24]),
+ 169: (64, 32, [6, 12, 32, 32]),
+ 201: (64, 32, [6, 12, 48, 32]),
+ 264: (64, 32, [6, 12, 64, 48])
+ }
+ num_init_features, growth_rate, block_config = densenet_spec[layers]
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_init_features,
+ filter_size=7,
+ stride=2,
+ padding=3,
+ act=None,
+ param_attr=ParamAttr(name="conv1_weights"),
+ bias_attr=False)
+ conv = fluid.layers.batch_norm(
+ input=conv,
+ act='relu',
+ param_attr=ParamAttr(name='conv1_bn_scale'),
+ bias_attr=ParamAttr(name='conv1_bn_offset'),
+ moving_mean_name='conv1_bn_mean',
+ moving_variance_name='conv1_bn_variance')
+ conv = fluid.layers.pool2d(
+ input=conv,
+ pool_size=3,
+ pool_stride=2,
+ pool_padding=1,
+ pool_type='max')
+ num_features = num_init_features
+ for i, num_layers in enumerate(block_config):
+ conv = self.make_dense_block(
+ conv,
+ num_layers,
+ self.bn_size,
+ growth_rate,
+ self.dropout,
+ name='conv' + str(i + 2))
+ num_features = num_features + num_layers * growth_rate
+ if i != len(block_config) - 1:
+ conv = self.make_transition(
+ conv, num_features // 2, name='conv' + str(i + 2) + '_blk')
+ num_features = num_features // 2
+ conv = fluid.layers.batch_norm(
+ input=conv,
+ act='relu',
+ param_attr=ParamAttr(name='conv5_blk_bn_scale'),
+ bias_attr=ParamAttr(name='conv5_blk_bn_offset'),
+ moving_mean_name='conv5_blk_bn_mean',
+ moving_variance_name='conv5_blk_bn_variance')
+ if self.num_classes:
+ conv = fluid.layers.pool2d(
+ input=conv, pool_type='avg', global_pooling=True)
+ stdv = 1.0 / math.sqrt(conv.shape[1] * 1.0)
+ out = fluid.layers.fc(
+ input=conv,
+ size=self.num_classes,
+ param_attr=fluid.param_attr.ParamAttr(
+ initializer=fluid.initializer.Uniform(-stdv, stdv),
+ name="fc_weights"),
+ bias_attr=ParamAttr(name='fc_offset'))
+ return out
+
+ def make_transition(self, input, num_output_features, name=None):
+ bn_ac = fluid.layers.batch_norm(
+ input,
+ act='relu',
+ param_attr=ParamAttr(name=name + '_bn_scale'),
+ bias_attr=ParamAttr(name + '_bn_offset'),
+ moving_mean_name=name + '_bn_mean',
+ moving_variance_name=name + '_bn_variance')
+
+ bn_ac_conv = fluid.layers.conv2d(
+ input=bn_ac,
+ num_filters=num_output_features,
+ filter_size=1,
+ stride=1,
+ act=None,
+ bias_attr=False,
+ param_attr=ParamAttr(name=name + "_weights"))
+ pool = fluid.layers.pool2d(
+ input=bn_ac_conv, pool_size=2, pool_stride=2, pool_type='avg')
+ return pool
+
+ def make_dense_block(self,
+ input,
+ num_layers,
+ bn_size,
+ growth_rate,
+ dropout,
+ name=None):
+ conv = input
+ for layer in range(num_layers):
+ conv = self.make_dense_layer(
+ conv,
+ growth_rate,
+ bn_size,
+ dropout,
+ name=name + '_' + str(layer + 1))
+ return conv
+
+ def make_dense_layer(self, input, growth_rate, bn_size, dropout,
+ name=None):
+ bn_ac = fluid.layers.batch_norm(
+ input,
+ act='relu',
+ param_attr=ParamAttr(name=name + '_x1_bn_scale'),
+ bias_attr=ParamAttr(name + '_x1_bn_offset'),
+ moving_mean_name=name + '_x1_bn_mean',
+ moving_variance_name=name + '_x1_bn_variance')
+ bn_ac_conv = fluid.layers.conv2d(
+ input=bn_ac,
+ num_filters=bn_size * growth_rate,
+ filter_size=1,
+ stride=1,
+ act=None,
+ bias_attr=False,
+ param_attr=ParamAttr(name=name + "_x1_weights"))
+ bn_ac = fluid.layers.batch_norm(
+ bn_ac_conv,
+ act='relu',
+ param_attr=ParamAttr(name=name + '_x2_bn_scale'),
+ bias_attr=ParamAttr(name + '_x2_bn_offset'),
+ moving_mean_name=name + '_x2_bn_mean',
+ moving_variance_name=name + '_x2_bn_variance')
+ bn_ac_conv = fluid.layers.conv2d(
+ input=bn_ac,
+ num_filters=growth_rate,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ act=None,
+ bias_attr=False,
+ param_attr=ParamAttr(name=name + "_x2_weights"))
+ if dropout:
+ bn_ac_conv = fluid.layers.dropout(
+ x=bn_ac_conv, dropout_prob=dropout)
+ bn_ac_conv = fluid.layers.concat([input, bn_ac_conv], axis=1)
+ return bn_ac_conv
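
`DenseNet.__call__` only produces a return value when `num_classes` is set, in which case it ends with global average pooling and a fully connected layer. A classifier sketch:

```python
# Sketch: DenseNet-121 as an ImageNet-style classifier under the fluid
# static-graph API.
import paddle.fluid as fluid
from paddlex.cv.nets.densenet import DenseNet

with fluid.program_guard(fluid.Program(), fluid.Program()):
    image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
    logits = DenseNet(layers=121, num_classes=1000)(image)  # shape [N, 1000]
```
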
diff --git a/paddlex/cv/nets/detection/__init__.py b/paddlex/cv/nets/detection/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..7b9d5d547c8aa7f9dc8254a389624a238843039d
--- /dev/null
+++ b/paddlex/cv/nets/detection/__init__.py
@@ -0,0 +1,17 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .yolo_v3 import YOLOv3
+from .faster_rcnn import FasterRCNN
+from .mask_rcnn import MaskRCNN
diff --git a/paddlex/cv/nets/detection/bbox_head.py b/paddlex/cv/nets/detection/bbox_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..06cf35a7292d03ea975225470f0703d836a2f01e
--- /dev/null
+++ b/paddlex/cv/nets/detection/bbox_head.py
@@ -0,0 +1,242 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import Normal, Xavier
+from paddle.fluid.regularizer import L2Decay
+from paddle.fluid.initializer import MSRA
+
+__all__ = ['BBoxHead', 'TwoFCHead']
+
+
+class TwoFCHead(object):
+ """
+ RCNN head with two Fully Connected layers
+
+ Args:
+ mlp_dim (int): num of filters for the fc layers
+ """
+
+ def __init__(self, mlp_dim=1024):
+ super(TwoFCHead, self).__init__()
+ self.mlp_dim = mlp_dim
+
+ def __call__(self, roi_feat):
+ fan = roi_feat.shape[1] * roi_feat.shape[2] * roi_feat.shape[3]
+
+ fc6 = fluid.layers.fc(
+ input=roi_feat,
+ size=self.mlp_dim,
+ act='relu',
+ name='fc6',
+ param_attr=ParamAttr(
+ name='fc6_w', initializer=Xavier(fan_out=fan)),
+ bias_attr=ParamAttr(
+ name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
+ head_feat = fluid.layers.fc(
+ input=fc6,
+ size=self.mlp_dim,
+ act='relu',
+ name='fc7',
+ param_attr=ParamAttr(name='fc7_w', initializer=Xavier()),
+ bias_attr=ParamAttr(
+ name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
+
+ return head_feat
+
+
+class BBoxHead(object):
+ def __init__(
+ self,
+ head,
+ #box_coder
+ prior_box_var=[0.1, 0.1, 0.2, 0.2],
+ code_type='decode_center_size',
+ box_normalized=False,
+ axis=1,
+ #MultiClassNMS
+ score_threshold=.05,
+ nms_top_k=-1,
+ keep_top_k=100,
+ nms_threshold=.5,
+ normalized=False,
+ nms_eta=1.0,
+ background_label=0,
+ #bbox_loss
+ sigma=1.0,
+ num_classes=81):
+ super(BBoxHead, self).__init__()
+ self.head = head
+ self.prior_box_var = prior_box_var
+ self.code_type = code_type
+ self.box_normalized = box_normalized
+ self.axis = axis
+ self.score_threshold = score_threshold
+ self.nms_top_k = nms_top_k
+ self.keep_top_k = keep_top_k
+ self.nms_threshold = nms_threshold
+ self.normalized = normalized
+ self.nms_eta = nms_eta
+ self.background_label = background_label
+ self.sigma = sigma
+ self.num_classes = num_classes
+ self.head_feat = None
+
+ def get_head_feat(self, input=None):
+ """
+ Get the bbox head feature map.
+ """
+
+ if input is not None:
+ feat = self.head(input)
+ if isinstance(feat, OrderedDict):
+ feat = list(feat.values())[0]
+ self.head_feat = feat
+ return self.head_feat
+
+ def _get_output(self, roi_feat):
+ """
+ Get bbox head output.
+
+ Args:
+ roi_feat (Variable): RoI feature from RoIExtractor.
+
+ Returns:
+            cls_score(Variable): Output of the bbox head classification branch
+                with shape [N, num_classes].
+            bbox_pred(Variable): Output of the bbox head regression branch
+                with shape [N, num_classes * 4].
+ """
+ head_feat = self.get_head_feat(roi_feat)
+ # when ResNetC5 output a single feature map
+ if not isinstance(self.head, TwoFCHead):
+ head_feat = fluid.layers.pool2d(
+ head_feat, pool_type='avg', global_pooling=True)
+ cls_score = fluid.layers.fc(
+ input=head_feat,
+ size=self.num_classes,
+ act=None,
+ name='cls_score',
+ param_attr=ParamAttr(
+ name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
+ bias_attr=ParamAttr(
+ name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
+ bbox_pred = fluid.layers.fc(
+ input=head_feat,
+ size=4 * self.num_classes,
+ act=None,
+ name='bbox_pred',
+ param_attr=ParamAttr(
+ name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
+ bias_attr=ParamAttr(
+ name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
+ return cls_score, bbox_pred
+
+ def get_loss(self, roi_feat, labels_int32, bbox_targets,
+ bbox_inside_weights, bbox_outside_weights):
+ """
+ Get bbox_head loss.
+
+ Args:
+ roi_feat (Variable): RoI feature from RoIExtractor.
+ labels_int32(Variable): Class label of a RoI with shape [P, 1].
+ P is the number of RoI.
+ bbox_targets(Variable): Box label of a RoI with shape
+ [P, 4 * class_nums].
+ bbox_inside_weights(Variable): Indicates whether a box should
+ contribute to loss. Same shape as bbox_targets.
+ bbox_outside_weights(Variable): Indicates whether a box should
+ contribute to loss. Same shape as bbox_targets.
+
+ Return:
+ Type: Dict
+ loss_cls(Variable): bbox_head loss.
+ loss_bbox(Variable): bbox_head loss.
+ """
+
+ cls_score, bbox_pred = self._get_output(roi_feat)
+
+ labels_int64 = fluid.layers.cast(x=labels_int32, dtype='int64')
+ labels_int64.stop_gradient = True
+ loss_cls = fluid.layers.softmax_with_cross_entropy(
+ logits=cls_score, label=labels_int64, numeric_stable_mode=True)
+ loss_cls = fluid.layers.reduce_mean(loss_cls)
+ loss_bbox = fluid.layers.smooth_l1(
+ x=bbox_pred,
+ y=bbox_targets,
+ inside_weight=bbox_inside_weights,
+ outside_weight=bbox_outside_weights,
+ sigma=self.sigma)
+ loss_bbox = fluid.layers.reduce_mean(loss_bbox)
+ return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
+
+ def get_prediction(self,
+ roi_feat,
+ rois,
+ im_info,
+ im_shape,
+ return_box_score=False):
+ """
+ Get prediction bounding box in test stage.
+
+ Args:
+ roi_feat (Variable): RoI feature from RoIExtractor.
+ rois (Variable): Output of generate_proposals in rpn head.
+ im_info (Variable): A 2-D LoDTensor with shape [B, 3]. B is the
+ number of input images, each element consists of im_height,
+ im_width, im_scale.
+ im_shape (Variable): Actual shape of original image with shape
+ [B, 3]. B is the number of images, each element consists of
+ original_height, original_width, 1
+
+ Returns:
+ pred_result(Variable): Prediction result with shape [N, 6]. Each
+ row has 6 values: [label, confidence, xmin, ymin, xmax, ymax].
+                N is the total number of predictions.
+ """
+ cls_score, bbox_pred = self._get_output(roi_feat)
+
+ im_scale = fluid.layers.slice(im_info, [1], starts=[2], ends=[3])
+ im_scale = fluid.layers.sequence_expand(im_scale, rois)
+ boxes = rois / im_scale
+ cls_prob = fluid.layers.softmax(cls_score, use_cudnn=False)
+ bbox_pred = fluid.layers.reshape(bbox_pred, (-1, self.num_classes, 4))
+ decoded_box = fluid.layers.box_coder(
+ prior_box=boxes,
+ target_box=bbox_pred,
+ prior_box_var=self.prior_box_var,
+ code_type=self.code_type,
+ box_normalized=self.box_normalized,
+ axis=self.axis)
+ cliped_box = fluid.layers.box_clip(input=decoded_box, im_info=im_shape)
+ if return_box_score:
+ return {'bbox': cliped_box, 'score': cls_prob}
+ pred_result = fluid.layers.multiclass_nms(
+ bboxes=cliped_box,
+ scores=cls_prob,
+ score_threshold=self.score_threshold,
+ nms_top_k=self.nms_top_k,
+ keep_top_k=self.keep_top_k,
+ nms_threshold=self.nms_threshold,
+ normalized=self.normalized,
+ nms_eta=self.nms_eta,
+ background_label=self.background_label)
+ return {'bbox': pred_result}
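
The bbox head is pure configuration at construction time; its `get_loss` and `get_prediction` methods are called later by the R-CNN graph with the RoI features. A construction sketch for the FPN case, where `TwoFCHead` replaces the C5 stage:

```python
# Construction-only sketch; roi_feat, rois, im_info and im_shape are produced
# by the surrounding R-CNN graph at build time.
from paddlex.cv.nets.detection.bbox_head import BBoxHead, TwoFCHead

bbox_head = BBoxHead(
    head=TwoFCHead(mlp_dim=1024),
    keep_top_k=100,         # boxes kept per image after NMS
    nms_threshold=0.5,
    score_threshold=0.05,
    num_classes=81)         # 80 foreground classes + background
# Training:  bbox_head.get_loss(roi_feat, labels_int32, bbox_targets,
#                               bbox_inside_weights, bbox_outside_weights)
# Inference: bbox_head.get_prediction(roi_feat, rois, im_info, im_shape)
```
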
diff --git a/paddlex/cv/nets/detection/faster_rcnn.py b/paddlex/cv/nets/detection/faster_rcnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..92ff141284b3956f89fddd1d2fea3fcf7863ad60
--- /dev/null
+++ b/paddlex/cv/nets/detection/faster_rcnn.py
@@ -0,0 +1,254 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+import copy
+
+from paddle import fluid
+
+from .fpn import FPN
+from .rpn_head import (RPNHead, FPNRPNHead)
+from .roi_extractor import (RoIAlign, FPNRoIAlign)
+from .bbox_head import (BBoxHead, TwoFCHead)
+from ..resnet import ResNetC5
+
+__all__ = ['FasterRCNN']
+
+
+class FasterRCNN(object):
+ """
+ Faster R-CNN architecture, see https://arxiv.org/abs/1506.01497
+ Args:
+ backbone (object): backbone instance
+ rpn_head (object): `RPNhead` instance
+ roi_extractor (object): ROI extractor instance
+ bbox_head (object): `BBoxHead` instance
+ fpn (object): feature pyramid network instance
+ """
+
+ def __init__(
+ self,
+ backbone,
+ mode='train',
+ num_classes=81,
+ with_fpn=False,
+ fpn=None,
+ #rpn_head
+ rpn_only=False,
+ rpn_head=None,
+ anchor_sizes=[32, 64, 128, 256, 512],
+ aspect_ratios=[0.5, 1.0, 2.0],
+ rpn_batch_size_per_im=256,
+ rpn_fg_fraction=0.5,
+ rpn_positive_overlap=0.7,
+ rpn_negative_overlap=0.3,
+ train_pre_nms_top_n=12000,
+ train_post_nms_top_n=2000,
+ train_nms_thresh=0.7,
+ test_pre_nms_top_n=6000,
+ test_post_nms_top_n=1000,
+ test_nms_thresh=0.7,
+ #roi_extractor
+ roi_extractor=None,
+ #bbox_head
+ bbox_head=None,
+ keep_top_k=100,
+ nms_threshold=0.5,
+ score_threshold=0.05,
+ #bbox_assigner
+ batch_size_per_im=512,
+ fg_fraction=.25,
+ fg_thresh=.5,
+ bg_thresh_hi=.5,
+ bg_thresh_lo=0.,
+ bbox_reg_weights=[0.1, 0.1, 0.2, 0.2]):
+ super(FasterRCNN, self).__init__()
+ self.backbone = backbone
+ self.mode = mode
+ if with_fpn and fpn is None:
+ fpn = FPN()
+ self.fpn = fpn
+ self.num_classes = num_classes
+ if rpn_head is None:
+ if self.fpn is None:
+ rpn_head = RPNHead(
+ anchor_sizes=anchor_sizes,
+ aspect_ratios=aspect_ratios,
+ rpn_batch_size_per_im=rpn_batch_size_per_im,
+ rpn_fg_fraction=rpn_fg_fraction,
+ rpn_positive_overlap=rpn_positive_overlap,
+ rpn_negative_overlap=rpn_negative_overlap,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ train_post_nms_top_n=train_post_nms_top_n,
+ train_nms_thresh=train_nms_thresh,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ test_post_nms_top_n=test_post_nms_top_n,
+ test_nms_thresh=test_nms_thresh)
+ else:
+ rpn_head = FPNRPNHead(
+ anchor_start_size=anchor_sizes[0],
+ aspect_ratios=aspect_ratios,
+ num_chan=self.fpn.num_chan,
+ min_level=self.fpn.min_level,
+ max_level=self.fpn.max_level,
+ rpn_batch_size_per_im=rpn_batch_size_per_im,
+ rpn_fg_fraction=rpn_fg_fraction,
+ rpn_positive_overlap=rpn_positive_overlap,
+ rpn_negative_overlap=rpn_negative_overlap,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ train_post_nms_top_n=train_post_nms_top_n,
+ train_nms_thresh=train_nms_thresh,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ test_post_nms_top_n=test_post_nms_top_n,
+ test_nms_thresh=test_nms_thresh)
+ self.rpn_head = rpn_head
+ if roi_extractor is None:
+ if self.fpn is None:
+ roi_extractor = RoIAlign(
+ resolution=14,
+ spatial_scale=1. / 2**self.backbone.feature_maps[0])
+ else:
+ roi_extractor = FPNRoIAlign(sampling_ratio=2)
+ self.roi_extractor = roi_extractor
+ if bbox_head is None:
+ if self.fpn is None:
+ head = ResNetC5(
+ layers=self.backbone.layers,
+ norm_type=self.backbone.norm_type,
+ freeze_norm=self.backbone.freeze_norm,
+ variant=self.backbone.variant)
+ else:
+ head = TwoFCHead()
+ bbox_head = BBoxHead(
+ head=head,
+ keep_top_k=keep_top_k,
+ nms_threshold=nms_threshold,
+ score_threshold=score_threshold,
+ num_classes=num_classes)
+ self.bbox_head = bbox_head
+ self.batch_size_per_im = batch_size_per_im
+ self.fg_fraction = fg_fraction
+ self.fg_thresh = fg_thresh
+ self.bg_thresh_hi = bg_thresh_hi
+ self.bg_thresh_lo = bg_thresh_lo
+ self.bbox_reg_weights = bbox_reg_weights
+ self.rpn_only = rpn_only
+
+ def build_net(self, inputs):
+ im = inputs['image']
+ im_info = inputs['im_info']
+ if self.mode == 'train':
+ gt_bbox = inputs['gt_box']
+ is_crowd = inputs['is_crowd']
+ else:
+ im_shape = inputs['im_shape']
+
+ body_feats = self.backbone(im)
+ body_feat_names = list(body_feats.keys())
+
+ if self.fpn is not None:
+ body_feats, spatial_scale = self.fpn.get_output(body_feats)
+
+ rois = self.rpn_head.get_proposals(body_feats, im_info, mode=self.mode)
+
+ if self.mode == 'train':
+ rpn_loss = self.rpn_head.get_loss(im_info, gt_bbox, is_crowd)
+ outputs = fluid.layers.generate_proposal_labels(
+ rpn_rois=rois,
+ gt_classes=inputs['gt_label'],
+ is_crowd=inputs['is_crowd'],
+ gt_boxes=inputs['gt_box'],
+ im_info=inputs['im_info'],
+ batch_size_per_im=self.batch_size_per_im,
+ fg_fraction=self.fg_fraction,
+ fg_thresh=self.fg_thresh,
+ bg_thresh_hi=self.bg_thresh_hi,
+ bg_thresh_lo=self.bg_thresh_lo,
+ bbox_reg_weights=self.bbox_reg_weights,
+ class_nums=self.num_classes,
+ use_random=self.rpn_head.use_random)
+
+ rois = outputs[0]
+ labels_int32 = outputs[1]
+ bbox_targets = outputs[2]
+ bbox_inside_weights = outputs[3]
+ bbox_outside_weights = outputs[4]
+ else:
+ if self.rpn_only:
+ im_scale = fluid.layers.slice(
+ im_info, [1], starts=[2], ends=[3])
+ im_scale = fluid.layers.sequence_expand(im_scale, rois)
+ rois = rois / im_scale
+ return {'proposal': rois}
+ if self.fpn is None:
+            # In models without FPN, the RoI extractor only uses the last level
+            # of feature maps, whose name is body_feat_names[-1].
+ body_feat = body_feats[body_feat_names[-1]]
+ roi_feat = self.roi_extractor(body_feat, rois)
+ else:
+ roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
+
+ if self.mode == 'train':
+ loss = self.bbox_head.get_loss(roi_feat, labels_int32,
+ bbox_targets, bbox_inside_weights,
+ bbox_outside_weights)
+ loss.update(rpn_loss)
+ total_loss = fluid.layers.sum(list(loss.values()))
+ loss.update({'loss': total_loss})
+ return loss
+ else:
+ pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
+ im_shape)
+ return pred
+
+ def generate_inputs(self):
+ inputs = OrderedDict()
+ inputs['image'] = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if self.mode == 'train':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['gt_box'] = fluid.data(
+ dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
+ inputs['gt_label'] = fluid.data(
+ dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
+ inputs['is_crowd'] = fluid.data(
+ dtype='int32', shape=[None, 1], lod_level=1, name='is_crowd')
+ elif self.mode == 'eval':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['im_id'] = fluid.data(
+ dtype='int64', shape=[None, 1], name='im_id')
+ inputs['im_shape'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_shape')
+ inputs['gt_box'] = fluid.data(
+ dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
+ inputs['gt_label'] = fluid.data(
+ dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
+ inputs['is_difficult'] = fluid.data(
+ dtype='int32',
+ shape=[None, 1],
+ lod_level=1,
+ name='is_difficult')
+ elif self.mode == 'test':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['im_shape'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_shape')
+ return inputs
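
A sketch of instantiating this graph class with a ResNet-50 FPN backbone. The `ResNet` keyword arguments mirror the ones used by the YOLOv3 trainer earlier in this diff; treat the exact configuration as an illustrative assumption rather than a tested setup.

```python
# Illustrative sketch; only generate_inputs() is exercised, build_net() is
# shown as a comment because it requires the full backbone/RPN stack at hand.
import paddle.fluid as fluid
import paddlex

with fluid.program_guard(fluid.Program(), fluid.Program()):
    backbone = paddlex.cv.nets.ResNet(
        norm_type='bn',
        layers=50,
        freeze_norm=True,
        norm_decay=0.,
        feature_maps=[2, 3, 4, 5],
        freeze_at=2)
    model = paddlex.cv.nets.detection.FasterRCNN(
        backbone=backbone, num_classes=81, mode='train', with_fpn=True)
    inputs = model.generate_inputs()    # image, im_info, gt_box, gt_label, is_crowd
    # losses = model.build_net(inputs)  # {'loss': total, 'loss_cls': ..., 'loss_bbox': ...}
```
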
diff --git a/paddlex/cv/nets/detection/fpn.py b/paddlex/cv/nets/detection/fpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..8fd843b149d38fc2f640aa34df9e26432a25899e
--- /dev/null
+++ b/paddlex/cv/nets/detection/fpn.py
@@ -0,0 +1,295 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+import copy
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import Xavier
+from paddle.fluid.regularizer import L2Decay
+
+__all__ = ['FPN']
+
+
+def ConvNorm(input,
+ num_filters,
+ filter_size,
+ stride=1,
+ groups=1,
+ norm_decay=0.,
+ norm_type='affine_channel',
+ norm_groups=32,
+ dilation=1,
+ lr_scale=1,
+ freeze_norm=False,
+ act=None,
+ norm_name=None,
+ initializer=None,
+ name=None):
+ fan = num_filters
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=((filter_size - 1) // 2) * dilation,
+ dilation=dilation,
+ groups=groups,
+ act=None,
+ param_attr=ParamAttr(
+ name=name + "_weights",
+ initializer=initializer,
+ learning_rate=lr_scale),
+ bias_attr=False,
+ name=name + '.conv2d.output.1')
+
+ norm_lr = 0. if freeze_norm else 1.
+ pattr = ParamAttr(
+ name=norm_name + '_scale',
+ learning_rate=norm_lr * lr_scale,
+ regularizer=L2Decay(norm_decay))
+ battr = ParamAttr(
+ name=norm_name + '_offset',
+ learning_rate=norm_lr * lr_scale,
+ regularizer=L2Decay(norm_decay))
+
+ if norm_type in ['bn', 'sync_bn']:
+ global_stats = True if freeze_norm else False
+ out = fluid.layers.batch_norm(
+ input=conv,
+ act=act,
+ name=norm_name + '.output.1',
+ param_attr=pattr,
+ bias_attr=battr,
+ moving_mean_name=norm_name + '_mean',
+ moving_variance_name=norm_name + '_variance',
+ use_global_stats=global_stats)
+ scale = fluid.framework._get_var(pattr.name)
+ bias = fluid.framework._get_var(battr.name)
+ elif norm_type == 'gn':
+ out = fluid.layers.group_norm(
+ input=conv,
+ act=act,
+ name=norm_name + '.output.1',
+ groups=norm_groups,
+ param_attr=pattr,
+ bias_attr=battr)
+ scale = fluid.framework._get_var(pattr.name)
+ bias = fluid.framework._get_var(battr.name)
+ elif norm_type == 'affine_channel':
+ scale = fluid.layers.create_parameter(
+ shape=[conv.shape[1]],
+ dtype=conv.dtype,
+ attr=pattr,
+ default_initializer=fluid.initializer.Constant(1.))
+ bias = fluid.layers.create_parameter(
+ shape=[conv.shape[1]],
+ dtype=conv.dtype,
+ attr=battr,
+ default_initializer=fluid.initializer.Constant(0.))
+ out = fluid.layers.affine_channel(
+ x=conv, scale=scale, bias=bias, act=act)
+ if freeze_norm:
+ scale.stop_gradient = True
+ bias.stop_gradient = True
+ return out
+
+
+class FPN(object):
+ """
+ Feature Pyramid Network, see https://arxiv.org/abs/1612.03144
+
+ Args:
+ num_chan (int): number of feature channels
+ min_level (int): lowest level of the backbone feature map to use
+ max_level (int): highest level of the backbone feature map to use
+ spatial_scale (list): feature map scaling factor
+        has_extra_convs (bool): whether to add extra convolutions for higher levels
+ norm_type (str|None): normalization type, 'bn'/'sync_bn'/'affine_channel'
+ """
+
+ def __init__(self,
+ num_chan=256,
+ min_level=2,
+ max_level=6,
+ spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
+ has_extra_convs=False,
+ norm_type=None,
+ freeze_norm=False):
+ self.freeze_norm = freeze_norm
+ self.num_chan = num_chan
+ self.min_level = min_level
+ self.max_level = max_level
+ self.spatial_scale = spatial_scale
+ self.has_extra_convs = has_extra_convs
+ self.norm_type = norm_type
+
+ def _add_topdown_lateral(self, body_name, body_input, upper_output):
+ lateral_name = 'fpn_inner_' + body_name + '_lateral'
+ topdown_name = 'fpn_topdown_' + body_name
+ fan = body_input.shape[1]
+ if self.norm_type:
+ initializer = Xavier(fan_out=fan)
+ lateral = ConvNorm(
+ body_input,
+ self.num_chan,
+ 1,
+ initializer=initializer,
+ norm_type=self.norm_type,
+ freeze_norm=self.freeze_norm,
+ name=lateral_name,
+ norm_name=lateral_name)
+ else:
+ lateral = fluid.layers.conv2d(
+ body_input,
+ self.num_chan,
+ 1,
+ param_attr=ParamAttr(
+ name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
+ bias_attr=ParamAttr(
+ name=lateral_name + "_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)),
+ name=lateral_name)
+ topdown = fluid.layers.resize_nearest(
+ upper_output, scale=2., name=topdown_name)
+
+ return lateral + topdown
+
+ def get_output(self, body_dict):
+ """
+ Add FPN onto backbone.
+
+ Args:
+            body_dict(OrderedDict): Dictionary of variables, each element being the
+                output of a backbone stage.
+
+        Return:
+            fpn_dict(OrderedDict): A dictionary mapping names to the corresponding
+                FPN outputs.
+            spatial_scale(list): A list of multiplicative spatial scale factors.
+ """
+ spatial_scale = copy.deepcopy(self.spatial_scale)
+ body_name_list = list(body_dict.keys())[::-1]
+ num_backbone_stages = len(body_name_list)
+ self.fpn_inner_output = [[] for _ in range(num_backbone_stages)]
+ fpn_inner_name = 'fpn_inner_' + body_name_list[0]
+ body_input = body_dict[body_name_list[0]]
+ fan = body_input.shape[1]
+ if self.norm_type:
+ initializer = Xavier(fan_out=fan)
+ self.fpn_inner_output[0] = ConvNorm(
+ body_input,
+ self.num_chan,
+ 1,
+ initializer=initializer,
+ norm_type=self.norm_type,
+ freeze_norm=self.freeze_norm,
+ name=fpn_inner_name,
+ norm_name=fpn_inner_name)
+ else:
+ self.fpn_inner_output[0] = fluid.layers.conv2d(
+ body_input,
+ self.num_chan,
+ 1,
+ param_attr=ParamAttr(
+ name=fpn_inner_name + "_w",
+ initializer=Xavier(fan_out=fan)),
+ bias_attr=ParamAttr(
+ name=fpn_inner_name + "_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)),
+ name=fpn_inner_name)
+ for i in range(1, num_backbone_stages):
+ body_name = body_name_list[i]
+ body_input = body_dict[body_name]
+ top_output = self.fpn_inner_output[i - 1]
+ fpn_inner_single = self._add_topdown_lateral(
+ body_name, body_input, top_output)
+ self.fpn_inner_output[i] = fpn_inner_single
+ fpn_dict = {}
+ fpn_name_list = []
+ for i in range(num_backbone_stages):
+ fpn_name = 'fpn_' + body_name_list[i]
+ fan = self.fpn_inner_output[i].shape[1] * 3 * 3
+ if self.norm_type:
+ initializer = Xavier(fan_out=fan)
+ fpn_output = ConvNorm(
+ self.fpn_inner_output[i],
+ self.num_chan,
+ 3,
+ initializer=initializer,
+ norm_type=self.norm_type,
+ freeze_norm=self.freeze_norm,
+ name=fpn_name,
+ norm_name=fpn_name)
+ else:
+ fpn_output = fluid.layers.conv2d(
+ self.fpn_inner_output[i],
+ self.num_chan,
+ filter_size=3,
+ padding=1,
+ param_attr=ParamAttr(
+ name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
+ bias_attr=ParamAttr(
+ name=fpn_name + "_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)),
+ name=fpn_name)
+ fpn_dict[fpn_name] = fpn_output
+ fpn_name_list.append(fpn_name)
+ if not self.has_extra_convs and self.max_level - self.min_level == len(
+ spatial_scale):
+ body_top_name = fpn_name_list[0]
+ body_top_extension = fluid.layers.pool2d(
+ fpn_dict[body_top_name],
+ 1,
+ 'max',
+ pool_stride=2,
+ name=body_top_name + '_subsampled_2x')
+ fpn_dict[body_top_name + '_subsampled_2x'] = body_top_extension
+ fpn_name_list.insert(0, body_top_name + '_subsampled_2x')
+ spatial_scale.insert(0, spatial_scale[0] * 0.5)
+ # Coarser FPN levels introduced for RetinaNet
+ highest_backbone_level = self.min_level + len(spatial_scale) - 1
+ if self.has_extra_convs and self.max_level > highest_backbone_level:
+ fpn_blob = body_dict[body_name_list[0]]
+ for i in range(highest_backbone_level + 1, self.max_level + 1):
+ fpn_blob_in = fpn_blob
+ fpn_name = 'fpn_' + str(i)
+ if i > highest_backbone_level + 1:
+ fpn_blob_in = fluid.layers.relu(fpn_blob)
+ fan = fpn_blob_in.shape[1] * 3 * 3
+ fpn_blob = fluid.layers.conv2d(
+ input=fpn_blob_in,
+ num_filters=self.num_chan,
+ filter_size=3,
+ stride=2,
+ padding=1,
+ param_attr=ParamAttr(
+ name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
+ bias_attr=ParamAttr(
+ name=fpn_name + "_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)),
+ name=fpn_name)
+ fpn_dict[fpn_name] = fpn_blob
+ fpn_name_list.insert(0, fpn_name)
+ spatial_scale.insert(0, spatial_scale[0] * 0.5)
+ res_dict = OrderedDict([(k, fpn_dict[k]) for k in fpn_name_list])
+ return res_dict, spatial_scale
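
`get_output` walks the backbone outputs from deepest to shallowest, adds the lateral and top-down paths, and, when no extra convs are configured, appends a max-pooled copy of the deepest level. A sketch with placeholder feature maps at strides 4 to 32 of a 224x224 input:

```python
# Sketch: placeholder feature maps stand in for the backbone outputs.
from collections import OrderedDict

import paddle.fluid as fluid

from paddlex.cv.nets.detection.fpn import FPN

with fluid.program_guard(fluid.Program(), fluid.Program()):
    body_feats = OrderedDict([
        ('res2', fluid.data(name='res2', shape=[None, 256, 56, 56], dtype='float32')),
        ('res3', fluid.data(name='res3', shape=[None, 512, 28, 28], dtype='float32')),
        ('res4', fluid.data(name='res4', shape=[None, 1024, 14, 14], dtype='float32')),
        ('res5', fluid.data(name='res5', shape=[None, 2048, 7, 7], dtype='float32')),
    ])
    fpn = FPN(num_chan=256, min_level=2, max_level=6)
    fpn_feats, spatial_scale = fpn.get_output(body_feats)
    # fpn_feats keys: 'fpn_res5_subsampled_2x', 'fpn_res5', 'fpn_res4',
    #                 'fpn_res3', 'fpn_res2' (all with 256 channels)
    # spatial_scale: [1/64, 1/32, 1/16, 1/8, 1/4]
```
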
diff --git a/paddlex/cv/nets/detection/mask_head.py b/paddlex/cv/nets/detection/mask_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..2198f87a02ae2d048ff23cd108fdb9add67ca46a
--- /dev/null
+++ b/paddlex/cv/nets/detection/mask_head.py
@@ -0,0 +1,155 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import MSRA
+from paddle.fluid.regularizer import L2Decay
+
+from .fpn import ConvNorm
+
+__all__ = ['MaskHead']
+
+
+class MaskHead(object):
+ """
+ RCNN mask head
+ Args:
+ num_convs (int): num of convolutions, 4 for FPN, 1 otherwise
+ conv_dim (int): num of channels after first convolution
+ resolution (int): size of the output mask
+ dilation (int): dilation rate
+ num_classes (int): number of output classes
+ """
+
+ def __init__(self,
+ num_convs=0,
+ conv_dim=256,
+ resolution=14,
+ dilation=1,
+ num_classes=81,
+ norm_type=None):
+ super(MaskHead, self).__init__()
+ self.num_convs = num_convs
+ self.conv_dim = conv_dim
+ self.resolution = resolution
+ self.dilation = dilation
+ self.num_classes = num_classes
+ self.norm_type = norm_type
+
+ def _mask_conv_head(self, roi_feat, num_convs, norm_type):
+ if norm_type == 'gn':
+ for i in range(num_convs):
+ layer_name = "mask_inter_feat_" + str(i + 1)
+ fan = self.conv_dim * 3 * 3
+ initializer = MSRA(uniform=False, fan_in=fan)
+ roi_feat = ConvNorm(
+ roi_feat,
+ self.conv_dim,
+ 3,
+ act='relu',
+ dilation=self.dilation,
+ initializer=initializer,
+ norm_type=self.norm_type,
+ name=layer_name,
+ norm_name=layer_name)
+ else:
+ for i in range(num_convs):
+ layer_name = "mask_inter_feat_" + str(i + 1)
+ fan = self.conv_dim * 3 * 3
+ initializer = MSRA(uniform=False, fan_in=fan)
+ roi_feat = fluid.layers.conv2d(
+ input=roi_feat,
+ num_filters=self.conv_dim,
+ filter_size=3,
+ padding=1 * self.dilation,
+ act='relu',
+ stride=1,
+ dilation=self.dilation,
+ name=layer_name,
+ param_attr=ParamAttr(
+ name=layer_name + '_w', initializer=initializer),
+ bias_attr=ParamAttr(
+ name=layer_name + '_b',
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ fan = roi_feat.shape[1] * 2 * 2
+ feat = fluid.layers.conv2d_transpose(
+ input=roi_feat,
+ num_filters=self.conv_dim,
+ filter_size=2,
+ stride=2,
+ act='relu',
+ param_attr=ParamAttr(
+ name='conv5_mask_w',
+ initializer=MSRA(uniform=False, fan_in=fan)),
+ bias_attr=ParamAttr(
+ name='conv5_mask_b', learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ return feat
+
+ def _get_output(self, roi_feat):
+ class_num = self.num_classes
+ # configure the conv number for FPN if necessary
+ head_feat = self._mask_conv_head(roi_feat, self.num_convs,
+ self.norm_type)
+ fan = class_num
+ mask_logits = fluid.layers.conv2d(
+ input=head_feat,
+ num_filters=class_num,
+ filter_size=1,
+ act=None,
+ param_attr=ParamAttr(
+ name='mask_fcn_logits_w',
+ initializer=MSRA(uniform=False, fan_in=fan)),
+ bias_attr=ParamAttr(
+ name="mask_fcn_logits_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ return mask_logits
+
+ def get_loss(self, roi_feat, mask_int32):
+ mask_logits = self._get_output(roi_feat)
+ num_classes = self.num_classes
+ resolution = self.resolution
+ dim = num_classes * resolution * resolution
+ mask_logits = fluid.layers.reshape(mask_logits, (-1, dim))
+
+ mask_label = fluid.layers.cast(x=mask_int32, dtype='float32')
+ mask_label.stop_gradient = True
+ loss_mask = fluid.layers.sigmoid_cross_entropy_with_logits(
+ x=mask_logits, label=mask_label, ignore_index=-1, normalize=True)
+ loss_mask = fluid.layers.reduce_sum(loss_mask, name='loss_mask')
+ return {'loss_mask': loss_mask}
+
+ def get_prediction(self, roi_feat, bbox_pred):
+ """
+ Get prediction mask in test stage.
+
+ Args:
+ roi_feat (Variable): RoI feature from RoIExtractor.
+ bbox_pred (Variable): predicted bbox.
+
+ Returns:
+ mask_pred (Variable): Prediction mask with shape
+ [N, num_classes, resolution, resolution].
+ """
+ mask_logits = self._get_output(roi_feat)
+ mask_prob = fluid.layers.sigmoid(mask_logits)
+ mask_prob = fluid.layers.lod_reset(mask_prob, bbox_pred)
+ return mask_prob
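
Like the bbox head, the mask head is pure configuration at construction time. A sketch of two common settings, a single-deconv head for the C4 setup and a 4-conv, 28x28 head usually paired with FPN (the pairing is conventional Mask R-CNN practice, not something this diff enforces):

```python
# Construction-only sketch; roi_feat, mask_int32 and bbox_pred come from the
# surrounding Mask R-CNN graph.
from paddlex.cv.nets.detection.mask_head import MaskHead

mask_head_c4 = MaskHead(num_convs=0, resolution=14, num_classes=81)
mask_head_fpn = MaskHead(num_convs=4, resolution=28, num_classes=81)
# Training:  mask_head_fpn.get_loss(roi_feat, mask_int32)
# Inference: mask_head_fpn.get_prediction(roi_feat, bbox_pred)
```
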
diff --git a/paddlex/cv/nets/detection/mask_rcnn.py b/paddlex/cv/nets/detection/mask_rcnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bb8ebc9f1742a088aee9264343e3584b6539bc6
--- /dev/null
+++ b/paddlex/cv/nets/detection/mask_rcnn.py
@@ -0,0 +1,334 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+import copy
+
+import paddle.fluid as fluid
+
+from .fpn import FPN
+from .rpn_head import (RPNHead, FPNRPNHead)
+from .roi_extractor import (RoIAlign, FPNRoIAlign)
+from .bbox_head import (BBoxHead, TwoFCHead)
+from .mask_head import MaskHead
+from ..resnet import ResNetC5
+
+__all__ = ['MaskRCNN']
+
+
+class MaskRCNN(object):
+ """
+ Mask R-CNN architecture, see https://arxiv.org/abs/1703.06870
+ Args:
+ backbone (object): backbone instance
+ rpn_head (object): `RPNhead` instance
+ roi_extractor (object): ROI extractor instance
+ bbox_head (object): `BBoxHead` instance
+ mask_head (object): `MaskHead` instance
+ fpn (object): feature pyramid network instance
+ """
+
+ def __init__(
+ self,
+ backbone,
+ num_classes=81,
+ mode='train',
+ with_fpn=False,
+ fpn=None,
+ num_chan=256,
+ min_level=2,
+ max_level=6,
+ spatial_scale=[1. / 32., 1. / 16., 1. / 8., 1. / 4.],
+ #rpn_head
+ rpn_only=False,
+ rpn_head=None,
+ anchor_sizes=[32, 64, 128, 256, 512],
+ aspect_ratios=[0.5, 1.0, 2.0],
+ rpn_batch_size_per_im=256,
+ rpn_fg_fraction=0.5,
+ rpn_positive_overlap=0.7,
+ rpn_negative_overlap=0.3,
+ train_pre_nms_top_n=12000,
+ train_post_nms_top_n=2000,
+ train_nms_thresh=0.7,
+ test_pre_nms_top_n=6000,
+ test_post_nms_top_n=1000,
+ test_nms_thresh=0.7,
+ #roi_extractor
+ roi_extractor=None,
+ #bbox_head
+ bbox_head=None,
+ keep_top_k=100,
+ nms_threshold=0.5,
+ score_threshold=0.05,
+ #MaskHead
+ mask_head=None,
+ num_convs=0,
+ mask_head_resolution=14,
+ #bbox_assigner
+ batch_size_per_im=512,
+ fg_fraction=.25,
+ fg_thresh=.5,
+ bg_thresh_hi=.5,
+ bg_thresh_lo=0.,
+ bbox_reg_weights=[0.1, 0.1, 0.2, 0.2]):
+ super(MaskRCNN, self).__init__()
+ self.backbone = backbone
+ self.mode = mode
+ if with_fpn and fpn is None:
+ fpn = FPN(
+ num_chan=num_chan,
+ min_level=min_level,
+ max_level=max_level,
+ spatial_scale=spatial_scale)
+ self.fpn = fpn
+ self.num_classes = num_classes
+ if rpn_head is None:
+ if self.fpn is None:
+ rpn_head = RPNHead(
+ anchor_sizes=anchor_sizes,
+ aspect_ratios=aspect_ratios,
+ rpn_batch_size_per_im=rpn_batch_size_per_im,
+ rpn_fg_fraction=rpn_fg_fraction,
+ rpn_positive_overlap=rpn_positive_overlap,
+ rpn_negative_overlap=rpn_negative_overlap,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ train_post_nms_top_n=train_post_nms_top_n,
+ train_nms_thresh=train_nms_thresh,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ test_post_nms_top_n=test_post_nms_top_n,
+ test_nms_thresh=test_nms_thresh)
+ else:
+ rpn_head = FPNRPNHead(
+ anchor_start_size=anchor_sizes[0],
+ aspect_ratios=aspect_ratios,
+ num_chan=self.fpn.num_chan,
+ min_level=self.fpn.min_level,
+ max_level=self.fpn.max_level,
+ rpn_batch_size_per_im=rpn_batch_size_per_im,
+ rpn_fg_fraction=rpn_fg_fraction,
+ rpn_positive_overlap=rpn_positive_overlap,
+ rpn_negative_overlap=rpn_negative_overlap,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ train_post_nms_top_n=train_post_nms_top_n,
+ train_nms_thresh=train_nms_thresh,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ test_post_nms_top_n=test_post_nms_top_n,
+ test_nms_thresh=test_nms_thresh)
+ self.rpn_head = rpn_head
+ if roi_extractor is None:
+ if self.fpn is None:
+ roi_extractor = RoIAlign(
+ resolution=14,
+ spatial_scale=1. / 2**self.backbone.feature_maps[0])
+ else:
+ roi_extractor = FPNRoIAlign(sampling_ratio=2)
+ self.roi_extractor = roi_extractor
+ if bbox_head is None:
+ if self.fpn is None:
+ head = ResNetC5(
+ layers=self.backbone.layers,
+ norm_type=self.backbone.norm_type,
+ freeze_norm=self.backbone.freeze_norm)
+ else:
+ head = TwoFCHead()
+ bbox_head = BBoxHead(
+ head=head,
+ keep_top_k=keep_top_k,
+ nms_threshold=nms_threshold,
+ score_threshold=score_threshold,
+ num_classes=num_classes)
+ self.bbox_head = bbox_head
+ if mask_head is None:
+ mask_head = MaskHead(
+ num_convs=num_convs,
+ resolution=mask_head_resolution,
+ num_classes=num_classes)
+ self.mask_head = mask_head
+ self.batch_size_per_im = batch_size_per_im
+ self.fg_fraction = fg_fraction
+ self.fg_thresh = fg_thresh
+ self.bg_thresh_hi = bg_thresh_hi
+ self.bg_thresh_lo = bg_thresh_lo
+ self.bbox_reg_weights = bbox_reg_weights
+ self.rpn_only = rpn_only
+
+ def build_net(self, inputs):
+ im = inputs['image']
+ im_info = inputs['im_info']
+
+ # backbone
+ body_feats = self.backbone(im)
+
+ # FPN
+ spatial_scale = None
+ if self.fpn is not None:
+ body_feats, spatial_scale = self.fpn.get_output(body_feats)
+
+ # RPN proposals
+ rois = self.rpn_head.get_proposals(body_feats, im_info, mode=self.mode)
+
+ if self.mode == 'train':
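+            # Training branch: compute the RPN loss, then sample RoIs and
+            # assign per-RoI classification/regression targets.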
+ rpn_loss = self.rpn_head.get_loss(im_info, inputs['gt_box'],
+ inputs['is_crowd'])
+ outputs = fluid.layers.generate_proposal_labels(
+ rpn_rois=rois,
+ gt_classes=inputs['gt_label'],
+ is_crowd=inputs['is_crowd'],
+ gt_boxes=inputs['gt_box'],
+ im_info=inputs['im_info'],
+ batch_size_per_im=self.batch_size_per_im,
+ fg_fraction=self.fg_fraction,
+ fg_thresh=self.fg_thresh,
+ bg_thresh_hi=self.bg_thresh_hi,
+ bg_thresh_lo=self.bg_thresh_lo,
+ bbox_reg_weights=self.bbox_reg_weights,
+ class_nums=self.num_classes,
+ use_random=self.rpn_head.use_random)
+
+ rois = outputs[0]
+ labels_int32 = outputs[1]
+
+ if self.fpn is None:
+ last_feat = body_feats[list(body_feats.keys())[-1]]
+ roi_feat = self.roi_extractor(last_feat, rois)
+ else:
+ roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
+
+ loss = self.bbox_head.get_loss(roi_feat, labels_int32,
+ *outputs[2:])
+ loss.update(rpn_loss)
+
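+            # Sample mask targets for the foreground RoIs at the mask head resolution.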
+ mask_rois, roi_has_mask_int32, mask_int32 = fluid.layers.generate_mask_labels(
+ rois=rois,
+ gt_classes=inputs['gt_label'],
+ is_crowd=inputs['is_crowd'],
+ gt_segms=inputs['gt_mask'],
+ im_info=inputs['im_info'],
+ labels_int32=labels_int32,
+ num_classes=self.num_classes,
+ resolution=self.mask_head.resolution)
+ if self.fpn is None:
+ bbox_head_feat = self.bbox_head.get_head_feat()
+ feat = fluid.layers.gather(bbox_head_feat, roi_has_mask_int32)
+ else:
+ feat = self.roi_extractor(
+ body_feats, mask_rois, spatial_scale, is_mask=True)
+
+ mask_loss = self.mask_head.get_loss(feat, mask_int32)
+ loss.update(mask_loss)
+
+ total_loss = fluid.layers.sum(list(loss.values()))
+ loss.update({'loss': total_loss})
+ return loss
+
+ else:
+ if self.rpn_only:
+ im_scale = fluid.layers.slice(
+ im_info, [1], starts=[2], ends=[3])
+ im_scale = fluid.layers.sequence_expand(im_scale, rois)
+ rois = rois / im_scale
+ return {'proposal': rois}
+ mask_name = 'mask_pred'
+ mask_pred, bbox_pred = self._eval(body_feats, mask_name, rois,
+ im_info, inputs['im_shape'],
+ spatial_scale)
+ return OrderedDict(zip(['bbox', 'mask'], [bbox_pred, mask_pred]))
+
+ def _eval(self,
+ body_feats,
+ mask_name,
+ rois,
+ im_info,
+ im_shape,
+ spatial_scale,
+ bbox_pred=None):
+ if not bbox_pred:
+ if self.fpn is None:
+ last_feat = body_feats[list(body_feats.keys())[-1]]
+ roi_feat = self.roi_extractor(last_feat, rois)
+ else:
+ roi_feat = self.roi_extractor(body_feats, rois, spatial_scale)
+ bbox_pred = self.bbox_head.get_prediction(roi_feat, rois, im_info,
+ im_shape)
+ bbox_pred = bbox_pred['bbox']
+
+        # Guard against the empty-detection case: when bbox_pred holds fewer
+        # than 6 values (no [label, score, x1, y1, x2, y2] rows were kept by
+        # NMS), reuse bbox_pred as the mask output; otherwise run the mask
+        # head on the predicted boxes rescaled back to the network input size.
+ bbox_shape = fluid.layers.shape(bbox_pred)
+ bbox_size = fluid.layers.reduce_prod(bbox_shape)
+ bbox_size = fluid.layers.reshape(bbox_size, [1, 1])
+ size = fluid.layers.fill_constant([1, 1], value=6, dtype='int32')
+ cond = fluid.layers.less_than(x=bbox_size, y=size)
+
+ mask_pred = fluid.layers.create_global_var(
+ shape=[1],
+ value=0.0,
+ dtype='float32',
+ persistable=False,
+ name=mask_name)
+ with fluid.layers.control_flow.Switch() as switch:
+ with switch.case(cond):
+ fluid.layers.assign(input=bbox_pred, output=mask_pred)
+ with switch.default():
+ bbox = fluid.layers.slice(bbox_pred, [1], starts=[2], ends=[6])
+
+ im_scale = fluid.layers.slice(
+ im_info, [1], starts=[2], ends=[3])
+ im_scale = fluid.layers.sequence_expand(im_scale, bbox)
+
+ mask_rois = bbox * im_scale
+ if self.fpn is None:
+ last_feat = body_feats[list(body_feats.keys())[-1]]
+ mask_feat = self.roi_extractor(last_feat, mask_rois)
+ mask_feat = self.bbox_head.get_head_feat(mask_feat)
+ else:
+ mask_feat = self.roi_extractor(
+ body_feats, mask_rois, spatial_scale, is_mask=True)
+
+ mask_out = self.mask_head.get_prediction(mask_feat, bbox)
+ fluid.layers.assign(input=mask_out, output=mask_pred)
+ return mask_pred, bbox_pred
+
+ def generate_inputs(self):
+ inputs = OrderedDict()
+ inputs['image'] = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if self.mode == 'train':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['gt_box'] = fluid.data(
+ dtype='float32', shape=[None, 4], lod_level=1, name='gt_box')
+ inputs['gt_label'] = fluid.data(
+ dtype='int32', shape=[None, 1], lod_level=1, name='gt_label')
+ inputs['is_crowd'] = fluid.data(
+ dtype='int32', shape=[None, 1], lod_level=1, name='is_crowd')
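+            # gt_mask stores polygon point coordinates; lod_level=3 encodes the
+            # image -> instance -> polygon nesting expected by generate_mask_labels.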
+ inputs['gt_mask'] = fluid.data(
+ dtype='float32', shape=[None, 2], lod_level=3, name='gt_mask')
+ elif self.mode == 'eval':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['im_id'] = fluid.data(
+ dtype='int64', shape=[None, 1], name='im_id')
+ inputs['im_shape'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_shape')
+ elif self.mode == 'test':
+ inputs['im_info'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_info')
+ inputs['im_shape'] = fluid.data(
+ dtype='float32', shape=[None, 3], name='im_shape')
+ return inputs
diff --git a/paddlex/cv/nets/detection/roi_extractor.py b/paddlex/cv/nets/detection/roi_extractor.py
new file mode 100644
index 0000000000000000000000000000000000000000..f0509fb297d8c3c6cefda98739ff9a1a4cb196a2
--- /dev/null
+++ b/paddlex/cv/nets/detection/roi_extractor.py
@@ -0,0 +1,111 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle.fluid as fluid
+
+__all__ = ['RoIAlign', 'FPNRoIAlign']
+
+
+class RoIAlign(object):
+ def __init__(self, resolution=7, spatial_scale=1. / 16, sampling_ratio=0):
+ super(RoIAlign, self).__init__()
+ self.resolution = resolution
+ self.spatial_scale = spatial_scale
+ self.sampling_ratio = sampling_ratio
+
+ def __call__(self, head_inputs, rois):
+ roi_feat = fluid.layers.roi_align(
+ head_inputs,
+ rois,
+ pooled_height=self.resolution,
+ pooled_width=self.resolution,
+ spatial_scale=self.spatial_scale,
+ sampling_ratio=self.sampling_ratio)
+ return roi_feat
+
+
+class FPNRoIAlign(object):
+ """
+ RoI align pooling for FPN feature maps
+ Args:
+ sampling_ratio (int): number of sampling points
+ min_level (int): lowest level of FPN layer
+ max_level (int): highest level of FPN layer
+        canconical_level (int): the canonical FPN feature map level
+        canonical_size (int): the canonical FPN feature map size
+ box_resolution (int): box resolution
+ mask_resolution (int): mask roi resolution
+ """
+
+ def __init__(self,
+ sampling_ratio=0,
+ min_level=2,
+ max_level=5,
+ canconical_level=4,
+ canonical_size=224,
+ box_resolution=7,
+ mask_resolution=14):
+ super(FPNRoIAlign, self).__init__()
+ self.sampling_ratio = sampling_ratio
+ self.min_level = min_level
+ self.max_level = max_level
+ self.canconical_level = canconical_level
+ self.canonical_size = canonical_size
+ self.box_resolution = box_resolution
+ self.mask_resolution = mask_resolution
+
+ def __call__(self, head_inputs, rois, spatial_scale, is_mask=False):
+ """
+        Apply RoI align to several levels of feature maps to get RoI features.
+        RoIs are distributed to different levels by area, pooled from their
+        corresponding feature maps, and then gathered back into one tensor.
+
+ Returns:
+ roi_feat(Variable): RoI features with shape of [M, C, R, R],
+ where M is the number of RoIs and R is RoI resolution
+
+ """
+ k_min = self.min_level
+ k_max = self.max_level
+ num_roi_lvls = k_max - k_min + 1
+ name_list = list(head_inputs.keys())
+ input_name_list = name_list[-num_roi_lvls:]
+ spatial_scale = spatial_scale[-num_roi_lvls:]
+ rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(
+ rois, k_min, k_max, self.canconical_level, self.canonical_size)
+ # rois_dist is in ascend order
+ roi_out_list = []
+        resolution = self.mask_resolution if is_mask else self.box_resolution
+ for lvl in range(num_roi_lvls):
+ name_index = num_roi_lvls - lvl - 1
+ rois_input = rois_dist[lvl]
+ head_input = head_inputs[input_name_list[name_index]]
+ sc = spatial_scale[name_index]
+ roi_out = fluid.layers.roi_align(
+ input=head_input,
+ rois=rois_input,
+ pooled_height=resolution,
+ pooled_width=resolution,
+ spatial_scale=sc,
+ sampling_ratio=self.sampling_ratio)
+ roi_out_list.append(roi_out)
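+        # Concatenate per-level RoI features and restore the original RoI
+        # order via restore_index.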
+ roi_feat_shuffle = fluid.layers.concat(roi_out_list)
+ roi_feat_ = fluid.layers.gather(roi_feat_shuffle, restore_index)
+ roi_feat = fluid.layers.lod_reset(roi_feat_, rois)
+
+ return roi_feat
diff --git a/paddlex/cv/nets/detection/rpn_head.py b/paddlex/cv/nets/detection/rpn_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..3f743416c9928fce0882684a297b4e8bdf4b7f47
--- /dev/null
+++ b/paddlex/cv/nets/detection/rpn_head.py
@@ -0,0 +1,583 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.initializer import Normal
+from paddle.fluid.regularizer import L2Decay
+
+__all__ = ['RPNHead', 'FPNRPNHead']
+
+
+class RPNHead(object):
+ def __init__(
+ self,
+ #anchor_generator
+ stride=[16.0, 16.0],
+ anchor_sizes=[32, 64, 128, 256, 512],
+ aspect_ratios=[0.5, 1., 2.],
+ variance=[1., 1., 1., 1.],
+ #rpn_target_assign
+ rpn_batch_size_per_im=256,
+ rpn_straddle_thresh=0.,
+ rpn_fg_fraction=0.5,
+ rpn_positive_overlap=0.7,
+ rpn_negative_overlap=0.3,
+ use_random=True,
+ #train_proposal
+ train_pre_nms_top_n=12000,
+ train_post_nms_top_n=2000,
+ train_nms_thresh=.7,
+ train_min_size=.0,
+ train_eta=1.,
+ #test_proposal
+ test_pre_nms_top_n=6000,
+ test_post_nms_top_n=1000,
+ test_nms_thresh=.7,
+ test_min_size=.0,
+ test_eta=1.,
+ #num_classes
+ num_classes=1):
+ super(RPNHead, self).__init__()
+ self.stride = stride
+ self.anchor_sizes = anchor_sizes
+ self.aspect_ratios = aspect_ratios
+ self.variance = variance
+ self.rpn_batch_size_per_im = rpn_batch_size_per_im
+ self.rpn_straddle_thresh = rpn_straddle_thresh
+ self.rpn_fg_fraction = rpn_fg_fraction
+ self.rpn_positive_overlap = rpn_positive_overlap
+ self.rpn_negative_overlap = rpn_negative_overlap
+ self.use_random = use_random
+ self.train_pre_nms_top_n = train_pre_nms_top_n
+ self.train_post_nms_top_n = train_post_nms_top_n
+ self.train_nms_thresh = train_nms_thresh
+ self.train_min_size = train_min_size
+ self.train_eta = train_eta
+ self.test_pre_nms_top_n = test_pre_nms_top_n
+ self.test_post_nms_top_n = test_post_nms_top_n
+ self.test_nms_thresh = test_nms_thresh
+ self.test_min_size = test_min_size
+ self.test_eta = test_eta
+ self.num_classes = num_classes
+
+ def _get_output(self, input):
+ """
+ Get anchor and RPN head output.
+
+ Args:
+ input(Variable): feature map from backbone with shape of [N, C, H, W]
+
+ Returns:
+ rpn_cls_score(Variable): Output of rpn head with shape of
+ [N, num_anchors, H, W].
+ rpn_bbox_pred(Variable): Output of rpn head with shape of
+ [N, num_anchors * 4, H, W].
+ """
+ dim_out = input.shape[1]
+ rpn_conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=dim_out,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ act='relu',
+ name='conv_rpn',
+ param_attr=ParamAttr(
+ name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)),
+ bias_attr=ParamAttr(
+ name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
+ # Generate anchors
+ self.anchor, self.anchor_var = fluid.layers.anchor_generator(
+ input=rpn_conv,
+ stride=self.stride,
+ anchor_sizes=self.anchor_sizes,
+ aspect_ratios=self.aspect_ratios,
+ variance=self.variance)
+ num_anchor = self.anchor.shape[2]
+ # Proposal classification scores
+ self.rpn_cls_score = fluid.layers.conv2d(
+ rpn_conv,
+ num_filters=num_anchor * self.num_classes,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ name='rpn_cls_score',
+ param_attr=ParamAttr(
+ name="rpn_cls_logits_w",
+ initializer=Normal(loc=0., scale=0.01)),
+ bias_attr=ParamAttr(
+ name="rpn_cls_logits_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ # Proposal bbox regression deltas
+ self.rpn_bbox_pred = fluid.layers.conv2d(
+ rpn_conv,
+ num_filters=4 * num_anchor,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ name='rpn_bbox_pred',
+ param_attr=ParamAttr(
+ name="rpn_bbox_pred_w", initializer=Normal(loc=0.,
+ scale=0.01)),
+ bias_attr=ParamAttr(
+ name="rpn_bbox_pred_b",
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ return self.rpn_cls_score, self.rpn_bbox_pred
+
+ def get_proposals(self, body_feats, im_info, mode='train'):
+ """
+ Get proposals according to the output of backbone.
+
+ Args:
+ body_feats (dict): The dictionary of feature maps from backbone.
+            im_info(Variable): The information of image with shape [N, 3] in
+                the format (height, width, scale).
+
+ Returns:
+ rpn_rois(Variable): Output proposals with shape of (rois_num, 4).
+ """
+
+        # In RPN heads, only the last feature map of the backbone is used,
+        # i.e. the last entry of body_feats.
+ body_feat = list(body_feats.values())[-1]
+ rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
+
+ if self.num_classes == 1:
+ rpn_cls_prob = fluid.layers.sigmoid(
+ rpn_cls_score, name='rpn_cls_prob')
+ else:
+ rpn_cls_score = fluid.layers.transpose(
+ rpn_cls_score, perm=[0, 2, 3, 1])
+ rpn_cls_score = fluid.layers.reshape(
+ rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes))
+ rpn_cls_prob_tmp = fluid.layers.softmax(
+ rpn_cls_score, use_cudnn=False, name='rpn_cls_prob')
+ rpn_cls_prob_slice = fluid.layers.slice(
+ rpn_cls_prob_tmp,
+ axes=[4],
+ starts=[1],
+ ends=[self.num_classes])
+ rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1)
+ rpn_cls_prob = fluid.layers.reshape(
+ rpn_cls_prob, shape=(0, 0, 0, -1))
+ rpn_cls_prob = fluid.layers.transpose(
+ rpn_cls_prob, perm=[0, 3, 1, 2])
+ if mode == 'train':
+ rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
+ scores=rpn_cls_prob,
+ bbox_deltas=rpn_bbox_pred,
+ im_info=im_info,
+ anchors=self.anchor,
+ variances=self.anchor_var,
+ pre_nms_top_n=self.train_pre_nms_top_n,
+ post_nms_top_n=self.train_post_nms_top_n,
+ nms_thresh=self.train_nms_thresh,
+ min_size=self.train_min_size,
+ eta=self.train_eta)
+ else:
+ rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
+ scores=rpn_cls_prob,
+ bbox_deltas=rpn_bbox_pred,
+ im_info=im_info,
+ anchors=self.anchor,
+ variances=self.anchor_var,
+ pre_nms_top_n=self.test_pre_nms_top_n,
+ post_nms_top_n=self.test_post_nms_top_n,
+ nms_thresh=self.test_nms_thresh,
+ min_size=self.test_min_size,
+ eta=self.test_eta)
+ return rpn_rois
+
+ def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
+ anchor_var):
+ rpn_cls_score = fluid.layers.transpose(
+ rpn_cls_score, perm=[0, 2, 3, 1])
+ rpn_bbox_pred = fluid.layers.transpose(
+ rpn_bbox_pred, perm=[0, 2, 3, 1])
+ anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
+ anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
+ rpn_cls_score = fluid.layers.reshape(
+ x=rpn_cls_score, shape=(0, -1, self.num_classes))
+ rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
+ return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
+
+ def _get_loss_input(self):
+ for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
+ if not getattr(self, attr, None):
+                raise ValueError(
+                    "self.{} should not be None, call "
+                    "RPNHead.get_proposals first".format(attr))
+ return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
+ self.anchor, self.anchor_var)
+
+ def get_loss(self, im_info, gt_box, is_crowd, gt_label=None):
+ """
+        Sample anchors and calculate the RPN loss.
+
+        Args:
+            im_info(Variable): The information of image with shape [N, 3] in
+                the format (height, width, scale).
+            gt_box(Variable): The ground-truth bounding boxes with shape [M, 4].
+                M is the number of ground-truth boxes.
+            is_crowd(Variable): Indicates whether a ground-truth box is a crowd,
+                with shape [M, 1]. M is the number of ground-truth boxes.
+
+ Returns:
+ Type: dict
+ rpn_cls_loss(Variable): RPN classification loss.
+ rpn_bbox_loss(Variable): RPN bounding box regression loss.
+
+ """
+ rpn_cls, rpn_bbox, anchor, anchor_var = self._get_loss_input()
+ if self.num_classes == 1:
+ score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
+ fluid.layers.rpn_target_assign(
+ bbox_pred=rpn_bbox,
+ cls_logits=rpn_cls,
+ anchor_box=anchor,
+ anchor_var=anchor_var,
+ gt_boxes=gt_box,
+ is_crowd=is_crowd,
+ im_info=im_info,
+ rpn_batch_size_per_im=self.rpn_batch_size_per_im,
+ rpn_straddle_thresh=self.rpn_straddle_thresh,
+ rpn_fg_fraction=self.rpn_fg_fraction,
+ rpn_positive_overlap=self.rpn_positive_overlap,
+ rpn_negative_overlap=self.rpn_negative_overlap,
+ use_random=self.use_random)
+ score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
+ score_tgt.stop_gradient = True
+ rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
+ x=score_pred, label=score_tgt)
+ else:
+ score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
+ fluid.layers.rpn_target_assign(
+ bbox_pred=rpn_bbox,
+ cls_logits=rpn_cls,
+ anchor_box=anchor,
+ anchor_var=anchor_var,
+ gt_boxes=gt_box,
+ gt_labels=gt_label,
+ is_crowd=is_crowd,
+ num_classes=self.num_classes,
+ im_info=im_info,
+ rpn_batch_size_per_im=self.rpn_batch_size_per_im,
+ rpn_straddle_thresh=self.rpn_straddle_thresh,
+ rpn_fg_fraction=self.rpn_fg_fraction,
+ rpn_positive_overlap=self.rpn_positive_overlap,
+ rpn_negative_overlap=self.rpn_negative_overlap,
+ use_random=self.use_random)
+ labels_int64 = fluid.layers.cast(x=score_tgt, dtype='int64')
+ labels_int64.stop_gradient = True
+ rpn_cls_loss = fluid.layers.softmax_with_cross_entropy(
+ logits=score_pred,
+ label=labels_int64,
+ numeric_stable_mode=True)
+
+ rpn_cls_loss = fluid.layers.reduce_mean(
+ rpn_cls_loss, name='loss_rpn_cls')
+
+ loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
+ loc_tgt.stop_gradient = True
+ rpn_reg_loss = fluid.layers.smooth_l1(
+ x=loc_pred,
+ y=loc_tgt,
+ sigma=3.0,
+ inside_weight=bbox_weight,
+ outside_weight=bbox_weight)
+ rpn_reg_loss = fluid.layers.reduce_sum(
+ rpn_reg_loss, name='loss_rpn_bbox')
+ score_shape = fluid.layers.shape(score_tgt)
+ score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
+ norm = fluid.layers.reduce_prod(score_shape)
+ norm.stop_gradient = True
+ rpn_reg_loss = rpn_reg_loss / norm
+
+ return {'loss_rpn_cls': rpn_cls_loss, 'loss_rpn_bbox': rpn_reg_loss}
+
+
+class FPNRPNHead(RPNHead):
+ def __init__(
+ self,
+ anchor_start_size=32,
+ aspect_ratios=[0.5, 1., 2.],
+ variance=[1., 1., 1., 1.],
+ num_chan=256,
+ min_level=2,
+ max_level=6,
+ #rpn_target_assign
+ rpn_batch_size_per_im=256,
+ rpn_straddle_thresh=0.,
+ rpn_fg_fraction=0.5,
+ rpn_positive_overlap=0.7,
+ rpn_negative_overlap=0.3,
+ use_random=True,
+ #train_proposal
+ train_pre_nms_top_n=2000,
+ train_post_nms_top_n=2000,
+ train_nms_thresh=.7,
+ train_min_size=.0,
+ train_eta=1.,
+ #test_proposal
+ test_pre_nms_top_n=1000,
+ test_post_nms_top_n=1000,
+ test_nms_thresh=.7,
+ test_min_size=.0,
+ test_eta=1.,
+ #num_classes
+ num_classes=1):
+ super(FPNRPNHead, self).__init__(
+ aspect_ratios=aspect_ratios,
+ variance=variance,
+ rpn_batch_size_per_im=rpn_batch_size_per_im,
+ rpn_straddle_thresh=rpn_straddle_thresh,
+ rpn_fg_fraction=rpn_fg_fraction,
+ rpn_positive_overlap=rpn_positive_overlap,
+ rpn_negative_overlap=rpn_negative_overlap,
+ use_random=use_random,
+ train_pre_nms_top_n=train_pre_nms_top_n,
+ train_post_nms_top_n=train_post_nms_top_n,
+ train_nms_thresh=train_nms_thresh,
+ train_min_size=train_min_size,
+ train_eta=train_eta,
+ test_pre_nms_top_n=test_pre_nms_top_n,
+ test_post_nms_top_n=test_post_nms_top_n,
+ test_nms_thresh=test_nms_thresh,
+ test_min_size=test_min_size,
+ test_eta=test_eta,
+ num_classes=num_classes)
+ self.anchor_start_size = anchor_start_size
+ self.num_chan = num_chan
+ self.min_level = min_level
+ self.max_level = max_level
+ self.num_classes = num_classes
+
+ self.fpn_rpn_list = []
+ self.anchors_list = []
+ self.anchor_var_list = []
+
+ def _get_output(self, input, feat_lvl):
+ """
+ Get anchor and FPN RPN head output at one level.
+
+ Args:
+ input(Variable): Body feature from backbone.
+ feat_lvl(int): Indicate the level of rpn output corresponding
+ to the level of feature map.
+
+ Return:
+ rpn_cls_score(Variable): Output of one level of fpn rpn head with
+ shape of [N, num_anchors, H, W].
+ rpn_bbox_pred(Variable): Output of one level of fpn rpn head with
+ shape of [N, num_anchors * 4, H, W].
+ """
+ slvl = str(feat_lvl)
+ conv_name = 'conv_rpn_fpn' + slvl
+ cls_name = 'rpn_cls_logits_fpn' + slvl
+ bbox_name = 'rpn_bbox_pred_fpn' + slvl
+ conv_share_name = 'conv_rpn_fpn' + str(self.min_level)
+ cls_share_name = 'rpn_cls_logits_fpn' + str(self.min_level)
+ bbox_share_name = 'rpn_bbox_pred_fpn' + str(self.min_level)
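+        # Convolution weights are shared across FPN levels by reusing the
+        # parameter names of the min_level head.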
+
+ num_anchors = len(self.aspect_ratios)
+ conv_rpn_fpn = fluid.layers.conv2d(
+ input=input,
+ num_filters=self.num_chan,
+ filter_size=3,
+ padding=1,
+ act='relu',
+ name=conv_name,
+ param_attr=ParamAttr(
+ name=conv_share_name + '_w',
+ initializer=Normal(loc=0., scale=0.01)),
+ bias_attr=ParamAttr(
+ name=conv_share_name + '_b',
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+
+ self.anchors, self.anchor_var = fluid.layers.anchor_generator(
+ input=conv_rpn_fpn,
+ anchor_sizes=(self.anchor_start_size * 2.**
+ (feat_lvl - self.min_level), ),
+ stride=(2.**feat_lvl, 2.**feat_lvl),
+ aspect_ratios=self.aspect_ratios,
+ variance=self.variance)
+
+ cls_num_filters = num_anchors * self.num_classes
+ self.rpn_cls_score = fluid.layers.conv2d(
+ input=conv_rpn_fpn,
+ num_filters=cls_num_filters,
+ filter_size=1,
+ act=None,
+ name=cls_name,
+ param_attr=ParamAttr(
+ name=cls_share_name + '_w',
+ initializer=Normal(loc=0., scale=0.01)),
+ bias_attr=ParamAttr(
+ name=cls_share_name + '_b',
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ self.rpn_bbox_pred = fluid.layers.conv2d(
+ input=conv_rpn_fpn,
+ num_filters=num_anchors * 4,
+ filter_size=1,
+ act=None,
+ name=bbox_name,
+ param_attr=ParamAttr(
+ name=bbox_share_name + '_w',
+ initializer=Normal(loc=0., scale=0.01)),
+ bias_attr=ParamAttr(
+ name=bbox_share_name + '_b',
+ learning_rate=2.,
+ regularizer=L2Decay(0.)))
+ return self.rpn_cls_score, self.rpn_bbox_pred
+
+ def _get_single_proposals(self, body_feat, im_info, feat_lvl,
+ mode='train'):
+ """
+ Get proposals in one level according to the output of fpn rpn head
+
+ Args:
+            body_feat(Variable): the feature map from the backbone.
+            im_info(Variable): The information of image with shape [N, 3] in
+                the format (height, width, scale).
+ feat_lvl(int): Indicate the level of proposals corresponding to
+ the feature maps.
+
+ Returns:
+ rpn_rois_fpn(Variable): Output proposals with shape of (rois_num, 4).
+ rpn_roi_probs_fpn(Variable): Scores of proposals with
+ shape of (rois_num, 1).
+ """
+
+ rpn_cls_score_fpn, rpn_bbox_pred_fpn = self._get_output(
+ body_feat, feat_lvl)
+
+ if self.num_classes == 1:
+ rpn_cls_prob_fpn = fluid.layers.sigmoid(
+ rpn_cls_score_fpn, name='rpn_cls_prob_fpn' + str(feat_lvl))
+ else:
+ rpn_cls_score_fpn = fluid.layers.transpose(
+ rpn_cls_score_fpn, perm=[0, 2, 3, 1])
+ rpn_cls_score_fpn = fluid.layers.reshape(
+ rpn_cls_score_fpn, shape=(0, 0, 0, -1, self.num_classes))
+ rpn_cls_prob_fpn = fluid.layers.softmax(
+ rpn_cls_score_fpn,
+ use_cudnn=False,
+ name='rpn_cls_prob_fpn' + str(feat_lvl))
+ rpn_cls_prob_fpn = fluid.layers.slice(
+ rpn_cls_prob_fpn,
+ axes=[4],
+ starts=[1],
+ ends=[self.num_classes])
+ rpn_cls_prob_fpn, _ = fluid.layers.topk(rpn_cls_prob_fpn, 1)
+ rpn_cls_prob_fpn = fluid.layers.reshape(
+ rpn_cls_prob_fpn, shape=(0, 0, 0, -1))
+ rpn_cls_prob_fpn = fluid.layers.transpose(
+ rpn_cls_prob_fpn, perm=[0, 3, 1, 2])
+
+ if mode == 'train':
+ rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals(
+ scores=rpn_cls_prob_fpn,
+ bbox_deltas=rpn_bbox_pred_fpn,
+ im_info=im_info,
+ anchors=self.anchors,
+ variances=self.anchor_var,
+ pre_nms_top_n=self.train_pre_nms_top_n,
+ post_nms_top_n=self.train_post_nms_top_n,
+ nms_thresh=self.train_nms_thresh,
+ min_size=self.train_min_size,
+ eta=self.train_eta)
+ else:
+ rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals(
+ scores=rpn_cls_prob_fpn,
+ bbox_deltas=rpn_bbox_pred_fpn,
+ im_info=im_info,
+ anchors=self.anchors,
+ variances=self.anchor_var,
+ pre_nms_top_n=self.test_pre_nms_top_n,
+ post_nms_top_n=self.test_post_nms_top_n,
+ nms_thresh=self.test_nms_thresh,
+ min_size=self.test_min_size,
+ eta=self.test_eta)
+
+ return rpn_rois_fpn, rpn_roi_prob_fpn
+
+ def get_proposals(self, fpn_feats, im_info, mode='train'):
+ """
+ Get proposals in multiple levels according to the output of fpn
+ rpn head
+
+ Args:
+            fpn_feats(dict): A dictionary mapping feature map names to the
+                output feature maps of FPN.
+            im_info(Variable): The information of image with shape [N, 3] in
+                the format (height, width, scale).
+
+        Returns:
+            rois_collect(Variable): Output proposals with shape of [rois_num, 4].
+ """
+ rois_list = []
+ roi_probs_list = []
+ fpn_feat_names = list(fpn_feats.keys())
+ for lvl in range(self.min_level, self.max_level + 1):
+ fpn_feat_name = fpn_feat_names[self.max_level - lvl]
+ fpn_feat = fpn_feats[fpn_feat_name]
+ rois_fpn, roi_probs_fpn = self._get_single_proposals(
+ fpn_feat, im_info, lvl, mode)
+ self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred))
+ rois_list.append(rois_fpn)
+ roi_probs_list.append(roi_probs_fpn)
+ self.anchors_list.append(self.anchors)
+ self.anchor_var_list.append(self.anchor_var)
+ post_nms_top_n = self.train_post_nms_top_n if mode == 'train' else \
+ self.test_post_nms_top_n
+ rois_collect = fluid.layers.collect_fpn_proposals(
+ rois_list,
+ roi_probs_list,
+ self.min_level,
+ self.max_level,
+ post_nms_top_n,
+ name='collect')
+ return rois_collect
+
+ def _get_loss_input(self):
+ rpn_clses = []
+ rpn_bboxes = []
+ anchors = []
+ anchor_vars = []
+ for i in range(len(self.fpn_rpn_list)):
+ single_input = self._transform_input(
+ self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1],
+ self.anchors_list[i], self.anchor_var_list[i])
+ rpn_clses.append(single_input[0])
+ rpn_bboxes.append(single_input[1])
+ anchors.append(single_input[2])
+ anchor_vars.append(single_input[3])
+
+ rpn_cls = fluid.layers.concat(rpn_clses, axis=1)
+ rpn_bbox = fluid.layers.concat(rpn_bboxes, axis=1)
+ anchors = fluid.layers.concat(anchors)
+ anchor_var = fluid.layers.concat(anchor_vars)
+ return rpn_cls, rpn_bbox, anchors, anchor_var
diff --git a/paddlex/cv/nets/detection/yolo_v3.py b/paddlex/cv/nets/detection/yolo_v3.py
new file mode 100644
index 0000000000000000000000000000000000000000..2b2d784ed2fc64eee90d72574e770e6e4fcf83d8
--- /dev/null
+++ b/paddlex/cv/nets/detection/yolo_v3.py
@@ -0,0 +1,318 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.regularizer import L2Decay
+from collections import OrderedDict
+
+
+class YOLOv3:
+ def __init__(self,
+ backbone,
+ num_classes,
+ mode='train',
+ anchors=None,
+ anchor_masks=None,
+ ignore_threshold=0.7,
+ label_smooth=False,
+ nms_score_threshold=0.01,
+ nms_topk=1000,
+ nms_keep_topk=100,
+ nms_iou_threshold=0.45,
+ train_random_shapes=[
+ 320, 352, 384, 416, 448, 480, 512, 544, 576, 608
+ ]):
+ if anchors is None:
+ anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]]
+ if anchor_masks is None:
+ anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ self.anchors = anchors
+ self.anchor_masks = anchor_masks
+ self._parse_anchors(anchors)
+ self.mode = mode
+ self.num_classes = num_classes
+ self.backbone = backbone
+ self.ignore_thresh = ignore_threshold
+ self.label_smooth = label_smooth
+ self.nms_score_threshold = nms_score_threshold
+ self.nms_topk = nms_topk
+ self.nms_keep_topk = nms_keep_topk
+ self.nms_iou_threshold = nms_iou_threshold
+ self.norm_decay = 0.0
+ self.prefix_name = ''
+ self.train_random_shapes = train_random_shapes
+
+ def _head(self, feats):
+ outputs = []
+ out_layer_num = len(self.anchor_masks)
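+        # Take the last out_layer_num backbone feature maps in reverse order
+        # (deepest first).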
+ blocks = feats[-1:-out_layer_num - 1:-1]
+ route = None
+
+ for i, block in enumerate(blocks):
+ if i > 0:
+ block = fluid.layers.concat(input=[route, block], axis=1)
+ route, tip = self._detection_block(
+ block,
+ channel=512 // (2**i),
+ name=self.prefix_name + 'yolo_block.{}'.format(i))
+
+ num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
+ block_out = fluid.layers.conv2d(
+ input=tip,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ param_attr=ParamAttr(name=self.prefix_name +
+ 'yolo_output.{}.conv.weights'.format(i)),
+ bias_attr=ParamAttr(
+ regularizer=L2Decay(0.0),
+ name=self.prefix_name +
+ 'yolo_output.{}.conv.bias'.format(i)))
+ outputs.append(block_out)
+
+ if i < len(blocks) - 1:
+ route = self._conv_bn(
+ input=route,
+ ch_out=256 // (2**i),
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name=self.prefix_name + 'yolo_transition.{}'.format(i))
+ route = self._upsample(route)
+ return outputs
+
+ def _parse_anchors(self, anchors):
+ self.anchors = []
+ self.mask_anchors = []
+
+ assert len(anchors) > 0, "ANCHORS not set."
+ assert len(self.anchor_masks) > 0, "ANCHOR_MASKS not set."
+
+ for anchor in anchors:
+ assert len(anchor) == 2, "anchor {} len should be 2".format(anchor)
+ self.anchors.extend(anchor)
+
+ anchor_num = len(anchors)
+ for masks in self.anchor_masks:
+ self.mask_anchors.append([])
+ for mask in masks:
+ assert mask < anchor_num, "anchor mask index overflow"
+ self.mask_anchors[-1].extend(anchors[mask])
+
+ def _conv_bn(self,
+ input,
+ ch_out,
+ filter_size,
+ stride,
+ padding,
+ act='leaky',
+ is_test=False,
+ name=None):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=ch_out,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ act=None,
+ param_attr=ParamAttr(name=name + '.conv.weights'),
+ bias_attr=False)
+ bn_name = name + '.bn'
+ bn_param_attr = ParamAttr(
+ regularizer=L2Decay(self.norm_decay), name=bn_name + '.scale')
+ bn_bias_attr = ParamAttr(
+ regularizer=L2Decay(self.norm_decay), name=bn_name + '.offset')
+ out = fluid.layers.batch_norm(
+ input=conv,
+ act=None,
+ is_test=is_test,
+ param_attr=bn_param_attr,
+ bias_attr=bn_bias_attr,
+ moving_mean_name=bn_name + '.mean',
+ moving_variance_name=bn_name + '.var')
+ if act == 'leaky':
+ out = fluid.layers.leaky_relu(x=out, alpha=0.1)
+ return out
+
+ def _upsample(self, input, scale=2, name=None):
+ out = fluid.layers.resize_nearest(
+ input=input, scale=float(scale), name=name)
+ return out
+
+ def _detection_block(self, input, channel, name=None):
+ assert channel % 2 == 0, "channel({}) cannot be divided by 2 in detection block({})".format(
+ channel, name)
+
+ is_test = False if self.mode == 'train' else True
+ conv = input
+ for i in range(2):
+ conv = self._conv_bn(
+ conv,
+ channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ is_test=is_test,
+ name='{}.{}.0'.format(name, i))
+ conv = self._conv_bn(
+ conv,
+ channel * 2,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ is_test=is_test,
+ name='{}.{}.1'.format(name, i))
+ route = self._conv_bn(
+ conv,
+ channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ is_test=is_test,
+ name='{}.2'.format(name))
+ tip = self._conv_bn(
+ route,
+ channel * 2,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ is_test=is_test,
+ name='{}.tip'.format(name))
+ return route, tip
+
+ def _get_loss(self, inputs, gt_box, gt_label, gt_score):
+ losses = []
+ downsample = 32
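+        # The downsample ratio halves for each successive YOLO output level
+        # (32 -> 16 -> 8).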
+ for i, input in enumerate(inputs):
+ loss = fluid.layers.yolov3_loss(
+ x=input,
+ gt_box=gt_box,
+ gt_label=gt_label,
+ gt_score=gt_score,
+ anchors=self.anchors,
+ anchor_mask=self.anchor_masks[i],
+ class_num=self.num_classes,
+ ignore_thresh=self.ignore_thresh,
+ downsample_ratio=downsample,
+ use_label_smooth=self.label_smooth,
+ name=self.prefix_name + 'yolo_loss' + str(i))
+ losses.append(fluid.layers.reduce_mean(loss))
+ downsample //= 2
+ return sum(losses)
+
+ def _get_prediction(self, inputs, im_size):
+ boxes = []
+ scores = []
+ downsample = 32
+ for i, input in enumerate(inputs):
+ box, score = fluid.layers.yolo_box(
+ x=input,
+ img_size=im_size,
+ anchors=self.mask_anchors[i],
+ class_num=self.num_classes,
+ conf_thresh=self.nms_score_threshold,
+ downsample_ratio=downsample,
+ name=self.prefix_name + 'yolo_box' + str(i))
+ boxes.append(box)
+ scores.append(fluid.layers.transpose(score, perm=[0, 2, 1]))
+ downsample //= 2
+ yolo_boxes = fluid.layers.concat(boxes, axis=1)
+ yolo_scores = fluid.layers.concat(scores, axis=2)
+ pred = fluid.layers.multiclass_nms(
+ bboxes=yolo_boxes,
+ scores=yolo_scores,
+ score_threshold=self.nms_score_threshold,
+ nms_top_k=self.nms_topk,
+ keep_top_k=self.nms_keep_topk,
+ nms_threshold=self.nms_iou_threshold,
+ normalized=False,
+ nms_eta=1.0,
+ background_label=-1)
+ return pred
+
+ def generate_inputs(self):
+ inputs = OrderedDict()
+ inputs['image'] = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if self.mode == 'train':
+ inputs['gt_box'] = fluid.data(
+ dtype='float32', shape=[None, None, 4], name='gt_box')
+ inputs['gt_label'] = fluid.data(
+ dtype='int32', shape=[None, None], name='gt_label')
+ inputs['gt_score'] = fluid.data(
+ dtype='float32', shape=[None, None], name='gt_score')
+ inputs['im_size'] = fluid.data(
+ dtype='int32', shape=[None, 2], name='im_size')
+ elif self.mode == 'eval':
+ inputs['im_size'] = fluid.data(
+ dtype='int32', shape=[None, 2], name='im_size')
+ inputs['im_id'] = fluid.data(
+ dtype='int32', shape=[None, 1], name='im_id')
+ inputs['gt_box'] = fluid.data(
+ dtype='float32', shape=[None, None, 4], name='gt_box')
+ inputs['gt_label'] = fluid.data(
+ dtype='int32', shape=[None, None], name='gt_label')
+ inputs['is_difficult'] = fluid.data(
+ dtype='int32', shape=[None, None], name='is_difficult')
+ elif self.mode == 'test':
+ inputs['im_size'] = fluid.data(
+ dtype='int32', shape=[None, 2], name='im_size')
+ return inputs
+
+ def build_net(self, inputs):
+ image = inputs['image']
+ if self.mode == 'train':
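+            # Multi-scale training: randomly pick one size from
+            # train_random_shapes and resize the batch to it.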
+ if isinstance(self.train_random_shapes,
+ (list, tuple)) and len(self.train_random_shapes) > 0:
+ import numpy as np
+ shapes = np.array(self.train_random_shapes)
+ shapes = np.stack([shapes, shapes], axis=1).astype('float32')
+ shapes_tensor = fluid.layers.assign(shapes)
+ index = fluid.layers.uniform_random(
+ shape=[1], dtype='float32', min=0.0, max=1)
+ index = fluid.layers.cast(
+ index * len(self.train_random_shapes), dtype='int32')
+ shape = fluid.layers.gather(shapes_tensor, index)
+ shape = fluid.layers.reshape(shape, [-1])
+ shape = fluid.layers.cast(shape, dtype='int32')
+ image = fluid.layers.resize_nearest(
+ image, out_shape=shape, align_corners=False)
+ feats = self.backbone(image)
+ if isinstance(feats, OrderedDict):
+ feat_names = list(feats.keys())
+ feats = [feats[name] for name in feat_names]
+
+ head_outputs = self._head(feats)
+ if self.mode == 'train':
+ gt_box = inputs['gt_box']
+ gt_label = inputs['gt_label']
+ gt_score = inputs['gt_score']
+ im_size = inputs['im_size']
+ num_boxes = fluid.layers.shape(gt_box)[1]
+ im_size_wh = fluid.layers.reverse(im_size, axis=1)
+ whwh = fluid.layers.concat([im_size_wh, im_size_wh], axis=1)
+ whwh = fluid.layers.unsqueeze(whwh, axes=[1])
+ whwh = fluid.layers.expand(whwh, expand_times=[1, num_boxes, 1])
+ whwh = fluid.layers.cast(whwh, dtype='float32')
+ whwh.stop_gradient = True
+ normalized_box = fluid.layers.elementwise_div(gt_box, whwh)
+ return self._get_loss(head_outputs, normalized_box, gt_label,
+ gt_score)
+ else:
+ im_size = inputs['im_size']
+ return self._get_prediction(head_outputs, im_size)
diff --git a/paddlex/cv/nets/mobilenet_v1.py b/paddlex/cv/nets/mobilenet_v1.py
new file mode 100755
index 0000000000000000000000000000000000000000..682623b97adb311769e2ca061e02a40c1c7bae48
--- /dev/null
+++ b/paddlex/cv/nets/mobilenet_v1.py
@@ -0,0 +1,217 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from paddle import fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.regularizer import L2Decay
+
+
+class MobileNetV1(object):
+ """
+ MobileNet v1, see https://arxiv.org/abs/1704.04861
+
+ Args:
+ norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
+ norm_decay (float): weight decay for normalization layer weights
+ conv_group_scale (int): scaling factor for convolution groups
+ with_extra_blocks (bool): if extra blocks should be added
+        extra_block_filters (list): number of filters for each extra block
+ """
+
+ def __init__(self,
+ norm_type='bn',
+ norm_decay=0.,
+ conv_group_scale=1,
+ conv_learning_rate=1.0,
+ with_extra_blocks=False,
+ extra_block_filters=[[256, 512], [128, 256], [128, 256],
+ [64, 128]],
+ weight_prefix_name='',
+ num_classes=None):
+ self.norm_type = norm_type
+ self.norm_decay = norm_decay
+ self.conv_group_scale = conv_group_scale
+ self.conv_learning_rate = conv_learning_rate
+ self.with_extra_blocks = with_extra_blocks
+ self.extra_block_filters = extra_block_filters
+ self.prefix_name = weight_prefix_name
+ self.num_classes = num_classes
+
+ def _conv_norm(self,
+ input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ num_groups=1,
+ act='relu',
+ use_cudnn=True,
+ name=None):
+ parameter_attr = ParamAttr(
+ learning_rate=self.conv_learning_rate,
+ initializer=fluid.initializer.MSRA(),
+ name=name + "_weights")
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=parameter_attr,
+ bias_attr=False)
+
+ bn_name = name + "_bn"
+ norm_decay = self.norm_decay
+ bn_param_attr = ParamAttr(
+ regularizer=L2Decay(norm_decay), name=bn_name + '_scale')
+ bn_bias_attr = ParamAttr(
+ regularizer=L2Decay(norm_decay), name=bn_name + '_offset')
+ return fluid.layers.batch_norm(
+ input=conv,
+ act=act,
+ param_attr=bn_param_attr,
+ bias_attr=bn_bias_attr,
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+
+ def depthwise_separable(self,
+ input,
+ num_filters1,
+ num_filters2,
+ num_groups,
+ stride,
+ scale,
+ name=None):
+ depthwise_conv = self._conv_norm(
+ input=input,
+ filter_size=3,
+ num_filters=int(num_filters1 * scale),
+ stride=stride,
+ padding=1,
+ num_groups=int(num_groups * scale),
+ use_cudnn=False,
+ name=name + "_dw")
+
+ pointwise_conv = self._conv_norm(
+ input=depthwise_conv,
+ filter_size=1,
+ num_filters=int(num_filters2 * scale),
+ stride=1,
+ padding=0,
+ name=name + "_sep")
+ return pointwise_conv
+
+ def _extra_block(self,
+ input,
+ num_filters1,
+ num_filters2,
+ num_groups,
+ stride,
+ name=None):
+ pointwise_conv = self._conv_norm(
+ input=input,
+ filter_size=1,
+ num_filters=int(num_filters1),
+ stride=1,
+ num_groups=int(num_groups),
+ padding=0,
+ name=name + "_extra1")
+ normal_conv = self._conv_norm(
+ input=pointwise_conv,
+ filter_size=3,
+ num_filters=int(num_filters2),
+ stride=2,
+ num_groups=int(num_groups),
+ padding=1,
+ name=name + "_extra2")
+ return normal_conv
+
+ def __call__(self, input):
+ scale = self.conv_group_scale
+
+ blocks = []
+ # input 1/1
+ out = self._conv_norm(
+ input, 3, int(32 * scale), 2, 1, name=self.prefix_name + "conv1")
+ # 1/2
+ out = self.depthwise_separable(
+ out, 32, 64, 32, 1, scale, name=self.prefix_name + "conv2_1")
+ out = self.depthwise_separable(
+ out, 64, 128, 64, 2, scale, name=self.prefix_name + "conv2_2")
+ # 1/4
+ out = self.depthwise_separable(
+ out, 128, 128, 128, 1, scale, name=self.prefix_name + "conv3_1")
+ out = self.depthwise_separable(
+ out, 128, 256, 128, 2, scale, name=self.prefix_name + "conv3_2")
+ # 1/8
+ blocks.append(out)
+ out = self.depthwise_separable(
+ out, 256, 256, 256, 1, scale, name=self.prefix_name + "conv4_1")
+ out = self.depthwise_separable(
+ out, 256, 512, 256, 2, scale, name=self.prefix_name + "conv4_2")
+ # 1/16
+ blocks.append(out)
+ for i in range(5):
+ out = self.depthwise_separable(
+ out,
+ 512,
+ 512,
+ 512,
+ 1,
+ scale,
+ name=self.prefix_name + "conv5_" + str(i + 1))
+ module11 = out
+
+ out = self.depthwise_separable(
+ out, 512, 1024, 512, 2, scale, name=self.prefix_name + "conv5_6")
+ # 1/32
+ out = self.depthwise_separable(
+ out, 1024, 1024, 1024, 1, scale, name=self.prefix_name + "conv6")
+ module13 = out
+ blocks.append(out)
+ if self.num_classes:
+ out = fluid.layers.pool2d(
+ input=out, pool_type='avg', global_pooling=True)
+ output = fluid.layers.fc(
+ input=out,
+ size=self.num_classes,
+ param_attr=ParamAttr(
+ initializer=fluid.initializer.MSRA(), name="fc7_weights"),
+ bias_attr=ParamAttr(name="fc7_offset"))
+ return output
+
+ if not self.with_extra_blocks:
+ return blocks
+
+ num_filters = self.extra_block_filters
+ module14 = self._extra_block(module13, num_filters[0][0],
+ num_filters[0][1], 1, 2,
+ self.prefix_name + "conv7_1")
+ module15 = self._extra_block(module14, num_filters[1][0],
+ num_filters[1][1], 1, 2,
+ self.prefix_name + "conv7_2")
+ module16 = self._extra_block(module15, num_filters[2][0],
+ num_filters[2][1], 1, 2,
+ self.prefix_name + "conv7_3")
+ module17 = self._extra_block(module16, num_filters[3][0],
+ num_filters[3][1], 1, 2,
+ self.prefix_name + "conv7_4")
+ return module11, module13, module14, module15, module16, module17
diff --git a/paddlex/cv/nets/mobilenet_v2.py b/paddlex/cv/nets/mobilenet_v2.py
new file mode 100644
index 0000000000000000000000000000000000000000..845d5a3f6b997e2c323e577275670bdfc5193530
--- /dev/null
+++ b/paddlex/cv/nets/mobilenet_v2.py
@@ -0,0 +1,242 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+
+
+class MobileNetV2:
+ def __init__(self,
+ num_classes=None,
+ scale=1.0,
+ output_stride=None,
+ end_points=None,
+ decode_points=None):
+ self.scale = scale
+ self.num_classes = num_classes
+ self.output_stride = output_stride
+ self.end_points = end_points
+ self.decode_points = decode_points
+ self.bottleneck_params_list = [(1, 16, 1, 1), (6, 24, 2, 2),
+ (6, 32, 3, 2), (6, 64, 4, 2),
+ (6, 96, 3, 1), (6, 160, 3, 2),
+ (6, 320, 1, 1)]
+ self.modify_bottle_params(output_stride)
+
+ def __call__(self, input):
+ scale = self.scale
+ decode_ends = dict()
+
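+        # Helper: returns True when the running layer count matches
+        # end_points / decode_points.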
+ def check_points(count, points):
+ if points is None:
+ return False
+ else:
+ if isinstance(points, list):
+ return (True if count in points else False)
+ else:
+ return (True if count == points else False)
+
+ # conv1
+ input = self.conv_bn_layer(
+ input,
+ num_filters=int(32 * scale),
+ filter_size=3,
+ stride=2,
+ padding=1,
+ if_act=True,
+ name='conv1_1')
+
+ layer_count = 1
+
+ if check_points(layer_count, self.decode_points):
+ decode_ends[layer_count] = input
+
+ if check_points(layer_count, self.end_points):
+ return input, decode_ends
+
+ # bottleneck sequences
+ i = 1
+ in_c = int(32 * scale)
+ for layer_setting in self.bottleneck_params_list:
+ t, c, n, s = layer_setting
+ i += 1
+ input, depthwise_output = self.invresi_blocks(
+ input=input,
+ in_c=in_c,
+ t=t,
+ c=int(c * scale),
+ n=n,
+ s=s,
+ name='conv' + str(i))
+ in_c = int(c * scale)
+ layer_count += n
+
+ if check_points(layer_count, self.decode_points):
+ decode_ends[layer_count] = depthwise_output
+
+ if check_points(layer_count, self.end_points):
+ return input, decode_ends
+
+ # last_conv
+ output = self.conv_bn_layer(
+ input=input,
+ num_filters=int(1280 * scale) if scale > 1.0 else 1280,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ if_act=True,
+ name='conv9')
+
+ if self.num_classes is not None:
+ output = fluid.layers.pool2d(
+ input=output, pool_type='avg', global_pooling=True)
+
+ output = fluid.layers.fc(
+ input=output,
+ size=self.num_classes,
+ param_attr=ParamAttr(name='fc10_weights'),
+ bias_attr=ParamAttr(name='fc10_offset'))
+ return output
+
+ def modify_bottle_params(self, output_stride=None):
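+        # Cap block strides so that the accumulated downsampling never
+        # exceeds output_stride.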
+ if output_stride is not None and output_stride % 2 != 0:
+ raise Exception("output stride must to be even number")
+ if output_stride is None:
+ return
+ else:
+ stride = 2
+ for i, layer_setting in enumerate(self.bottleneck_params_list):
+ t, c, n, s = layer_setting
+ stride = stride * s
+ if stride > output_stride:
+ s = 1
+ self.bottleneck_params_list[i] = (t, c, n, s)
+
+ def conv_bn_layer(self,
+ input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ channels=None,
+ num_groups=1,
+ if_act=True,
+ name=None,
+ use_cudnn=True):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=ParamAttr(name=name + '_weights'),
+ bias_attr=False)
+ bn_name = name + '_bn'
+ bn = fluid.layers.batch_norm(
+ input=conv,
+ param_attr=ParamAttr(name=bn_name + "_scale"),
+ bias_attr=ParamAttr(name=bn_name + "_offset"),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+ if if_act:
+ return fluid.layers.relu6(bn)
+ else:
+ return bn
+
+ def shortcut(self, input, data_residual):
+ return fluid.layers.elementwise_add(input, data_residual)
+
+ def inverted_residual_unit(self,
+ input,
+ num_in_filter,
+ num_filters,
+ ifshortcut,
+ stride,
+ filter_size,
+ padding,
+ expansion_factor,
+ name=None):
+ num_expfilter = int(round(num_in_filter * expansion_factor))
+
+ channel_expand = self.conv_bn_layer(
+ input=input,
+ num_filters=num_expfilter,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name=name + '_expand')
+
+ bottleneck_conv = self.conv_bn_layer(
+ input=channel_expand,
+ num_filters=num_expfilter,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ num_groups=num_expfilter,
+ if_act=True,
+ name=name + '_dwise',
+ use_cudnn=False)
+
+ depthwise_output = bottleneck_conv
+
+ linear_out = self.conv_bn_layer(
+ input=bottleneck_conv,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=False,
+ name=name + '_linear')
+
+ if ifshortcut:
+ out = self.shortcut(input=input, data_residual=linear_out)
+ return out, depthwise_output
+ else:
+ return linear_out, depthwise_output
+
+ def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
+ first_block, depthwise_output = self.inverted_residual_unit(
+ input=input,
+ num_in_filter=in_c,
+ num_filters=c,
+ ifshortcut=False,
+ stride=s,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + '_1')
+
+ last_residual_block = first_block
+ last_c = c
+
+ for i in range(1, n):
+ last_residual_block, depthwise_output = self.inverted_residual_unit(
+ input=last_residual_block,
+ num_in_filter=last_c,
+ num_filters=c,
+ ifshortcut=True,
+ stride=1,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + '_' + str(i + 1))
+ return last_residual_block, depthwise_output
diff --git a/paddlex/cv/nets/mobilenet_v3.py b/paddlex/cv/nets/mobilenet_v3.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d5a4d9869f9b698eb4db85e6968808f1f05e7f3
--- /dev/null
+++ b/paddlex/cv/nets/mobilenet_v3.py
@@ -0,0 +1,345 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.regularizer import L2Decay
+
+import math
+
+
+class MobileNetV3():
+ """
+ MobileNet v3, see https://arxiv.org/abs/1905.02244
+ Args:
+        scale (float): channel-width scaling factor of mobilenet_v3.
+ model_name (str): There are two modes, small and large.
+ norm_type (str): normalization type, 'bn' and 'sync_bn' are supported.
+ norm_decay (float): weight decay for normalization layer weights.
+ conv_decay (float): weight decay for convolution layer weights.
+ with_extra_blocks (bool): if extra blocks should be added.
+        extra_block_filters (list): number of filters for each extra block.
+ """
+
+ def __init__(self,
+ scale=1.0,
+ model_name='small',
+ with_extra_blocks=False,
+ conv_decay=0.0,
+ norm_type='bn',
+ norm_decay=0.0,
+ extra_block_filters=[[256, 512], [128, 256], [128, 256],
+ [64, 128]],
+ num_classes=None):
+ self.scale = scale
+ self.with_extra_blocks = with_extra_blocks
+ self.extra_block_filters = extra_block_filters
+ self.conv_decay = conv_decay
+ self.norm_decay = norm_decay
+ self.inplanes = 16
+ self.end_points = []
+ self.block_stride = 1
+ self.num_classes = num_classes
+ if model_name == "large":
+ self.cfg = [
+ # kernel_size, expand, channel, se_block, act_mode, stride
+ [3, 16, 16, False, 'relu', 1],
+ [3, 64, 24, False, 'relu', 2],
+ [3, 72, 24, False, 'relu', 1],
+ [5, 72, 40, True, 'relu', 2],
+ [5, 120, 40, True, 'relu', 1],
+ [5, 120, 40, True, 'relu', 1],
+ [3, 240, 80, False, 'hard_swish', 2],
+ [3, 200, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 184, 80, False, 'hard_swish', 1],
+ [3, 480, 112, True, 'hard_swish', 1],
+ [3, 672, 112, True, 'hard_swish', 1],
+ [5, 672, 160, True, 'hard_swish', 2],
+ [5, 960, 160, True, 'hard_swish', 1],
+ [5, 960, 160, True, 'hard_swish', 1],
+ ]
+ self.cls_ch_squeeze = 960
+ self.cls_ch_expand = 1280
+ elif model_name == "small":
+ self.cfg = [
+ # kernel_size, expand, channel, se_block, act_mode, stride
+ [3, 16, 16, True, 'relu', 2],
+ [3, 72, 24, False, 'relu', 2],
+ [3, 88, 24, False, 'relu', 1],
+ [5, 96, 40, True, 'hard_swish', 2],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 240, 40, True, 'hard_swish', 1],
+ [5, 120, 48, True, 'hard_swish', 1],
+ [5, 144, 48, True, 'hard_swish', 1],
+ [5, 288, 96, True, 'hard_swish', 2],
+ [5, 576, 96, True, 'hard_swish', 1],
+ [5, 576, 96, True, 'hard_swish', 1],
+ ]
+ self.cls_ch_squeeze = 576
+ self.cls_ch_expand = 1280
+ else:
+ raise NotImplementedError
+
+ def _conv_bn_layer(self,
+ input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ num_groups=1,
+ if_act=True,
+ act=None,
+ name=None,
+ use_cudnn=True):
+ conv_param_attr = ParamAttr(
+ name=name + '_weights', regularizer=L2Decay(self.conv_decay))
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=conv_param_attr,
+ bias_attr=False)
+ bn_name = name + '_bn'
+ bn_param_attr = ParamAttr(
+ name=bn_name + "_scale", regularizer=L2Decay(self.norm_decay))
+ bn_bias_attr = ParamAttr(
+ name=bn_name + "_offset", regularizer=L2Decay(self.norm_decay))
+ bn = fluid.layers.batch_norm(
+ input=conv,
+ param_attr=bn_param_attr,
+ bias_attr=bn_bias_attr,
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+ if if_act:
+ if act == 'relu':
+ bn = fluid.layers.relu(bn)
+ elif act == 'hard_swish':
+ bn = self._hard_swish(bn)
+ elif act == 'relu6':
+ bn = fluid.layers.relu6(bn)
+ return bn
+
+ def _hard_swish(self, x):
+ return x * fluid.layers.relu6(x + 3) / 6.
+
+ def _se_block(self, input, num_out_filter, ratio=4, name=None):
+ num_mid_filter = int(num_out_filter // ratio)
+ pool = fluid.layers.pool2d(
+ input=input, pool_type='avg', global_pooling=True, use_cudnn=False)
+ conv1 = fluid.layers.conv2d(
+ input=pool,
+ filter_size=1,
+ num_filters=num_mid_filter,
+ act='relu',
+ param_attr=ParamAttr(name=name + '_1_weights'),
+ bias_attr=ParamAttr(name=name + '_1_offset'))
+ conv2 = fluid.layers.conv2d(
+ input=conv1,
+ filter_size=1,
+ num_filters=num_out_filter,
+ act='hard_sigmoid',
+ param_attr=ParamAttr(name=name + '_2_weights'),
+ bias_attr=ParamAttr(name=name + '_2_offset'))
+
+ scale = fluid.layers.elementwise_mul(x=input, y=conv2, axis=0)
+ return scale
+
+ def _residual_unit(self,
+ input,
+ num_in_filter,
+ num_mid_filter,
+ num_out_filter,
+ stride,
+ filter_size,
+ act=None,
+ use_se=False,
+ name=None):
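+ # Inverted residual block: 1x1 expand -> depthwise conv (optional SE) ->
+ # 1x1 linear projection, with a skip connection only when the input and
+ # output shapes match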
+ input_data = input
+ conv0 = self._conv_bn_layer(
+ input=input,
+ filter_size=1,
+ num_filters=num_mid_filter,
+ stride=1,
+ padding=0,
+ if_act=True,
+ act=act,
+ name=name + '_expand')
+ if self.block_stride == 16 and stride == 2:
+ self.end_points.append(conv0)
+ conv1 = self._conv_bn_layer(
+ input=conv0,
+ filter_size=filter_size,
+ num_filters=num_mid_filter,
+ stride=stride,
+ padding=int((filter_size - 1) // 2),
+ if_act=True,
+ act=act,
+ num_groups=num_mid_filter,
+ use_cudnn=False,
+ name=name + '_depthwise')
+
+ if use_se:
+ conv1 = self._se_block(
+ input=conv1, num_out_filter=num_mid_filter, name=name + '_se')
+
+ conv2 = self._conv_bn_layer(
+ input=conv1,
+ filter_size=1,
+ num_filters=num_out_filter,
+ stride=1,
+ padding=0,
+ if_act=False,
+ name=name + '_linear')
+ if num_in_filter != num_out_filter or stride != 1:
+ return conv2
+ else:
+ return fluid.layers.elementwise_add(
+ x=input_data, y=conv2, act=None)
+
+ def _extra_block_dw(self,
+ input,
+ num_filters1,
+ num_filters2,
+ stride,
+ name=None):
+ pointwise_conv = self._conv_bn_layer(
+ input=input,
+ filter_size=1,
+ num_filters=int(num_filters1),
+ stride=1,
+ padding="SAME",
+ act='relu6',
+ name=name + "_extra1")
+ depthwise_conv = self._conv_bn_layer(
+ input=pointwise_conv,
+ filter_size=3,
+ num_filters=int(num_filters2),
+ stride=stride,
+ padding="SAME",
+ num_groups=int(num_filters1),
+ act='relu6',
+ use_cudnn=False,
+ name=name + "_extra2_dw")
+ normal_conv = self._conv_bn_layer(
+ input=depthwise_conv,
+ filter_size=1,
+ num_filters=int(num_filters2),
+ stride=1,
+ padding="SAME",
+ act='relu6',
+ name=name + "_extra2_sep")
+ return normal_conv
+
+ def __call__(self, input):
+ scale = self.scale
+ inplanes = self.inplanes
+ cfg = self.cfg
+ blocks = []
+
+ #conv1
+ conv = self._conv_bn_layer(
+ input,
+ filter_size=3,
+ num_filters=inplanes if scale <= 1.0 else int(inplanes * scale),
+ stride=2,
+ padding=1,
+ num_groups=1,
+ if_act=True,
+ act='hard_swish',
+ name='conv1')
+ i = 0
+ for layer_cfg in cfg:
+ self.block_stride *= layer_cfg[5]
+ if layer_cfg[5] == 2:
+ blocks.append(conv)
+ conv = self._residual_unit(
+ input=conv,
+ num_in_filter=inplanes,
+ num_mid_filter=int(scale * layer_cfg[1]),
+ num_out_filter=int(scale * layer_cfg[2]),
+ act=layer_cfg[4],
+ stride=layer_cfg[5],
+ filter_size=layer_cfg[0],
+ use_se=layer_cfg[3],
+ name='conv' + str(i + 2))
+
+ inplanes = int(scale * layer_cfg[2])
+ i += 1
+ blocks.append(conv)
+
+ if self.num_classes:
+ conv = self._conv_bn_layer(
+ input=conv,
+ filter_size=1,
+ num_filters=int(scale * self.cls_ch_squeeze),
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ act='hard_swish',
+ name='conv_last')
+
+ conv = fluid.layers.pool2d(
+ input=conv,
+ pool_type='avg',
+ global_pooling=True,
+ use_cudnn=False)
+ conv = fluid.layers.conv2d(
+ input=conv,
+ num_filters=self.cls_ch_expand,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ param_attr=ParamAttr(name='last_1x1_conv_weights'),
+ bias_attr=False)
+ conv = self._hard_swish(conv)
+ drop = fluid.layers.dropout(x=conv, dropout_prob=0.2)
+ out = fluid.layers.fc(
+ input=drop,
+ size=self.num_classes,
+ param_attr=ParamAttr(name='fc_weights'),
+ bias_attr=ParamAttr(name='fc_offset'))
+ return out
+
+ if not self.with_extra_blocks:
+ return blocks
+
+ # extra block
+ conv_extra = self._conv_bn_layer(
+ conv,
+ filter_size=1,
+ num_filters=int(scale * cfg[-1][1]),
+ stride=1,
+ padding="SAME",
+ num_groups=1,
+ if_act=True,
+ act='hard_swish',
+ name='conv' + str(i + 2))
+ self.end_points.append(conv_extra)
+ i += 1
+ for block_filter in self.extra_block_filters:
+ conv_extra = self._extra_block_dw(conv_extra, block_filter[0],
+ block_filter[1], 2,
+ 'conv' + str(i + 2))
+ self.end_points.append(conv_extra)
+ i += 1
+
+ return self.end_points
diff --git a/paddlex/cv/nets/resnet.py b/paddlex/cv/nets/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..1cbec9f1ad429118e8d1c7227f3049ab08c7780c
--- /dev/null
+++ b/paddlex/cv/nets/resnet.py
@@ -0,0 +1,487 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+from collections import OrderedDict
+
+import paddle
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddle.fluid.framework import Variable
+from paddle.fluid.regularizer import L2Decay
+from paddle.fluid.initializer import Constant
+
+from numbers import Integral
+
+from .backbone_utils import NameAdapter
+
+__all__ = ['ResNet', 'ResNetC5']
+
+
+class ResNet(object):
+ """
+ Residual Network, see https://arxiv.org/abs/1512.03385
+ Args:
+ layers (int): ResNet layers, should be 18, 34, 50, 101, 152.
+ freeze_at (int): freeze the backbone at which stage
+ norm_type (str): normalization type, 'bn'/'sync_bn'/'affine_channel'
+ freeze_norm (bool): freeze normalization layers
+ norm_decay (float): weight decay for normalization layer weights
+ variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
+ feature_maps (list): index of stages whose feature maps are returned
+ dcn_v2_stages (list): index of stages who select deformable conv v2
+ nonlocal_stages (list): index of stages who select nonlocal networks
+ gcb_stages (list): index of stages who select gc blocks
+ gcb_params (dict): gc blocks config, includes ratio(default as 1.0/16),
+ pooling_type(default as "att") and
+ fusion_types(default as ['channel_add'])
+ """
+
+ def __init__(self,
+ layers=50,
+ freeze_at=0,
+ norm_type='bn',
+ freeze_norm=False,
+ norm_decay=0.,
+ variant='b',
+ feature_maps=[2, 3, 4, 5],
+ dcn_v2_stages=[],
+ weight_prefix_name='',
+ nonlocal_stages=[],
+ gcb_stages=[],
+ gcb_params=dict(),
+ num_classes=None):
+ super(ResNet, self).__init__()
+
+ if isinstance(feature_maps, Integral):
+ feature_maps = [feature_maps]
+
+ assert layers in [18, 34, 50, 101, 152, 200], \
+ "layers {} not in [18, 34, 50, 101, 152, 200]".format(layers)
+ assert variant in ['a', 'b', 'c', 'd'], "invalid ResNet variant"
+ assert 0 <= freeze_at <= 5, "freeze_at should be 0, 1, 2, 3, 4 or 5"
+ assert len(feature_maps) > 0, "need one or more feature maps"
+ assert norm_type in ['bn', 'sync_bn', 'affine_channel']
+ assert not (len(nonlocal_stages)>0 and layers<50), \
+ "non-local is not supported for resnet18 or resnet34"
+
+ self.layers = layers
+ self.freeze_at = freeze_at
+ self.norm_type = norm_type
+ self.norm_decay = norm_decay
+ self.freeze_norm = freeze_norm
+ self.variant = variant
+ self._model_type = 'ResNet'
+ self.feature_maps = feature_maps
+ self.dcn_v2_stages = dcn_v2_stages
+ self.layers_cfg = {
+ 18: ([2, 2, 2, 2], self.basicblock),
+ 34: ([3, 4, 6, 3], self.basicblock),
+ 50: ([3, 4, 6, 3], self.bottleneck),
+ 101: ([3, 4, 23, 3], self.bottleneck),
+ 152: ([3, 8, 36, 3], self.bottleneck),
+ 200: ([3, 12, 48, 3], self.bottleneck),
+ }
+ self.stage_filters = [64, 128, 256, 512]
+ self._c1_out_chan_num = 64
+ self.na = NameAdapter(self)
+ self.prefix_name = weight_prefix_name
+
+ self.nonlocal_stages = nonlocal_stages
+ self.nonlocal_mod_cfg = {
+ 50: 2,
+ 101: 5,
+ 152: 8,
+ 200: 12,
+ }
+
+ self.gcb_stages = gcb_stages
+ self.gcb_params = gcb_params
+ self.num_classes = num_classes
+
+ def _conv_offset(self,
+ input,
+ filter_size,
+ stride,
+ padding,
+ act=None,
+ name=None):
+ # 2 offset channels + 1 mask channel per kernel position
+ out_channel = filter_size * filter_size * 3
+ out = fluid.layers.conv2d(
+ input,
+ num_filters=out_channel,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ param_attr=ParamAttr(
+ initializer=Constant(0.0), name=name + ".w_0"),
+ bias_attr=ParamAttr(initializer=Constant(0.0), name=name + ".b_0"),
+ act=act,
+ name=name)
+ return out
+
+ def _conv_norm(self,
+ input,
+ num_filters,
+ filter_size,
+ stride=1,
+ groups=1,
+ act=None,
+ name=None,
+ dcn_v2=False):
+ _name = self.prefix_name + name if self.prefix_name != '' else name
+ if not dcn_v2:
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=groups,
+ act=None,
+ param_attr=ParamAttr(name=_name + "_weights"),
+ bias_attr=False,
+ name=_name + '.conv2d.output.1')
+ else:
+ # select deformable conv"
+ offset_mask = self._conv_offset(
+ input=input,
+ filter_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ act=None,
+ name=_name + "_conv_offset")
+ offset_channel = filter_size**2 * 2
+ mask_channel = filter_size**2
+ offset, mask = fluid.layers.split(
+ input=offset_mask,
+ num_or_sections=[offset_channel, mask_channel],
+ dim=1)
+ mask = fluid.layers.sigmoid(mask)
+ conv = fluid.layers.deformable_conv(
+ input=input,
+ offset=offset,
+ mask=mask,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=groups,
+ deformable_groups=1,
+ im2col_step=1,
+ param_attr=ParamAttr(name=_name + "_weights"),
+ bias_attr=False,
+ name=_name + ".conv2d.output.1")
+
+ bn_name = self.na.fix_conv_norm_name(name)
+ bn_name = self.prefix_name + bn_name if self.prefix_name != '' else bn_name
+
+ norm_lr = 0. if self.freeze_norm else 1.
+ norm_decay = self.norm_decay
+ pattr = ParamAttr(
+ name=bn_name + '_scale',
+ learning_rate=norm_lr,
+ regularizer=L2Decay(norm_decay))
+ battr = ParamAttr(
+ name=bn_name + '_offset',
+ learning_rate=norm_lr,
+ regularizer=L2Decay(norm_decay))
+
+ if self.norm_type in ['bn', 'sync_bn']:
+ global_stats = True if self.freeze_norm else False
+ out = fluid.layers.batch_norm(
+ input=conv,
+ act=act,
+ name=bn_name + '.output.1',
+ param_attr=pattr,
+ bias_attr=battr,
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance',
+ use_global_stats=global_stats)
+ scale = fluid.framework._get_var(pattr.name)
+ bias = fluid.framework._get_var(battr.name)
+ elif self.norm_type == 'affine_channel':
+ scale = fluid.layers.create_parameter(
+ shape=[conv.shape[1]],
+ dtype=conv.dtype,
+ attr=pattr,
+ default_initializer=fluid.initializer.Constant(1.))
+ bias = fluid.layers.create_parameter(
+ shape=[conv.shape[1]],
+ dtype=conv.dtype,
+ attr=battr,
+ default_initializer=fluid.initializer.Constant(0.))
+ out = fluid.layers.affine_channel(
+ x=conv, scale=scale, bias=bias, act=act)
+ if self.freeze_norm:
+ scale.stop_gradient = True
+ bias.stop_gradient = True
+ return out
+
+ def _shortcut(self, input, ch_out, stride, is_first, name):
+ max_pooling_in_short_cut = self.variant == 'd'
+ ch_in = input.shape[1]
+ # the naming rule is the same as for the pretrained weights
+ name = self.na.fix_shortcut_name(name)
+ std_senet = getattr(self, 'std_senet', False)
+ if ch_in != ch_out or stride != 1 or (self.layers < 50 and is_first):
+ if std_senet:
+ if is_first:
+ return self._conv_norm(input, ch_out, 1, stride, name=name)
+ else:
+ return self._conv_norm(input, ch_out, 3, stride, name=name)
+ if max_pooling_in_short_cut and not is_first:
+ input = fluid.layers.pool2d(
+ input=input,
+ pool_size=2,
+ pool_stride=2,
+ pool_padding=0,
+ ceil_mode=True,
+ pool_type='avg')
+ return self._conv_norm(input, ch_out, 1, 1, name=name)
+ return self._conv_norm(input, ch_out, 1, stride, name=name)
+ else:
+ return input
+
+ def bottleneck(self,
+ input,
+ num_filters,
+ stride,
+ is_first,
+ name,
+ dcn_v2=False,
+ gcb=False,
+ gcb_name=None):
+ if self.variant == 'a':
+ stride1, stride2 = stride, 1
+ else:
+ stride1, stride2 = 1, stride
+
+ # ResNeXt
+ groups = getattr(self, 'groups', 1)
+ group_width = getattr(self, 'group_width', -1)
+ if groups == 1:
+ expand = 4
+ elif (groups * group_width) == 256:
+ expand = 1
+ else: # FIXME hard code for now, handles 32x4d, 64x4d and 32x8d
+ num_filters = num_filters // 2
+ expand = 2
+
+ conv_name1, conv_name2, conv_name3, \
+ shortcut_name = self.na.fix_bottleneck_name(name)
+ std_senet = getattr(self, 'std_senet', False)
+ if std_senet:
+ conv_def = [[
+ int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1
+ ], [num_filters, 3, stride2, 'relu', groups, conv_name2],
+ [num_filters * expand, 1, 1, None, 1, conv_name3]]
+ else:
+ conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
+ [num_filters, 3, stride2, 'relu', groups, conv_name2],
+ [num_filters * expand, 1, 1, None, 1, conv_name3]]
+
+ residual = input
+ for i, (c, k, s, act, g, _name) in enumerate(conv_def):
+ residual = self._conv_norm(
+ input=residual,
+ num_filters=c,
+ filter_size=k,
+ stride=s,
+ act=act,
+ groups=g,
+ name=_name,
+ dcn_v2=(i == 1 and dcn_v2))
+ short = self._shortcut(
+ input,
+ num_filters * expand,
+ stride,
+ is_first=is_first,
+ name=shortcut_name)
+ # Squeeze-and-Excitation
+ if callable(getattr(self, '_squeeze_excitation', None)):
+ residual = self._squeeze_excitation(
+ input=residual, num_channels=num_filters, name='fc' + name)
+ if gcb:
+ residual = add_gc_block(residual, name=gcb_name, **self.gcb_params)
+ return fluid.layers.elementwise_add(
+ x=short, y=residual, act='relu', name=name + ".add.output.5")
+
+ def basicblock(self,
+ input,
+ num_filters,
+ stride,
+ is_first,
+ name,
+ dcn_v2=False,
+ gcb=False,
+ gcb_name=None):
+ assert dcn_v2 is False, "Not implemented yet."
+ assert gcb is False, "Not implemented yet."
+ conv0 = self._conv_norm(
+ input=input,
+ num_filters=num_filters,
+ filter_size=3,
+ act='relu',
+ stride=stride,
+ name=name + "_branch2a")
+ conv1 = self._conv_norm(
+ input=conv0,
+ num_filters=num_filters,
+ filter_size=3,
+ act=None,
+ name=name + "_branch2b")
+ short = self._shortcut(
+ input, num_filters, stride, is_first, name=name + "_branch1")
+ return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
+
+ def layer_warp(self, input, stage_num):
+ """
+ Args:
+ input (Variable): input variable.
+ stage_num (int): the stage number, should be 2, 3, 4, 5
+
+ Returns:
+ The last variable in endpoint-th stage.
+ """
+ assert stage_num in [2, 3, 4, 5]
+
+ stages, block_func = self.layers_cfg[self.layers]
+ count = stages[stage_num - 2]
+
+ ch_out = self.stage_filters[stage_num - 2]
+ is_first = False if stage_num != 2 else True
+ dcn_v2 = True if stage_num in self.dcn_v2_stages else False
+
+ nonlocal_mod = 1000
+ if stage_num in self.nonlocal_stages:
+ nonlocal_mod = self.nonlocal_mod_cfg[
+ self.layers] if stage_num == 4 else 2
+
+ # Make the layer name and parameter name consistent
+ # with ImageNet pre-trained model
+ conv = input
+ for i in range(count):
+ conv_name = self.na.fix_layer_warp_name(stage_num, count, i)
+ if self.layers < 50:
+ is_first = True if i == 0 and stage_num == 2 else False
+
+ gcb = stage_num in self.gcb_stages
+ gcb_name = "gcb_res{}_b{}".format(stage_num, i)
+ conv = block_func(
+ input=conv,
+ num_filters=ch_out,
+ stride=2 if i == 0 and stage_num != 2 else 1,
+ is_first=is_first,
+ name=conv_name,
+ dcn_v2=dcn_v2,
+ gcb=gcb,
+ gcb_name=gcb_name)
+
+ # add non local model
+ dim_in = conv.shape[1]
+ nonlocal_name = "nonlocal_conv{}".format(stage_num)
+ if i % nonlocal_mod == nonlocal_mod - 1:
+ conv = add_space_nonlocal(conv, dim_in, dim_in,
+ nonlocal_name + '_{}'.format(i),
+ int(dim_in / 2))
+ return conv
+
+ def c1_stage(self, input):
+ out_chan = self._c1_out_chan_num
+
+ conv1_name = self.na.fix_c1_stage_name()
+
+ if self.variant in ['c', 'd']:
+ conv_def = [
+ [out_chan // 2, 3, 2, "conv1_1"],
+ [out_chan // 2, 3, 1, "conv1_2"],
+ [out_chan, 3, 1, "conv1_3"],
+ ]
+ else:
+ conv_def = [[out_chan, 7, 2, conv1_name]]
+
+ for (c, k, s, _name) in conv_def:
+ input = self._conv_norm(
+ input=input,
+ num_filters=c,
+ filter_size=k,
+ stride=s,
+ act='relu',
+ name=_name)
+
+ output = fluid.layers.pool2d(
+ input=input,
+ pool_size=3,
+ pool_stride=2,
+ pool_padding=1,
+ pool_type='max')
+ return output
+
+ def __call__(self, input):
+ assert isinstance(input, Variable)
+ assert not (set(self.feature_maps) - set([1, 2, 3, 4, 5])), \
+ "feature maps {} not in [1, 2, 3, 4, 5]".format(self.feature_maps)
+
+ res_endpoints = []
+
+ res = input
+ feature_maps = self.feature_maps
+ severed_head = getattr(self, 'severed_head', False)
+ if not severed_head:
+ res = self.c1_stage(res)
+ feature_maps = range(2, max(self.feature_maps) + 1)
+
+ for i in feature_maps:
+ res = self.layer_warp(res, i)
+ if i in self.feature_maps:
+ res_endpoints.append(res)
+ if self.freeze_at >= i:
+ res.stop_gradient = True
+
+ if self.num_classes is not None:
+ pool = fluid.layers.pool2d(
+ input=res, pool_type='avg', global_pooling=True)
+ stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
+ out = fluid.layers.fc(
+ input=pool,
+ size=self.num_classes,
+ param_attr=fluid.param_attr.ParamAttr(
+ initializer=fluid.initializer.Uniform(-stdv, stdv)))
+ return out
+
+ return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
+ for idx, feat in enumerate(res_endpoints)])
+
+
+class ResNetC5(ResNet):
+ __doc__ = ResNet.__doc__
+
+ def __init__(self,
+ layers=50,
+ freeze_at=2,
+ norm_type='affine_channel',
+ freeze_norm=True,
+ norm_decay=0.,
+ variant='b',
+ feature_maps=[5],
+ weight_prefix_name=''):
+ super(ResNetC5,
+ self).__init__(layers, freeze_at, norm_type, freeze_norm,
+ norm_decay, variant, feature_maps)
+ self.severed_head = True
diff --git a/paddlex/cv/nets/segmentation/__init__.py b/paddlex/cv/nets/segmentation/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..f5af8c95426abb9c7b181ff8c717fe99edbf9760
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/__init__.py
@@ -0,0 +1,18 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .unet import UNet
+from .deeplabv3p import DeepLabv3p
+from .model_utils import libs
+from .model_utils import loss
diff --git a/paddlex/cv/nets/segmentation/deeplabv3p.py b/paddlex/cv/nets/segmentation/deeplabv3p.py
new file mode 100644
index 0000000000000000000000000000000000000000..ab97f076b2b7ff40f2620989885b26d19fff5961
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/deeplabv3p.py
@@ -0,0 +1,383 @@
+# coding: utf8
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+
+import paddle.fluid as fluid
+from .model_utils.libs import scope, name_scope
+from .model_utils.libs import bn, bn_relu, relu
+from .model_utils.libs import conv, max_pool, deconv
+from .model_utils.libs import separate_conv
+from .model_utils.libs import sigmoid_to_softmax
+from .model_utils.loss import softmax_with_loss
+from .model_utils.loss import dice_loss
+from .model_utils.loss import bce_loss
+import paddlex.utils.logging as logging
+from paddlex.cv.nets.xception import Xception
+from paddlex.cv.nets.mobilenet_v2 import MobileNetV2
+
+
+class DeepLabv3p(object):
+ """实现DeepLabv3+模型
+ `"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation"
+ `
+
+ Args:
+ num_classes (int): 类别数。
+ backbone (paddlex.cv.nets): 神经网络,实现DeepLabv3+特征图的计算。
+ mode (str): 网络运行模式,根据mode构建网络的输入和返回。
+ 当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
+ 当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
+ pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
+ 与label相同大小,bool类型)。
+ 当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
+ logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
+ output_stride (int): backbone 输出特征图相对于输入的下采样倍数,一般取值为8或16。
+ aspp_with_sep_conv (bool): 在asspp模块是否采用separable convolutions。
+ decoder_use_sep_conv (bool): decoder模块是否采用separable convolutions。
+ encoder_with_aspp (bool): 是否在encoder阶段采用aspp模块。
+ enable_decoder (bool): 是否使用decoder模块。
+ use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
+ use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
+ 当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
+ class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
+ num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
+ 自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
+ 即平时使用的交叉熵损失函数。
+ ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
+
+ Raises:
+ ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
+ ValueError: class_weight为list, 但长度不等于num_class。
+ class_weight为str, 但class_weight.low()不等于dynamic。
+ TypeError: class_weight不为None时,其类型不是list或str。
+ """
+
+ def __init__(self,
+ num_classes,
+ backbone,
+ mode='train',
+ output_stride=16,
+ aspp_with_sep_conv=True,
+ decoder_use_sep_conv=True,
+ encoder_with_aspp=True,
+ enable_decoder=True,
+ use_bce_loss=False,
+ use_dice_loss=False,
+ class_weight=None,
+ ignore_index=255):
+ # dice_loss and bce_loss are only applicable to binary segmentation
+ if num_classes > 2 and (use_bce_loss or use_dice_loss):
+ raise ValueError(
+ "dice loss and bce loss is only applicable to binary classification"
+ )
+
+ if class_weight is not None:
+ if isinstance(class_weight, list):
+ if len(class_weight) != num_classes:
+ raise ValueError(
+ "Length of class_weight should be equal to number of classes"
+ )
+ elif isinstance(class_weight, str):
+ if class_weight.lower() != 'dynamic':
+ raise ValueError(
+ "if class_weight is string, must be dynamic!")
+ else:
+ raise TypeError(
+ 'Expect class_weight is a list or string but receive {}'.
+ format(type(class_weight)))
+
+ self.num_classes = num_classes
+ self.backbone = backbone
+ self.mode = mode
+ self.use_bce_loss = use_bce_loss
+ self.use_dice_loss = use_dice_loss
+ self.class_weight = class_weight
+ self.ignore_index = ignore_index
+ self.output_stride = output_stride
+ self.aspp_with_sep_conv = aspp_with_sep_conv
+ self.decoder_use_sep_conv = decoder_use_sep_conv
+ self.encoder_with_aspp = encoder_with_aspp
+ self.enable_decoder = enable_decoder
+
+ def _encoder(self, input):
+ # Encoder (ASPP): image pooling + 1x1 conv + three parallel dilated convolutions
+ # at different rates, concatenated and fused by a 1x1 conv
+ # aspp_with_sep_conv: default True, use depthwise separable convolutions,
+ #     otherwise ordinary convolutions
+ # output_stride: downsampling ratio, 8 or 16, which determines aspp_ratios
+ # aspp_ratios: dilation rates of the ASPP branches
+
+ if self.output_stride == 16:
+ aspp_ratios = [6, 12, 18]
+ elif self.output_stride == 8:
+ aspp_ratios = [12, 24, 36]
+ else:
+ raise Exception("DeepLabv3p only support stride 8 or 16")
+
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ with scope('encoder'):
+ channel = 256
+ with scope("image_pool"):
+ image_avg = fluid.layers.reduce_mean(
+ input, [2, 3], keep_dim=True)
+ image_avg = bn_relu(
+ conv(
+ image_avg,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ input_shape = fluid.layers.shape(input)
+ image_avg = fluid.layers.resize_bilinear(
+ image_avg, input_shape[2:])
+
+ with scope("aspp0"):
+ aspp0 = bn_relu(
+ conv(
+ input,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ with scope("aspp1"):
+ if self.aspp_with_sep_conv:
+ aspp1 = separate_conv(
+ input,
+ channel,
+ 1,
+ 3,
+ dilation=aspp_ratios[0],
+ act=relu)
+ else:
+ aspp1 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[0],
+ padding=aspp_ratios[0],
+ param_attr=param_attr))
+ with scope("aspp2"):
+ if self.aspp_with_sep_conv:
+ aspp2 = separate_conv(
+ input,
+ channel,
+ 1,
+ 3,
+ dilation=aspp_ratios[1],
+ act=relu)
+ else:
+ aspp2 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[1],
+ padding=aspp_ratios[1],
+ param_attr=param_attr))
+ with scope("aspp3"):
+ if self.aspp_with_sep_conv:
+ aspp3 = separate_conv(
+ input,
+ channel,
+ 1,
+ 3,
+ dilation=aspp_ratios[2],
+ act=relu)
+ else:
+ aspp3 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[2],
+ padding=aspp_ratios[2],
+ param_attr=param_attr))
+ with scope("concat"):
+ data = fluid.layers.concat(
+ [image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
+ data = bn_relu(
+ conv(
+ data,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ data = fluid.layers.dropout(data, 0.9)
+ return data
+
+ def _decoder(self, encode_data, decode_shortcut):
+ # Decoder
+ # encode_data: encoder output
+ # decode_shortcut: the branch taken from the backbone, resized and concatenated with encode_data
+ # decoder_use_sep_conv: default True, follow the concat with two separable convolutions,
+ #     otherwise ordinary convolutions
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ with scope('decoder'):
+ with scope('concat'):
+ decode_shortcut = bn_relu(
+ conv(
+ decode_shortcut,
+ 48,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+
+ decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
+ encode_data = fluid.layers.resize_bilinear(
+ encode_data, decode_shortcut_shape[2:])
+ encode_data = fluid.layers.concat(
+ [encode_data, decode_shortcut], axis=1)
+ if self.decoder_use_sep_conv:
+ with scope("separable_conv1"):
+ encode_data = separate_conv(
+ encode_data, 256, 1, 3, dilation=1, act=relu)
+ with scope("separable_conv2"):
+ encode_data = separate_conv(
+ encode_data, 256, 1, 3, dilation=1, act=relu)
+ else:
+ with scope("decoder_conv1"):
+ encode_data = bn_relu(
+ conv(
+ encode_data,
+ 256,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
+ param_attr=param_attr))
+ with scope("decoder_conv2"):
+ encode_data = bn_relu(
+ conv(
+ encode_data,
+ 256,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
+ param_attr=param_attr))
+ return encode_data
+
+ def _get_loss(self, logit, label, mask):
+ avg_loss = 0
+ if not (self.use_dice_loss or self.use_bce_loss):
+ avg_loss += softmax_with_loss(
+ logit,
+ label,
+ mask,
+ num_classes=self.num_classes,
+ weight=self.class_weight,
+ ignore_index=self.ignore_index)
+ else:
+ if self.use_dice_loss:
+ avg_loss += dice_loss(logit, label, mask)
+ if self.use_bce_loss:
+ avg_loss += bce_loss(
+ logit, label, mask, ignore_index=self.ignore_index)
+
+ return avg_loss
+
+ def generate_inputs(self):
+ inputs = OrderedDict()
+ inputs['image'] = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if self.mode == 'train':
+ inputs['label'] = fluid.data(
+ dtype='int32', shape=[None, 1, None, None], name='label')
+ elif self.mode == 'eval':
+ inputs['label'] = fluid.data(
+ dtype='int32', shape=[None, 1, None, None], name='label')
+ return inputs
+
+ def build_net(self, inputs):
+ # For binary segmentation, when dice_loss or bce_loss is selected,
+ # the final logit is set to a single output channel
+ if self.use_dice_loss or self.use_bce_loss:
+ self.num_classes = 1
+ image = inputs['image']
+
+ data, decode_shortcuts = self.backbone(image)
+ decode_shortcut = decode_shortcuts[self.backbone.decode_points]
+
+ # Encoder / decoder setup
+ if self.encoder_with_aspp:
+ data = self._encoder(data)
+ if self.enable_decoder:
+ data = self._decoder(data, decode_shortcut)
+
+ # Set the output channels of the last convolution according to the number of
+ # classes, then resize to the original image size
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
+ with scope('logit'):
+ with fluid.name_scope('last_conv'):
+ logit = conv(
+ data,
+ self.num_classes,
+ 1,
+ stride=1,
+ padding=0,
+ bias_attr=True,
+ param_attr=param_attr)
+ image_shape = fluid.layers.shape(image)
+ logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
+
+ if self.num_classes == 1:
+ out = sigmoid_to_softmax(logit)
+ out = fluid.layers.transpose(out, [0, 2, 3, 1])
+ else:
+ out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+ pred = fluid.layers.argmax(out, axis=3)
+ pred = fluid.layers.unsqueeze(pred, axes=[3])
+
+ if self.mode == 'train':
+ label = inputs['label']
+ mask = label != self.ignore_index
+ return self._get_loss(logit, label, mask)
+
+ elif self.mode == 'eval':
+ label = inputs['label']
+ mask = label != self.ignore_index
+ loss = self._get_loss(logit, label, mask)
+ return loss, pred, label, mask
+ else:
+ if self.num_classes == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = fluid.layers.softmax(logit, axis=1)
+ return pred, logit
+
+ return logit
diff --git a/paddlex/cv/nets/segmentation/model_utils/__init__.py b/paddlex/cv/nets/segmentation/model_utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..87ab6f19957bbcd460056c5def700b0c7e14424f
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/model_utils/__init__.py
@@ -0,0 +1,16 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import libs
+from . import loss
diff --git a/paddlex/cv/nets/segmentation/model_utils/libs.py b/paddlex/cv/nets/segmentation/model_utils/libs.py
new file mode 100644
index 0000000000000000000000000000000000000000..01fdad2cec6ce4b13cea2b7c957fb648edb4aeb2
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/model_utils/libs.py
@@ -0,0 +1,219 @@
+# coding: utf8
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle
+import paddle.fluid as fluid
+import contextlib
+
+bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
+name_scope = ""
+
+
+@contextlib.contextmanager
+def scope(name):
+ global name_scope
+ bk = name_scope
+ name_scope = name_scope + name + '/'
+ yield
+ name_scope = bk
+
+
+def max_pool(input, kernel, stride, padding):
+ data = fluid.layers.pool2d(
+ input,
+ pool_size=kernel,
+ pool_type='max',
+ pool_stride=stride,
+ pool_padding=padding)
+ return data
+
+
+def avg_pool(input, kernel, stride, padding=0):
+ data = fluid.layers.pool2d(
+ input,
+ pool_size=kernel,
+ pool_type='avg',
+ pool_stride=stride,
+ pool_padding=padding)
+ return data
+
+
+def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
+ N, C, H, W = input.shape
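+ # If G does not evenly divide the channel count, search nearby group counts
+ # (G+1, G-1, G+2, G-2, ...) for the closest value that does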
+ if C % G != 0:
+ for d in range(10):
+ for t in [d, -d]:
+ if G + t <= 0: continue
+ if C % (G + t) == 0:
+ G = G + t
+ break
+ if C % G == 0:
+ break
+ assert C % G == 0, "the number of channels must be divisible by the number of groups"
+ x = fluid.layers.group_norm(
+ input,
+ groups=G,
+ param_attr=param_attr,
+ bias_attr=bias_attr,
+ name=name_scope + 'group_norm')
+ return x
+
+
+def bn(*args,
+ norm_type='bn',
+ eps=1e-5,
+ bn_momentum=0.99,
+ group_norm=32,
+ **kargs):
+
+ if norm_type == 'bn':
+ with scope('BatchNorm'):
+ return fluid.layers.batch_norm(
+ *args,
+ epsilon=eps,
+ momentum=bn_momentum,
+ param_attr=fluid.ParamAttr(
+ name=name_scope + 'gamma', regularizer=bn_regularizer),
+ bias_attr=fluid.ParamAttr(
+ name=name_scope + 'beta', regularizer=bn_regularizer),
+ moving_mean_name=name_scope + 'moving_mean',
+ moving_variance_name=name_scope + 'moving_variance',
+ **kargs)
+ elif norm_type == 'gn':
+ with scope('GroupNorm'):
+ return group_norm(
+ args[0],
+ group_norm,
+ eps=eps,
+ param_attr=fluid.ParamAttr(
+ name=name_scope + 'gamma', regularizer=bn_regularizer),
+ bias_attr=fluid.ParamAttr(
+ name=name_scope + 'beta', regularizer=bn_regularizer))
+ else:
+ raise Exception("Unsupport norm type:" + norm_type)
+
+
+def bn_relu(data, norm_type='bn', eps=1e-5):
+ return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
+
+
+def relu(data):
+ return fluid.layers.relu(data)
+
+
+def conv(*args, **kargs):
+ kargs['param_attr'] = name_scope + 'weights'
+ if 'bias_attr' in kargs and kargs['bias_attr']:
+ kargs['bias_attr'] = fluid.ParamAttr(
+ name=name_scope + 'biases',
+ regularizer=None,
+ initializer=fluid.initializer.ConstantInitializer(value=0.0))
+ else:
+ kargs['bias_attr'] = False
+ return fluid.layers.conv2d(*args, **kargs)
+
+
+def deconv(*args, **kargs):
+ kargs['param_attr'] = name_scope + 'weights'
+ if 'bias_attr' in kargs and kargs['bias_attr']:
+ kargs['bias_attr'] = name_scope + 'biases'
+ else:
+ kargs['bias_attr'] = False
+ return fluid.layers.conv2d_transpose(*args, **kargs)
+
+
+def separate_conv(input,
+ channel,
+ stride,
+ filter,
+ dilation=1,
+ act=None,
+ eps=1e-5):
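+ # Depthwise separable convolution: a depthwise conv (groups = in_channels)
+ # followed by a 1x1 pointwise conv, each with BN and optional activation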
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
+ with scope('depthwise'):
+ input = conv(
+ input,
+ input.shape[1],
+ filter,
+ stride,
+ groups=input.shape[1],
+ padding=(filter // 2) * dilation,
+ dilation=dilation,
+ use_cudnn=False,
+ param_attr=param_attr)
+ input = bn(input, eps=eps)
+ if act: input = act(input)
+
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ with scope('pointwise'):
+ input = conv(
+ input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
+ input = bn(input, eps=eps)
+ if act: input = act(input)
+ return input
+
+
+def conv_bn_layer(input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ channels=None,
+ num_groups=1,
+ if_act=True,
+ name=None,
+ use_cudnn=True):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=fluid.ParamAttr(name=name + '_weights'),
+ bias_attr=False)
+ bn_name = name + '_bn'
+ bn = fluid.layers.batch_norm(
+ input=conv,
+ param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
+ bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+ if if_act:
+ return fluid.layers.relu6(bn)
+ else:
+ return bn
+
+
+def sigmoid_to_softmax(input):
+ """
+ one channel to two channel
+ """
+ logit = fluid.layers.sigmoid(input)
+ logit_back = 1 - logit
+ logit = fluid.layers.concat([logit_back, logit], axis=1)
+ return logit
diff --git a/paddlex/cv/nets/segmentation/model_utils/loss.py b/paddlex/cv/nets/segmentation/model_utils/loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..60c21bd2fc159cf049dc46c0f43130481b80d896
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/model_utils/loss.py
@@ -0,0 +1,116 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle.fluid as fluid
+import numpy as np
+
+
+def softmax_with_loss(logit,
+ label,
+ ignore_mask=None,
+ num_classes=2,
+ weight=None,
+ ignore_index=255):
+ ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
+ label = fluid.layers.elementwise_min(
+ label, fluid.layers.assign(
+ np.array([num_classes - 1], dtype=np.int32)))
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ logit = fluid.layers.reshape(logit, [-1, num_classes])
+ label = fluid.layers.reshape(label, [-1, 1])
+ label = fluid.layers.cast(label, 'int64')
+ ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
+ if weight is None:
+ loss, probs = fluid.layers.softmax_with_cross_entropy(
+ logit, label, ignore_index=ignore_index, return_softmax=True)
+ else:
+ label_one_hot = fluid.layers.one_hot(input=label, depth=num_classes)
+ if isinstance(weight, list):
+ assert len(
+ weight
+ ) == num_classes, "weight length must equal num of classes"
+ weight = fluid.layers.assign(np.array([weight], dtype='float32'))
+ elif isinstance(weight, str):
+ assert weight.lower(
+ ) == 'dynamic', 'if weight is string, must be dynamic!'
+ tmp = []
+ total_num = fluid.layers.cast(
+ fluid.layers.shape(label)[0], 'float32')
+ for i in range(num_classes):
+ cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
+ ratio = total_num / (cls_pixel_num + 1)
+ tmp.append(ratio)
+ weight = fluid.layers.concat(tmp)
+ weight = weight / fluid.layers.reduce_sum(weight) * num_classes
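+ # e.g. for two classes with 900 and 100 labelled pixels, the normalized
+ # weights come out to roughly 0.2 and 1.8, so the rarer class contributes
+ # more to the loss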
+ elif isinstance(weight, fluid.layers.Variable):
+ pass
+ else:
+ raise ValueError(
+ 'Expect weight is a list, string or Variable, but receive {}'.
+ format(type(weight)))
+ weight = fluid.layers.reshape(weight, [1, num_classes])
+ weighted_label_one_hot = fluid.layers.elementwise_mul(
+ label_one_hot, weight)
+ probs = fluid.layers.softmax(logit)
+ loss = fluid.layers.cross_entropy(
+ probs,
+ weighted_label_one_hot,
+ soft_label=True,
+ ignore_index=ignore_index)
+ weighted_label_one_hot.stop_gradient = True
+
+ loss = loss * ignore_mask
+ avg_loss = fluid.layers.mean(loss) / (
+ fluid.layers.mean(ignore_mask) + 0.00001)
+
+ label.stop_gradient = True
+ ignore_mask.stop_gradient = True
+ return avg_loss
+
+
+ # TODO: decide how to apply the ignore index and ignore mask here
+def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
+ if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
+ raise Exception(
+ "dice loss is only applicable to one channel classfication")
+ ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ label = fluid.layers.transpose(label, [0, 2, 3, 1])
+ label = fluid.layers.cast(label, 'int64')
+ ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
+ logit = fluid.layers.sigmoid(logit)
+ logit = logit * ignore_mask
+ label = label * ignore_mask
+ reduce_dim = list(range(1, len(logit.shape)))
+ inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
+ dice_denominator = fluid.layers.reduce_sum(
+ logit, dim=reduce_dim) + fluid.layers.reduce_sum(
+ label, dim=reduce_dim)
+ dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
+ label.stop_gradient = True
+ ignore_mask.stop_gradient = True
+ return fluid.layers.reduce_mean(dice_score)
+
+
+def bce_loss(logit, label, ignore_mask=None, ignore_index=255):
+ if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
+ raise Exception("bce loss is only applicable to binary classfication")
+ label = fluid.layers.cast(label, 'float32')
+ loss = fluid.layers.sigmoid_cross_entropy_with_logits(
+ x=logit, label=label, ignore_index=ignore_index,
+ normalize=True) # or False
+ loss = fluid.layers.reduce_sum(loss)
+ label.stop_gradient = True
+ ignore_mask.stop_gradient = True
+ return loss
diff --git a/paddlex/cv/nets/segmentation/unet.py b/paddlex/cv/nets/segmentation/unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d1d29926bd5d3a4166ef09126b56d4b2d0252f3a
--- /dev/null
+++ b/paddlex/cv/nets/segmentation/unet.py
@@ -0,0 +1,273 @@
+# coding: utf8
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from collections import OrderedDict
+
+import paddle.fluid as fluid
+from .model_utils.libs import scope, name_scope
+from .model_utils.libs import bn, bn_relu, relu
+from .model_utils.libs import conv, max_pool, deconv
+from .model_utils.libs import sigmoid_to_softmax
+from .model_utils.loss import softmax_with_loss
+from .model_utils.loss import dice_loss
+from .model_utils.loss import bce_loss
+import paddlex.utils.logging as logging
+
+
+class UNet(object):
+ """实现Unet模型
+ `"U-Net: Convolutional Networks for Biomedical Image Segmentation"
+ `
+
+ Args:
+ num_classes (int): 类别数
+ mode (str): 网络运行模式,根据mode构建网络的输入和返回。
+ 当mode为'train'时,输入为image(-1, 3, -1, -1)和label (-1, 1, -1, -1) 返回loss。
+ 当mode为'train'时,输入为image (-1, 3, -1, -1)和label (-1, 1, -1, -1),返回loss,
+ pred (与网络输入label 相同大小的预测结果,值代表相应的类别),label,mask(非忽略值的mask,
+ 与label相同大小,bool类型)。
+ 当mode为'test'时,输入为image(-1, 3, -1, -1)返回pred (-1, 1, -1, -1)和
+ logit (-1, num_classes, -1, -1) 通道维上代表每一类的概率值。
+ upsample_mode (str): UNet decode时采用的上采样方式,取值为'bilinear'时利用双线行差值进行上菜样,
+ 当输入其他选项时则利用反卷积进行上菜样,默认为'bilinear'。
+ use_bce_loss (bool): 是否使用bce loss作为网络的损失函数,只能用于两类分割。可与dice loss同时使用。
+ use_dice_loss (bool): 是否使用dice loss作为网络的损失函数,只能用于两类分割,可与bce loss同时使用。
+ 当use_bce_loss和use_dice_loss都为False时,使用交叉熵损失函数。
+ class_weight (list/str): 交叉熵损失函数各类损失的权重。当class_weight为list的时候,长度应为
+ num_classes。当class_weight为str时, weight.lower()应为'dynamic',这时会根据每一轮各类像素的比重
+ 自行计算相应的权重,每一类的权重为:每类的比例 * num_classes。class_weight取默认值None是,各类的权重1,
+ 即平时使用的交叉熵损失函数。
+ ignore_index (int): label上忽略的值,label为ignore_index的像素不参与损失函数的计算。
+
+ Raises:
+ ValueError: use_bce_loss或use_dice_loss为真且num_calsses > 2。
+ ValueError: class_weight为list, 但长度不等于num_class。
+ class_weight为str, 但class_weight.low()不等于dynamic。
+ TypeError: class_weight不为None时,其类型不是list或str。
+ """
+
+ def __init__(self,
+ num_classes,
+ mode='train',
+ upsample_mode='bilinear',
+ use_bce_loss=False,
+ use_dice_loss=False,
+ class_weight=None,
+ ignore_index=255):
+ # dice_loss and bce_loss are only applicable to binary segmentation
+ if num_classes > 2 and (use_bce_loss or use_dice_loss):
+ raise ValueError(
+ "dice loss and bce loss is only applicable to binary classification"
+ )
+
+ if class_weight is not None:
+ if isinstance(class_weight, list):
+ if len(class_weight) != num_classes:
+ raise ValueError(
+ "Length of class_weight should be equal to number of classes"
+ )
+ elif isinstance(class_weight, str):
+ if class_weight.lower() != 'dynamic':
+ raise ValueError(
+ "if class_weight is string, must be dynamic!")
+ else:
+ raise TypeError(
+ 'Expect class_weight is a list or string but receive {}'.
+ format(type(class_weight)))
+ self.num_classes = num_classes
+ self.mode = mode
+ self.upsample_mode = upsample_mode
+ self.use_bce_loss = use_bce_loss
+ self.use_dice_loss = use_dice_loss
+ self.class_weight = class_weight
+ self.ignore_index = ignore_index
+
+ def _double_conv(self, data, out_ch):
+ param_attr = fluid.ParamAttr(
+ name='weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
+ with scope("conv0"):
+ data = bn_relu(
+ conv(
+ data,
+ out_ch,
+ 3,
+ stride=1,
+ padding=1,
+ param_attr=param_attr))
+ with scope("conv1"):
+ data = bn_relu(
+ conv(
+ data,
+ out_ch,
+ 3,
+ stride=1,
+ padding=1,
+ param_attr=param_attr))
+ return data
+
+ def _down(self, data, out_ch):
+ # Downsampling: max_pool followed by two convolutions
+ with scope("down"):
+ data = max_pool(data, 2, 2, 0)
+ data = self._double_conv(data, out_ch)
+ return data
+
+ def _up(self, data, short_cut, out_ch):
+ # Upsampling: upsample data (resize or deconv) and concat it with short_cut
+ param_attr = fluid.ParamAttr(
+ name='weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.XavierInitializer(),
+ )
+ with scope("up"):
+ if self.upsample_mode == 'bilinear':
+ short_cut_shape = fluid.layers.shape(short_cut)
+ data = fluid.layers.resize_bilinear(data, short_cut_shape[2:])
+ else:
+ data = deconv(
+ data,
+ out_ch // 2,
+ filter_size=2,
+ stride=2,
+ padding=0,
+ param_attr=param_attr)
+ data = fluid.layers.concat([data, short_cut], axis=1)
+ data = self._double_conv(data, out_ch)
+ return data
+
+ def _encode(self, data):
+ # Encoder
+ short_cuts = []
+ with scope("encode"):
+ with scope("block1"):
+ data = self._double_conv(data, 64)
+ short_cuts.append(data)
+ with scope("block2"):
+ data = self._down(data, 128)
+ short_cuts.append(data)
+ with scope("block3"):
+ data = self._down(data, 256)
+ short_cuts.append(data)
+ with scope("block4"):
+ data = self._down(data, 512)
+ short_cuts.append(data)
+ with scope("block5"):
+ data = self._down(data, 512)
+ return data, short_cuts
+
+ def _decode(self, data, short_cuts):
+ # Decoder, symmetric to the encoder
+ with scope("decode"):
+ with scope("decode1"):
+ data = self._up(data, short_cuts[3], 256)
+ with scope("decode2"):
+ data = self._up(data, short_cuts[2], 128)
+ with scope("decode3"):
+ data = self._up(data, short_cuts[1], 64)
+ with scope("decode4"):
+ data = self._up(data, short_cuts[0], 64)
+ return data
+
+ def _get_logit(self, data, num_classes):
+ # Set the output channels of the last convolution according to the number of classes
+ param_attr = fluid.ParamAttr(
+ name='weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
+ with scope("logit"):
+ data = conv(
+ data,
+ num_classes,
+ 3,
+ stride=1,
+ padding=1,
+ param_attr=param_attr)
+ return data
+
+ def _get_loss(self, logit, label, mask):
+ avg_loss = 0
+ if not (self.use_dice_loss or self.use_bce_loss):
+ avg_loss += softmax_with_loss(
+ logit,
+ label,
+ mask,
+ num_classes=self.num_classes,
+ weight=self.class_weight,
+ ignore_index=self.ignore_index)
+ else:
+ if self.use_dice_loss:
+ avg_loss += dice_loss(logit, label, mask)
+ if self.use_bce_loss:
+ avg_loss += bce_loss(
+ logit, label, mask, ignore_index=self.ignore_index)
+
+ return avg_loss
+
+ def generate_inputs(self):
+ inputs = OrderedDict()
+ inputs['image'] = fluid.data(
+ dtype='float32', shape=[None, 3, None, None], name='image')
+ if self.mode == 'train':
+ inputs['label'] = fluid.data(
+ dtype='int32', shape=[None, 1, None, None], name='label')
+ elif self.mode == 'eval':
+ inputs['label'] = fluid.data(
+ dtype='int32', shape=[None, 1, None, None], name='label')
+ return inputs
+
+ def build_net(self, inputs):
+ # For binary segmentation, when dice_loss or bce_loss is selected,
+ # the final logit is set to a single output channel
+ if self.use_dice_loss or self.use_bce_loss:
+ self.num_classes = 1
+
+ image = inputs['image']
+ encode_data, short_cuts = self._encode(image)
+ decode_data = self._decode(encode_data, short_cuts)
+ logit = self._get_logit(decode_data, self.num_classes)
+
+ if self.num_classes == 1:
+ out = sigmoid_to_softmax(logit)
+ out = fluid.layers.transpose(out, [0, 2, 3, 1])
+ else:
+ out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+ pred = fluid.layers.argmax(out, axis=3)
+ pred = fluid.layers.unsqueeze(pred, axes=[3])
+
+ if self.mode == 'train':
+ label = inputs['label']
+ mask = label != self.ignore_index
+ return self._get_loss(logit, label, mask)
+
+ elif self.mode == 'eval':
+ label = inputs['label']
+ mask = label != self.ignore_index
+ loss = self._get_loss(logit, label, mask)
+ return loss, pred, label, mask
+ else:
+ if self.num_classes == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = fluid.layers.softmax(logit, axis=1)
+ return pred, logit
diff --git a/paddlex/cv/nets/shufflenet_v2.py b/paddlex/cv/nets/shufflenet_v2.py
new file mode 100644
index 0000000000000000000000000000000000000000..253c620a2d69a03760466f1e33f4ecd19c47a91e
--- /dev/null
+++ b/paddlex/cv/nets/shufflenet_v2.py
@@ -0,0 +1,272 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle.fluid as fluid
+from paddle.fluid.initializer import MSRA
+from paddle.fluid.param_attr import ParamAttr
+
+
+class ShuffleNetV2():
+ def __init__(self, num_classes=None, scale=1.0):
+ self.num_classes = num_classes
+ self.scale = scale
+
+ def __call__(self, input):
+ scale = self.scale
+ stage_repeats = [4, 8, 4]
+
+ if scale == 0.25:
+ stage_out_channels = [-1, 24, 24, 48, 96, 512]
+ elif scale == 0.33:
+ stage_out_channels = [-1, 24, 32, 64, 128, 512]
+ elif scale == 0.5:
+ stage_out_channels = [-1, 24, 48, 96, 192, 1024]
+ elif scale == 1.0:
+ stage_out_channels = [-1, 24, 116, 232, 464, 1024]
+ elif scale == 1.5:
+ stage_out_channels = [-1, 24, 176, 352, 704, 1024]
+ elif scale == 2.0:
+ stage_out_channels = [-1, 24, 224, 488, 976, 2048]
+ else:
+ raise NotImplementedError("This scale size:[" + str(scale) +
+ "] is not implemented!")
+ #conv1
+
+ input_channel = stage_out_channels[1]
+ conv1 = self.conv_bn_layer(
+ input=input,
+ filter_size=3,
+ num_filters=input_channel,
+ padding=1,
+ stride=2,
+ name='stage1_conv')
+ pool1 = fluid.layers.pool2d(
+ input=conv1,
+ pool_size=3,
+ pool_stride=2,
+ pool_padding=1,
+ pool_type='max')
+ conv = pool1
+ # bottleneck sequences
+ for idxstage in range(len(stage_repeats)):
+ numrepeat = stage_repeats[idxstage]
+ output_channel = stage_out_channels[idxstage + 2]
+ for i in range(numrepeat):
+ if i == 0:
+ conv = self.inverted_residual_unit(
+ input=conv,
+ num_filters=output_channel,
+ stride=2,
+ benchmodel=2,
+ name=str(idxstage + 2) + '_' + str(i + 1))
+ else:
+ conv = self.inverted_residual_unit(
+ input=conv,
+ num_filters=output_channel,
+ stride=1,
+ benchmodel=1,
+ name=str(idxstage + 2) + '_' + str(i + 1))
+
+ output = self.conv_bn_layer(
+ input=conv,
+ filter_size=1,
+ num_filters=stage_out_channels[-1],
+ padding=0,
+ stride=1,
+ name='conv5')
+
+ if self.num_classes is not None:
+ output = fluid.layers.pool2d(
+ input=output,
+ pool_size=7,
+ pool_stride=1,
+ pool_padding=0,
+ pool_type='avg')
+ output = fluid.layers.fc(
+ input=output,
+ size=self.num_classes,
+ param_attr=ParamAttr(initializer=MSRA(), name='fc6_weights'),
+ bias_attr=ParamAttr(name='fc6_offset'))
+ return output
+
+ def conv_bn_layer(self,
+ input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ num_groups=1,
+ use_cudnn=True,
+ if_act=True,
+ name=None):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
+ bias_attr=False)
+ bn_name = name + '_bn'
+ if if_act:
+ return fluid.layers.batch_norm(
+ input=conv,
+ act='relu',
+ param_attr=ParamAttr(name=bn_name + "_scale"),
+ bias_attr=ParamAttr(name=bn_name + "_offset"),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+ else:
+ return fluid.layers.batch_norm(
+ input=conv,
+ param_attr=ParamAttr(name=bn_name + "_scale"),
+ bias_attr=ParamAttr(name=bn_name + "_offset"),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+
+ def channel_shuffle(self, x, groups):
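+ # Shuffle channels across groups via reshape + transpose: e.g. with 8 channels
+ # and groups=2, the channel order [0..7] becomes [0, 4, 1, 5, 2, 6, 3, 7]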
+ num_channels = x.shape[1]
+ channels_per_group = num_channels // groups
+ x_shape = fluid.layers.shape(x)
+
+ # reshape
+ x = fluid.layers.reshape(
+ x=x,
+ shape=[
+ x_shape[0], groups, channels_per_group, x_shape[2], x_shape[3]
+ ])
+
+ x = fluid.layers.transpose(x=x, perm=[0, 2, 1, 3, 4])
+
+ # flatten
+ x = fluid.layers.reshape(
+ x=x, shape=[x_shape[0], num_channels, x_shape[2], x_shape[3]])
+
+ return x
+
+ def inverted_residual_unit(self,
+ input,
+ num_filters,
+ stride,
+ benchmodel,
+ name=None):
+ assert stride in [1, 2], \
+ "supported stride are {} but your stride is {}".format([1,2], stride)
+
+ oup_inc = num_filters // 2
+ inp = input.shape[1]
+
+ if benchmodel == 1:
+ x1, x2 = fluid.layers.split(
+ input,
+ num_or_sections=[input.shape[1] // 2, input.shape[1] // 2],
+ dim=1)
+
+ conv_pw = self.conv_bn_layer(
+ input=x2,
+ num_filters=oup_inc,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name='stage_' + name + '_conv1')
+
+ conv_dw = self.conv_bn_layer(
+ input=conv_pw,
+ num_filters=oup_inc,
+ filter_size=3,
+ stride=stride,
+ padding=1,
+ num_groups=oup_inc,
+ if_act=False,
+ use_cudnn=False,
+ name='stage_' + name + '_conv2')
+
+ conv_linear = self.conv_bn_layer(
+ input=conv_dw,
+ num_filters=oup_inc,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name='stage_' + name + '_conv3')
+
+ out = fluid.layers.concat([x1, conv_linear], axis=1)
+
+ else:
+ #branch1
+ conv_dw_1 = self.conv_bn_layer(
+ input=input,
+ num_filters=inp,
+ filter_size=3,
+ stride=stride,
+ padding=1,
+ num_groups=inp,
+ if_act=False,
+ use_cudnn=False,
+ name='stage_' + name + '_conv4')
+
+ conv_linear_1 = self.conv_bn_layer(
+ input=conv_dw_1,
+ num_filters=oup_inc,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name='stage_' + name + '_conv5')
+
+ #branch2
+ conv_pw_2 = self.conv_bn_layer(
+ input=input,
+ num_filters=oup_inc,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name='stage_' + name + '_conv1')
+
+ conv_dw_2 = self.conv_bn_layer(
+ input=conv_pw_2,
+ num_filters=oup_inc,
+ filter_size=3,
+ stride=stride,
+ padding=1,
+ num_groups=oup_inc,
+ if_act=False,
+ use_cudnn=False,
+ name='stage_' + name + '_conv2')
+
+ conv_linear_2 = self.conv_bn_layer(
+ input=conv_dw_2,
+ num_filters=oup_inc,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name='stage_' + name + '_conv3')
+ out = fluid.layers.concat([conv_linear_1, conv_linear_2], axis=1)
+
+ return self.channel_shuffle(out, 2)
diff --git a/paddlex/cv/nets/xception.py b/paddlex/cv/nets/xception.py
new file mode 100644
index 0000000000000000000000000000000000000000..297e0cedcb39d0649eea3d672347d1f216b3a902
--- /dev/null
+++ b/paddlex/cv/nets/xception.py
@@ -0,0 +1,332 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import contextlib
+import paddle
+import math
+import paddle.fluid as fluid
+from .segmentation.model_utils.libs import scope, name_scope
+from .segmentation.model_utils.libs import bn, bn_relu, relu
+from .segmentation.model_utils.libs import conv
+from .segmentation.model_utils.libs import separate_conv
+
+__all__ = ['xception_65', 'xception_41', 'xception_71']
+
+
+def check_data(data, number):
+ if type(data) == int:
+ return [data] * number
+ assert len(data) == number
+ return data
+
+
+def check_stride(s, os):
+    return s <= os
+
+
+def check_points(count, points):
+    if points is None:
+        return False
+    if isinstance(points, list):
+        return count in points
+    return count == points
+
+
+class Xception():
+ def __init__(self,
+ num_classes=None,
+ layers=65,
+ output_stride=32,
+ end_points=None,
+ decode_points=None):
+ self.backbone = 'xception_' + str(layers)
+ self.num_classes = num_classes
+        self.output_stride = output_stride
+ self.end_points = end_points
+ self.decode_points = decode_points
+ self.bottleneck_params = self.gen_bottleneck_params(self.backbone)
+
+ def __call__(
+ self,
+ input,
+ ):
+ self.stride = 2
+ self.block_point = 0
+ self.short_cuts = dict()
+ with scope(self.backbone):
+ # Entry flow
+ data = self.entry_flow(input)
+ if check_points(self.block_point, self.end_points):
+ return data, self.short_cuts
+
+ # Middle flow
+ data = self.middle_flow(data)
+ if check_points(self.block_point, self.end_points):
+ return data, self.short_cuts
+
+ # Exit flow
+ data = self.exit_flow(data)
+ if check_points(self.block_point, self.end_points):
+ return data, self.short_cuts
+
+ if self.num_classes is not None:
+ data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
+ data = fluid.layers.dropout(data, 0.5)
+ stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
+ with scope("logit"):
+ out = fluid.layers.fc(
+ input=data,
+ size=self.num_classes,
+ act='softmax',
+ param_attr=fluid.param_attr.ParamAttr(
+ name='weights',
+ initializer=fluid.initializer.Uniform(-stdv, stdv)),
+ bias_attr=fluid.param_attr.ParamAttr(name='bias'))
+
+ return out
+ else:
+ return data
+
+ def gen_bottleneck_params(self, backbone='xception_65'):
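+        # Each flow is described as (block_num, strides, channels); strides and
+        # channels may be a single value or a per-block list, and are expanded
+        # to per-block lists later via check_data().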
+ if backbone == 'xception_65':
+ bottleneck_params = {
+ "entry_flow": (3, [2, 2, 2], [128, 256, 728]),
+ "middle_flow": (16, 1, 728),
+ "exit_flow": (2, [2, 1], [[728, 1024, 1024],
+ [1536, 1536, 2048]])
+ }
+ elif backbone == 'xception_41':
+ bottleneck_params = {
+ "entry_flow": (3, [2, 2, 2], [128, 256, 728]),
+ "middle_flow": (8, 1, 728),
+ "exit_flow": (2, [2, 1], [[728, 1024, 1024],
+ [1536, 1536, 2048]])
+ }
+ elif backbone == 'xception_71':
+ bottleneck_params = {
+ "entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
+ "middle_flow": (16, 1, 728),
+ "exit_flow": (2, [2, 1], [[728, 1024, 1024],
+ [1536, 1536, 2048]])
+ }
+ else:
+ raise Exception(
+                "xception backbone only supports xception_41/xception_65/xception_71"
+ )
+ return bottleneck_params
+
+ def entry_flow(self, data):
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
+ with scope("entry_flow"):
+ with scope("conv1"):
+ data = bn_relu(
+ conv(
+ data,
+ 32,
+ 3,
+ stride=2,
+ padding=1,
+ param_attr=param_attr),
+ eps=1e-3)
+ with scope("conv2"):
+ data = bn_relu(
+ conv(
+ data,
+ 64,
+ 3,
+ stride=1,
+ padding=1,
+ param_attr=param_attr),
+ eps=1e-3)
+
+ # get entry flow params
+ block_num = self.bottleneck_params["entry_flow"][0]
+ strides = self.bottleneck_params["entry_flow"][1]
+ chns = self.bottleneck_params["entry_flow"][2]
+ strides = check_data(strides, block_num)
+ chns = check_data(chns, block_num)
+
+ # params to control your flow
+ s = self.stride
+ block_point = self.block_point
+ output_stride = self.output_stride
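+        # check_stride() caps the accumulated stride at output_stride: once the
+        # limit would be exceeded, the block falls back to stride 1 so the
+        # feature map is not down-sampled any further.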
+ with scope("entry_flow"):
+ for i in range(block_num):
+ block_point = block_point + 1
+ with scope("block" + str(i + 1)):
+ stride = strides[i] if check_stride(
+ s * strides[i], output_stride) else 1
+ data, short_cuts = self.xception_block(
+ data, chns[i], [1, 1, stride])
+ s = s * stride
+ if check_points(block_point, self.decode_points):
+ self.short_cuts[block_point] = short_cuts[1]
+
+ self.stride = s
+ self.block_point = block_point
+ return data
+
+ def middle_flow(self, data):
+ block_num = self.bottleneck_params["middle_flow"][0]
+ strides = self.bottleneck_params["middle_flow"][1]
+ chns = self.bottleneck_params["middle_flow"][2]
+ strides = check_data(strides, block_num)
+ chns = check_data(chns, block_num)
+
+ # params to control your flow
+ s = self.stride
+ block_point = self.block_point
+ output_stride = self.output_stride
+ with scope("middle_flow"):
+ for i in range(block_num):
+ block_point = block_point + 1
+ with scope("block" + str(i + 1)):
+ stride = strides[i] if check_stride(
+ s * strides[i], output_stride) else 1
+ data, short_cuts = self.xception_block(
+                        data, chns[i], [1, 1, stride], skip_conv=False)
+ s = s * stride
+ if check_points(block_point, self.decode_points):
+ self.short_cuts[block_point] = short_cuts[1]
+
+ self.stride = s
+ self.block_point = block_point
+ return data
+
+ def exit_flow(self, data):
+ block_num = self.bottleneck_params["exit_flow"][0]
+ strides = self.bottleneck_params["exit_flow"][1]
+ chns = self.bottleneck_params["exit_flow"][2]
+ strides = check_data(strides, block_num)
+ chns = check_data(chns, block_num)
+
+ assert (block_num == 2)
+ # params to control your flow
+ s = self.stride
+ block_point = self.block_point
+ output_stride = self.output_stride
+ with scope("exit_flow"):
+ with scope('block1'):
+ block_point += 1
+ stride = strides[0] if check_stride(s * strides[0],
+ output_stride) else 1
+ data, short_cuts = self.xception_block(data, chns[0],
+ [1, 1, stride])
+ s = s * stride
+ if check_points(block_point, self.decode_points):
+ self.short_cuts[block_point] = short_cuts[1]
+ with scope('block2'):
+ block_point += 1
+ stride = strides[1] if check_stride(s * strides[1],
+ output_stride) else 1
+ data, short_cuts = self.xception_block(
+ data,
+ chns[1], [1, 1, stride],
+ dilation=2,
+ has_skip=False,
+ activation_fn_in_separable_conv=True)
+ s = s * stride
+ if check_points(block_point, self.decode_points):
+ self.short_cuts[block_point] = short_cuts[1]
+
+ self.stride = s
+ self.block_point = block_point
+ return data
+
+ def xception_block(self,
+ input,
+ channels,
+ strides=1,
+ filters=3,
+ dilation=1,
+ skip_conv=True,
+ has_skip=True,
+ activation_fn_in_separable_conv=False):
+ repeat_number = 3
+ channels = check_data(channels, repeat_number)
+ filters = check_data(filters, repeat_number)
+ strides = check_data(strides, repeat_number)
+ data = input
+ results = []
+ for i in range(repeat_number):
+ with scope('separable_conv' + str(i + 1)):
+ if not activation_fn_in_separable_conv:
+ data = relu(data)
+ data = separate_conv(
+ data,
+ channels[i],
+ strides[i],
+ filters[i],
+ dilation=dilation,
+ eps=1e-3)
+ else:
+ data = separate_conv(
+ data,
+ channels[i],
+ strides[i],
+ filters[i],
+ dilation=dilation,
+ act=relu,
+ eps=1e-3)
+ results.append(data)
+ if not has_skip:
+ return data, results
+ if skip_conv:
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(
+ loc=0.0, scale=0.09))
+ with scope('shortcut'):
+ skip = bn(
+ conv(
+ input,
+ channels[-1],
+ 1,
+ strides[-1],
+ groups=1,
+ padding=0,
+ param_attr=param_attr),
+ eps=1e-3)
+ else:
+ skip = input
+ return data + skip, results
+
+
+def xception_65(num_classes=None):
+ model = Xception(num_classes, 65)
+ return model
+
+
+def xception_41(num_classes=None):
+ model = Xception(num_classes, 41)
+ return model
+
+
+def xception_71(num_classes=None):
+ model = Xception(num_classes, 71)
+ return model
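+
+
+# A hedged usage sketch (added for illustration, not part of the original
+# file): building the xception_65 classification head inside a fluid program.
+# The input name and shape below are assumptions chosen only for this example.
+if __name__ == '__main__':
+    main_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    with fluid.program_guard(main_prog, startup_prog):
+        image = fluid.data(
+            name='image', shape=[None, 3, 224, 224], dtype='float32')
+        model = xception_65(num_classes=1000)
+        logits = model(image)  # softmax output of shape [-1, 1000]
+        print(logits.shape)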
diff --git a/paddlex/cv/transforms/__init__.py b/paddlex/cv/transforms/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..37c14e75f72f8c6b76a608116419d58437fab99e
--- /dev/null
+++ b/paddlex/cv/transforms/__init__.py
@@ -0,0 +1,17 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import cls_transforms
+from . import det_transforms
+from . import seg_transforms
diff --git a/paddlex/cv/transforms/box_utils.py b/paddlex/cv/transforms/box_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..a52631c65cbf883b16f051b4f2175a90c9c36fa4
--- /dev/null
+++ b/paddlex/cv/transforms/box_utils.py
@@ -0,0 +1,430 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import random
+import math
+import cv2
+import scipy
+
+
+def meet_emit_constraint(src_bbox, sample_bbox):
+ center_x = (src_bbox[2] + src_bbox[0]) / 2
+ center_y = (src_bbox[3] + src_bbox[1]) / 2
+ if center_x >= sample_bbox[0] and \
+ center_x <= sample_bbox[2] and \
+ center_y >= sample_bbox[1] and \
+ center_y <= sample_bbox[3]:
+ return True
+ return False
+
+
+def clip_bbox(src_bbox):
+ src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0)
+ src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0)
+ src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0)
+ src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0)
+ return src_bbox
+
+
+def bbox_area(src_bbox):
+ if src_bbox[2] < src_bbox[0] or src_bbox[3] < src_bbox[1]:
+ return 0.
+ else:
+ width = src_bbox[2] - src_bbox[0]
+ height = src_bbox[3] - src_bbox[1]
+ return width * height
+
+
+def is_overlap(object_bbox, sample_bbox):
+ if object_bbox[0] >= sample_bbox[2] or \
+ object_bbox[2] <= sample_bbox[0] or \
+ object_bbox[1] >= sample_bbox[3] or \
+ object_bbox[3] <= sample_bbox[1]:
+ return False
+ else:
+ return True
+
+
+def filter_and_process(sample_bbox, bboxes, labels, scores=None):
+ new_bboxes = []
+ new_labels = []
+ new_scores = []
+ for i in range(len(bboxes)):
+ new_bbox = [0, 0, 0, 0]
+ obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]]
+ if not meet_emit_constraint(obj_bbox, sample_bbox):
+ continue
+ if not is_overlap(obj_bbox, sample_bbox):
+ continue
+ sample_width = sample_bbox[2] - sample_bbox[0]
+ sample_height = sample_bbox[3] - sample_bbox[1]
+ new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width
+ new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height
+ new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width
+ new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height
+ new_bbox = clip_bbox(new_bbox)
+ if bbox_area(new_bbox) > 0:
+ new_bboxes.append(new_bbox)
+ new_labels.append([labels[i][0]])
+ if scores is not None:
+ new_scores.append([scores[i][0]])
+ bboxes = np.array(new_bboxes)
+ labels = np.array(new_labels)
+ scores = np.array(new_scores)
+ return bboxes, labels, scores
+
+
+def bbox_area_sampling(bboxes, labels, scores, target_size, min_size):
+ new_bboxes = []
+ new_labels = []
+ new_scores = []
+ for i, bbox in enumerate(bboxes):
+ w = float((bbox[2] - bbox[0]) * target_size)
+ h = float((bbox[3] - bbox[1]) * target_size)
+ if w * h < float(min_size * min_size):
+ continue
+ else:
+ new_bboxes.append(bbox)
+ new_labels.append(labels[i])
+ if scores is not None and scores.size != 0:
+ new_scores.append(scores[i])
+ bboxes = np.array(new_bboxes)
+ labels = np.array(new_labels)
+ scores = np.array(new_scores)
+ return bboxes, labels, scores
+
+
+def generate_sample_bbox(sampler):
+ scale = np.random.uniform(sampler[2], sampler[3])
+ aspect_ratio = np.random.uniform(sampler[4], sampler[5])
+ aspect_ratio = max(aspect_ratio, (scale**2.0))
+ aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
+ bbox_width = scale * (aspect_ratio**0.5)
+ bbox_height = scale / (aspect_ratio**0.5)
+ xmin_bound = 1 - bbox_width
+ ymin_bound = 1 - bbox_height
+ xmin = np.random.uniform(0, xmin_bound)
+ ymin = np.random.uniform(0, ymin_bound)
+ xmax = xmin + bbox_width
+ ymax = ymin + bbox_height
+ sampled_bbox = [xmin, ymin, xmax, ymax]
+ return sampled_bbox
+
+
+def generate_sample_bbox_square(sampler, image_width, image_height):
+ scale = np.random.uniform(sampler[2], sampler[3])
+ aspect_ratio = np.random.uniform(sampler[4], sampler[5])
+ aspect_ratio = max(aspect_ratio, (scale**2.0))
+ aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
+ bbox_width = scale * (aspect_ratio**0.5)
+ bbox_height = scale / (aspect_ratio**0.5)
+ if image_height < image_width:
+ bbox_width = bbox_height * image_height / image_width
+ else:
+ bbox_height = bbox_width * image_width / image_height
+ xmin_bound = 1 - bbox_width
+ ymin_bound = 1 - bbox_height
+ xmin = np.random.uniform(0, xmin_bound)
+ ymin = np.random.uniform(0, ymin_bound)
+ xmax = xmin + bbox_width
+ ymax = ymin + bbox_height
+ sampled_bbox = [xmin, ymin, xmax, ymax]
+ return sampled_bbox
+
+
+def data_anchor_sampling(bbox_labels, image_width, image_height, scale_array,
+ resize_width):
+ num_gt = len(bbox_labels)
+ # np.random.randint range: [low, high)
+ rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
+
+ if num_gt != 0:
+ norm_xmin = bbox_labels[rand_idx][0]
+ norm_ymin = bbox_labels[rand_idx][1]
+ norm_xmax = bbox_labels[rand_idx][2]
+ norm_ymax = bbox_labels[rand_idx][3]
+
+ xmin = norm_xmin * image_width
+ ymin = norm_ymin * image_height
+ wid = image_width * (norm_xmax - norm_xmin)
+ hei = image_height * (norm_ymax - norm_ymin)
+ range_size = 0
+
+ area = wid * hei
+ for scale_ind in range(0, len(scale_array) - 1):
+ if area > scale_array[scale_ind] ** 2 and area < \
+ scale_array[scale_ind + 1] ** 2:
+ range_size = scale_ind + 1
+ break
+
+ if area > scale_array[len(scale_array) - 2]**2:
+ range_size = len(scale_array) - 2
+
+ scale_choose = 0.0
+ if range_size == 0:
+ rand_idx_size = 0
+ else:
+ # np.random.randint range: [low, high)
+ rng_rand_size = np.random.randint(0, range_size + 1)
+ rand_idx_size = rng_rand_size % (range_size + 1)
+
+ if rand_idx_size == range_size:
+ min_resize_val = scale_array[rand_idx_size] / 2.0
+ max_resize_val = min(2.0 * scale_array[rand_idx_size],
+ 2 * math.sqrt(wid * hei))
+ scale_choose = random.uniform(min_resize_val, max_resize_val)
+ else:
+ min_resize_val = scale_array[rand_idx_size] / 2.0
+ max_resize_val = 2.0 * scale_array[rand_idx_size]
+ scale_choose = random.uniform(min_resize_val, max_resize_val)
+
+ sample_bbox_size = wid * resize_width / scale_choose
+
+ w_off_orig = 0.0
+ h_off_orig = 0.0
+ if sample_bbox_size < max(image_height, image_width):
+ if wid <= sample_bbox_size:
+ w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size,
+ xmin)
+ else:
+ w_off_orig = np.random.uniform(xmin,
+ xmin + wid - sample_bbox_size)
+
+ if hei <= sample_bbox_size:
+ h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size,
+ ymin)
+ else:
+ h_off_orig = np.random.uniform(ymin,
+ ymin + hei - sample_bbox_size)
+
+ else:
+ w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0)
+ h_off_orig = np.random.uniform(image_height - sample_bbox_size,
+ 0.0)
+
+ w_off_orig = math.floor(w_off_orig)
+ h_off_orig = math.floor(h_off_orig)
+
+ # Figure out top left coordinates.
+ w_off = float(w_off_orig / image_width)
+ h_off = float(h_off_orig / image_height)
+
+ sampled_bbox = [
+ w_off, h_off, w_off + float(sample_bbox_size / image_width),
+ h_off + float(sample_bbox_size / image_height)
+ ]
+ return sampled_bbox
+ else:
+ return 0
+
+
+def jaccard_overlap(sample_bbox, object_bbox):
+ if sample_bbox[0] >= object_bbox[2] or \
+ sample_bbox[2] <= object_bbox[0] or \
+ sample_bbox[1] >= object_bbox[3] or \
+ sample_bbox[3] <= object_bbox[1]:
+ return 0
+ intersect_xmin = max(sample_bbox[0], object_bbox[0])
+ intersect_ymin = max(sample_bbox[1], object_bbox[1])
+ intersect_xmax = min(sample_bbox[2], object_bbox[2])
+ intersect_ymax = min(sample_bbox[3], object_bbox[3])
+ intersect_size = (intersect_xmax - intersect_xmin) * (
+ intersect_ymax - intersect_ymin)
+ sample_bbox_size = bbox_area(sample_bbox)
+ object_bbox_size = bbox_area(object_bbox)
+ overlap = intersect_size / (
+ sample_bbox_size + object_bbox_size - intersect_size)
+ return overlap
+
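+# Worked example (added for illustration): two boxes of area 0.25 that share
+# half of their area,
+#     jaccard_overlap([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.75, 0.5])
+# intersect over [0.25, 0.5] x [0.0, 0.5] = 0.125; the union is
+# 0.25 + 0.25 - 0.125 = 0.375, so the returned overlap is 0.125 / 0.375 = 1/3.
+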
+
+def intersect_bbox(bbox1, bbox2):
+ if bbox2[0] > bbox1[2] or bbox2[2] < bbox1[0] or \
+ bbox2[1] > bbox1[3] or bbox2[3] < bbox1[1]:
+ intersection_box = [0.0, 0.0, 0.0, 0.0]
+ else:
+ intersection_box = [
+ max(bbox1[0], bbox2[0]),
+ max(bbox1[1], bbox2[1]),
+ min(bbox1[2], bbox2[2]),
+ min(bbox1[3], bbox2[3])
+ ]
+ return intersection_box
+
+
+def bbox_coverage(bbox1, bbox2):
+ inter_box = intersect_bbox(bbox1, bbox2)
+ intersect_size = bbox_area(inter_box)
+
+ if intersect_size > 0:
+ bbox1_size = bbox_area(bbox1)
+ return intersect_size / bbox1_size
+ else:
+ return 0.
+
+
+def satisfy_sample_constraint(sampler,
+ sample_bbox,
+ gt_bboxes,
+ satisfy_all=False):
+ if sampler[6] == 0 and sampler[7] == 0:
+ return True
+ satisfied = []
+ for i in range(len(gt_bboxes)):
+ object_bbox = [
+ gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
+ ]
+ overlap = jaccard_overlap(sample_bbox, object_bbox)
+ if sampler[6] != 0 and \
+ overlap < sampler[6]:
+ satisfied.append(False)
+ continue
+ if sampler[7] != 0 and \
+ overlap > sampler[7]:
+ satisfied.append(False)
+ continue
+ satisfied.append(True)
+ if not satisfy_all:
+ return True
+
+ if satisfy_all:
+ return np.all(satisfied)
+ else:
+ return False
+
+
+def satisfy_sample_constraint_coverage(sampler, sample_bbox, gt_bboxes):
+ if sampler[6] == 0 and sampler[7] == 0:
+ has_jaccard_overlap = False
+ else:
+ has_jaccard_overlap = True
+ if sampler[8] == 0 and sampler[9] == 0:
+ has_object_coverage = False
+ else:
+ has_object_coverage = True
+
+ if not has_jaccard_overlap and not has_object_coverage:
+ return True
+ found = False
+ for i in range(len(gt_bboxes)):
+ object_bbox = [
+ gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
+ ]
+ if has_jaccard_overlap:
+ overlap = jaccard_overlap(sample_bbox, object_bbox)
+ if sampler[6] != 0 and \
+ overlap < sampler[6]:
+ continue
+ if sampler[7] != 0 and \
+ overlap > sampler[7]:
+ continue
+ found = True
+ if has_object_coverage:
+ object_coverage = bbox_coverage(object_bbox, sample_bbox)
+ if sampler[8] != 0 and \
+ object_coverage < sampler[8]:
+ continue
+ if sampler[9] != 0 and \
+ object_coverage > sampler[9]:
+ continue
+ found = True
+ if found:
+ return True
+ return found
+
+
+def crop_image_sampling(img, sample_bbox, image_width, image_height,
+ target_size):
+ # no clipping here
+ xmin = int(sample_bbox[0] * image_width)
+ xmax = int(sample_bbox[2] * image_width)
+ ymin = int(sample_bbox[1] * image_height)
+ ymax = int(sample_bbox[3] * image_height)
+
+ w_off = xmin
+ h_off = ymin
+ width = xmax - xmin
+ height = ymax - ymin
+ cross_xmin = max(0.0, float(w_off))
+ cross_ymin = max(0.0, float(h_off))
+ cross_xmax = min(float(w_off + width - 1.0), float(image_width))
+ cross_ymax = min(float(h_off + height - 1.0), float(image_height))
+ cross_width = cross_xmax - cross_xmin
+ cross_height = cross_ymax - cross_ymin
+
+ roi_xmin = 0 if w_off >= 0 else abs(w_off)
+ roi_ymin = 0 if h_off >= 0 else abs(h_off)
+ roi_width = cross_width
+ roi_height = cross_height
+
+ roi_y1 = int(roi_ymin)
+ roi_y2 = int(roi_ymin + roi_height)
+ roi_x1 = int(roi_xmin)
+ roi_x2 = int(roi_xmin + roi_width)
+
+ cross_y1 = int(cross_ymin)
+ cross_y2 = int(cross_ymin + cross_height)
+ cross_x1 = int(cross_xmin)
+ cross_x2 = int(cross_xmin + cross_width)
+
+ sample_img = np.zeros((height, width, 3))
+ sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \
+ img[cross_y1: cross_y2, cross_x1: cross_x2]
+
+ sample_img = cv2.resize(
+ sample_img, (target_size, target_size), interpolation=cv2.INTER_AREA)
+
+ return sample_img
+
+
+def box_horizontal_flip(bboxes, width):
+ oldx1 = bboxes[:, 0].copy()
+ oldx2 = bboxes[:, 2].copy()
+ bboxes[:, 0] = width - oldx2 - 1
+ bboxes[:, 2] = width - oldx1 - 1
+    if bboxes.shape[0] != 0 and (bboxes[:, 2] < bboxes[:, 0]).any():
+ raise ValueError(
+ "RandomHorizontalFlip: invalid box, x2 should be greater than x1")
+ return bboxes
+
+
+def segms_horizontal_flip(segms, height, width):
+ def _flip_poly(poly, width):
+ flipped_poly = np.array(poly)
+ flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
+ return flipped_poly.tolist()
+
+ def _flip_rle(rle, height, width):
+ if 'counts' in rle and type(rle['counts']) == list:
+ rle = mask_util.frPyObjects([rle], height, width)
+ mask = mask_util.decode(rle)
+ mask = mask[:, ::-1, :]
+ rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
+ return rle
+
+ def is_poly(segm):
+ if not isinstance(segm, (list, dict)):
+ raise Exception("Invalid segm type: {}".format(type(segm)))
+ return isinstance(segm, list)
+
+ flipped_segms = []
+ for segm in segms:
+ if is_poly(segm):
+ # Polygon format
+ flipped_segms.append([_flip_poly(poly, width) for poly in segm])
+ else:
+ # RLE format
+ import pycocotools.mask as mask_util
+ flipped_segms.append(_flip_rle(segm, height, width))
+ return flipped_segms
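+
+
+# Minimal self-contained check (added for illustration, not part of the
+# original module): horizontally flipping one pixel-coordinate box inside an
+# image that is 100 pixels wide.
+if __name__ == '__main__':
+    demo_boxes = np.array([[10., 20., 30., 40.]])
+    print(box_horizontal_flip(demo_boxes, width=100))  # [[69. 20. 89. 40.]]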
diff --git a/paddlex/cv/transforms/cls_transforms.py b/paddlex/cv/transforms/cls_transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..85e37bda45dc68f900e154f86342562e70d5e6b8
--- /dev/null
+++ b/paddlex/cv/transforms/cls_transforms.py
@@ -0,0 +1,433 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .ops import *
+import random
+import os.path as osp
+import numpy as np
+from PIL import Image, ImageEnhance
+
+
+class Compose:
+ """根据数据预处理/增强算子对输入数据进行操作。
+ 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
+
+ Args:
+ transforms (list): 数据预处理/增强算子。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+
+ def __init__(self, transforms):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ if len(transforms) < 1:
+ raise ValueError('The length of transforms ' + \
+                             'must be equal to or larger than 1!')
+ self.transforms = transforms
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (str/np.ndarray): 图像路径/图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+ Returns:
+ tuple: 根据网络所需字段所组成的tuple;
+ 字段由transforms中的最后一个数据预处理操作决定。
+ """
+        im_file = im
+        im = cv2.imread(im_file)
+        if im is None:
+            raise TypeError('Can\'t read the image file {}!'.format(im_file))
+        im = im.astype('float32')
+        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+ for op in self.transforms:
+ outputs = op(im, label)
+ im = outputs[0]
+ if len(outputs) == 2:
+ label = outputs[1]
+ return outputs
+
+
+class RandomCrop:
+ """对图像进行随机剪裁,模型训练时的数据增强操作。
+
+ 1. 根据lower_scale、lower_ratio、upper_ratio计算随机剪裁的高、宽。
+ 2. 根据随机剪裁的高、宽随机选取剪裁的起始点。
+ 3. 剪裁图像。
+ 4. 调整剪裁后的图像的大小到crop_size*crop_size。
+
+ Args:
+ crop_size (int): 随机裁剪后重新调整的目标边长。默认为224。
+ lower_scale (float): 裁剪面积相对原面积比例的最小限制。默认为0.88。
+ lower_ratio (float): 宽变换比例的最小限制。默认为3. / 4。
+ upper_ratio (float): 宽变换比例的最大限制。默认为4. / 3。
+ """
+
+ def __init__(self,
+ crop_size=224,
+ lower_scale=0.88,
+ lower_ratio=3. / 4,
+ upper_ratio=4. / 3):
+ self.crop_size = crop_size
+ self.lower_scale = lower_scale
+ self.lower_ratio = lower_ratio
+ self.upper_ratio = upper_ratio
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ im = random_crop(im, self.crop_size, self.lower_scale,
+ self.lower_ratio, self.upper_ratio)
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class RandomHorizontalFlip:
+ """以一定的概率对图像进行随机水平翻转,模型训练时的数据增强操作。
+
+ Args:
+ prob (float): 随机水平翻转的概率。默认为0.5。
+ """
+
+ def __init__(self, prob=0.5):
+ self.prob = prob
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ if random.random() < self.prob:
+ im = horizontal_flip(im)
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class RandomVerticalFlip:
+ """以一定的概率对图像进行随机垂直翻转,模型训练时的数据增强操作。
+
+ Args:
+ prob (float): 随机垂直翻转的概率。默认为0.5。
+ """
+
+ def __init__(self, prob=0.5):
+ self.prob = prob
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ if random.random() < self.prob:
+ im = vertical_flip(im)
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class Normalize:
+ """对图像进行标准化。
+
+ 1. 对图像进行归一化到区间[0.0, 1.0]。
+ 2. 对图像进行减均值除以标准差操作。
+
+ Args:
+ mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
+ std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
+
+ """
+
+ def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+ self.mean = mean
+ self.std = std
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ im = normalize(im, mean, std)
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class ResizeByShort:
+ """根据图像短边对图像重新调整大小(resize)。
+
+ 1. 获取图像的长边和短边长度。
+ 2. 根据短边与short_size的比例,计算长边的目标长度,
+ 此时高、宽的resize比例为short_size/原图短边长度。
+ 3. 如果max_size>0,调整resize比例:
+ 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度;
+ 4. 根据调整大小的比例对图像进行resize。
+
+ Args:
+ short_size (int): 调整大小后的图像目标短边长度。默认为256。
+ max_size (int): 长边目标长度的最大限制。默认为-1。
+ """
+
+ def __init__(self, short_size=256, max_size=-1):
+ self.short_size = short_size
+ self.max_size = max_size
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ im_short_size = min(im.shape[0], im.shape[1])
+ im_long_size = max(im.shape[0], im.shape[1])
+ scale = float(self.short_size) / im_short_size
+ if self.max_size > 0 and np.round(
+ scale * im_long_size) > self.max_size:
+ scale = float(self.max_size) / float(im_long_size)
+ resized_width = int(round(im.shape[1] * scale))
+ resized_height = int(round(im.shape[0] * scale))
+ im = cv2.resize(
+ im, (resized_width, resized_height),
+ interpolation=cv2.INTER_LINEAR)
+
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
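+# Worked example (added for illustration): with short_size=256 and max_size=-1,
+# a 400x600 (H x W) image gives scale = 256 / 400 = 0.64 and is resized to
+# 256x384; with max_size=300 the long side 384 would exceed the limit, so the
+# scale becomes 300 / 600 = 0.5 and the image is resized to 200x300 instead.
+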
+
+class CenterCrop:
+ """以图像中心点扩散裁剪长宽为`crop_size`的正方形
+
+ 1. 计算剪裁的起始点。
+ 2. 剪裁图像。
+
+ Args:
+ crop_size (int): 裁剪的目标边长。默认为224。
+ """
+
+ def __init__(self, crop_size=224):
+ self.crop_size = crop_size
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ im = center_crop(im, self.crop_size)
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class RandomRotate:
+ def __init__(self, rotate_range=30, prob=0.5):
+        """以一定的概率对图像在[-rotate_range, rotate_range]角度范围内进行旋转,模型训练时的数据增强操作。
+
+ Args:
+ rotate_range (int): 旋转度数的范围。默认为30。
+ prob (float): 随机旋转的概率。默认为0.5。
+ """
+ self.rotate_range = rotate_range
+ self.prob = prob
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ rotate_lower = -self.rotate_range
+ rotate_upper = self.rotate_range
+ im = im.astype('uint8')
+ im = Image.fromarray(im)
+ if np.random.uniform(0, 1) < self.prob:
+ im = rotate(im, rotate_lower, rotate_upper)
+ im = np.asarray(im).astype('float32')
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class RandomDistort:
+ """以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作。
+
+ 1. 对变换的操作顺序进行随机化操作。
+    2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
+
+ Args:
+ brightness_range (float): 明亮度因子的范围。默认为0.9。
+ brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
+ contrast_range (float): 对比度因子的范围。默认为0.9。
+ contrast_prob (float): 随机调整对比度的概率。默认为0.5。
+ saturation_range (float): 饱和度因子的范围。默认为0.9。
+ saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
+ hue_range (int): 色调因子的范围。默认为18。
+ hue_prob (float): 随机调整色调的概率。默认为0.5。
+ """
+
+ def __init__(self,
+ brightness_range=0.9,
+ brightness_prob=0.5,
+ contrast_range=0.9,
+ contrast_prob=0.5,
+ saturation_range=0.9,
+ saturation_prob=0.5,
+ hue_range=18,
+ hue_prob=0.5):
+ self.brightness_range = brightness_range
+ self.brightness_prob = brightness_prob
+ self.contrast_range = contrast_range
+ self.contrast_prob = contrast_prob
+ self.saturation_range = saturation_range
+ self.saturation_prob = saturation_prob
+ self.hue_range = hue_range
+ self.hue_prob = hue_prob
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, ),对应图像np.ndarray数据;
+ 当label不为空时,返回的tuple为(im, label),分别对应图像np.ndarray数据、图像类别id。
+ """
+ brightness_lower = 1 - self.brightness_range
+ brightness_upper = 1 + self.brightness_range
+ contrast_lower = 1 - self.contrast_range
+ contrast_upper = 1 + self.contrast_range
+ saturation_lower = 1 - self.saturation_range
+ saturation_upper = 1 + self.saturation_range
+ hue_lower = -self.hue_range
+ hue_upper = self.hue_range
+ ops = [brightness, contrast, saturation, hue]
+ random.shuffle(ops)
+ params_dict = {
+ 'brightness': {
+ 'brightness_lower': brightness_lower,
+ 'brightness_upper': brightness_upper
+ },
+ 'contrast': {
+ 'contrast_lower': contrast_lower,
+ 'contrast_upper': contrast_upper
+ },
+ 'saturation': {
+ 'saturation_lower': saturation_lower,
+ 'saturation_upper': saturation_upper
+ },
+ 'hue': {
+ 'hue_lower': hue_lower,
+ 'hue_upper': hue_upper
+ }
+ }
+ prob_dict = {
+ 'brightness': self.brightness_prob,
+ 'contrast': self.contrast_prob,
+ 'saturation': self.saturation_prob,
+ 'hue': self.hue_prob,
+ }
+ im = im.astype('uint8')
+ im = Image.fromarray(im)
+ for id in range(len(ops)):
+ params = params_dict[ops[id].__name__]
+ prob = prob_dict[ops[id].__name__]
+ params['im'] = im
+ if np.random.uniform(0, 1) < prob:
+ im = ops[id](**params)
+ im = np.asarray(im).astype('float32')
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
+
+class ArrangeClassifier:
+    """获取训练/验证/预测所需信息。注意:此操作不需用户自己显式调用。
+
+ Args:
+ mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+ Raises:
+ ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
+ """
+
+ def __init__(self, mode=None):
+ if mode not in ['train', 'eval', 'test', 'quant']:
+ raise ValueError(
+ "mode must be in ['train', 'eval', 'test', 'quant']!")
+ self.mode = mode
+
+ def __call__(self, im, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ label (int): 每张图像所对应的类别序号。
+
+ Returns:
+ tuple: 当mode为'train'或'eval'时,返回(im, label),分别对应图像np.ndarray数据、
+ 图像类别id;当mode为'test'或'quant'时,返回(im, ),对应图像np.ndarray数据。
+ """
+ im = permute(im, False)
+ if self.mode == 'train' or self.mode == 'eval':
+ outputs = (im, label)
+ else:
+ outputs = (im, )
+ return outputs
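+
+
+# A hedged usage sketch (added for illustration, not part of the original
+# module): composing typical training transforms for classification. The image
+# path 'xxx.jpg' and the label 0 are placeholders.
+if __name__ == '__main__':
+    train_transforms = Compose([
+        RandomCrop(crop_size=224),
+        RandomHorizontalFlip(prob=0.5),
+        Normalize(),
+        ArrangeClassifier(mode='train'),
+    ])
+    im, label = train_transforms('xxx.jpg', 0)  # (im, label) arranged for training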
diff --git a/paddlex/cv/transforms/det_transforms.py b/paddlex/cv/transforms/det_transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..118005fff28b2406a229de4fc17540484856a8bc
--- /dev/null
+++ b/paddlex/cv/transforms/det_transforms.py
@@ -0,0 +1,1176 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .ops import *
+from .box_utils import *
+import random
+import os.path as osp
+import numpy as np
+from PIL import Image, ImageEnhance
+import cv2
+
+
+class Compose:
+ """根据数据预处理/增强列表对输入数据进行操作。
+ 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
+
+ Args:
+ transforms (list): 数据预处理/增强列表。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+
+ def __init__(self, transforms):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ if len(transforms) < 1:
+ raise ValueError('The length of transforms ' + \
+                             'must be equal to or larger than 1!')
+ self.transforms = transforms
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (str/np.ndarray): 图像路径/图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息,dict中的字段如下:
+ - im_id (np.ndarray): 图像序列号,形状为(1,)。
+ - origin_shape (np.ndarray): 图像原始大小,形状为(2,),
+ origin_shape[0]为高,origin_shape[1]为宽。
+ - mixup (list): list为[im, im_info, label_info],分别对应
+ 与当前图像进行mixup的图像np.ndarray数据、图像相关信息、标注框相关信息;
+ 注意,当前epoch若无需进行mixup,则无该字段。
+ label_info (dict): 存储与标注框相关的信息,dict中的字段如下:
+ - gt_bbox (np.ndarray): 真实标注框坐标[x1, y1, x2, y2],形状为(n, 4),
+ 其中n代表真实标注框的个数。
+ - gt_class (np.ndarray): 每个真实标注框对应的类别序号,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ - gt_score (np.ndarray): 每个真实标注框对应的混合得分,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ - gt_poly (list): 每个真实标注框内的多边形分割区域,每个分割区域由点的x、y坐标组成,
+ 长度为n,其中n代表真实标注框的个数。
+ - is_crowd (np.ndarray): 每个真实标注框中是否是一组对象,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ - difficult (np.ndarray): 每个真实标注框中的对象是否为难识别对象,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ Returns:
+ tuple: 根据网络所需字段所组成的tuple;
+ 字段由transforms中的最后一个数据预处理操作决定。
+ """
+
+ def decode_image(im_file, im_info, label_info):
+ if im_info is None:
+ im_info = dict()
+            im = cv2.imread(im_file)
+            if im is None:
+                raise TypeError(
+                    'Can\'t read the image file {}!'.format(im_file))
+            im = im.astype('float32')
+ im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+ # make default im_info with [h, w, 1]
+ im_info['im_resize_info'] = np.array(
+ [im.shape[0], im.shape[1], 1.], dtype=np.float32)
+ # copy augment_shape from origin_shape
+ im_info['augment_shape'] = np.array([im.shape[0],
+ im.shape[1]]).astype('int32')
+ # decode mixup image
+ if 'mixup' in im_info:
+ im_info['mixup'] = \
+ decode_image(im_info['mixup'][0],
+ im_info['mixup'][1],
+ im_info['mixup'][2])
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+ outputs = decode_image(im, im_info, label_info)
+ im = outputs[0]
+ im_info = outputs[1]
+ if len(outputs) == 3:
+ label_info = outputs[2]
+ for op in self.transforms:
+ if im is None:
+ return None
+ outputs = op(im, im_info, label_info)
+ im = outputs[0]
+ return outputs
+
+
+class ResizeByShort:
+ """根据图像的短边调整图像大小(resize)。
+
+ 1. 获取图像的长边和短边长度。
+ 2. 根据短边与short_size的比例,计算长边的目标长度,
+ 此时高、宽的resize比例为short_size/原图短边长度。
+ 3. 如果max_size>0,调整resize比例:
+ 如果长边的目标长度>max_size,则高、宽的resize比例为max_size/原图长边长度。
+ 4. 根据调整大小的比例对图像进行resize。
+
+ Args:
+        short_size (int): 短边目标长度。默认为800。
+ max_size (int): 长边目标长度的最大限制。默认为1333。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+
+ def __init__(self, short_size=800, max_size=1333):
+ self.max_size = int(max_size)
+ if not isinstance(short_size, int):
+ raise TypeError(
+ "Type of short_size is invalid. Must be Integer, now is {}".
+ format(type(short_size)))
+ self.short_size = short_size
+ if not (isinstance(self.max_size, int)):
+ raise TypeError("max_size: input type is invalid.")
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+            im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+ 其中,im_info更新字段为:
+ - im_resize_info (np.ndarray): resize后的图像高、resize后的图像宽、resize后的图像相对原始图的缩放比例
+ 三者组成的np.ndarray,形状为(3,)。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ if im_info is None:
+ im_info = dict()
+ if not isinstance(im, np.ndarray):
+ raise TypeError("ResizeByShort: image type is not numpy.")
+ if len(im.shape) != 3:
+ raise ValueError('ResizeByShort: image is not 3-dimensional.')
+ im_short_size = min(im.shape[0], im.shape[1])
+ im_long_size = max(im.shape[0], im.shape[1])
+ scale = float(self.short_size) / im_short_size
+ if self.max_size > 0 and np.round(
+ scale * im_long_size) > self.max_size:
+ scale = float(self.max_size) / float(im_long_size)
+ resized_width = int(round(im.shape[1] * scale))
+ resized_height = int(round(im.shape[0] * scale))
+ im_resize_info = [resized_height, resized_width, scale]
+ im = cv2.resize(
+ im, (resized_width, resized_height),
+ interpolation=cv2.INTER_LINEAR)
+ im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32)
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class Padding:
+ """将图像的长和宽padding至coarsest_stride的倍数。如输入图像为[300, 640],
+    `coarsest_stride`为32,则由于300不为32的倍数,因此在图像最右和最下使用0值
+ 进行padding,最终输出图像为[320, 640]。
+
+ 1. 如果coarsest_stride为1则直接返回。
+ 2. 获取图像的高H、宽W。
+ 3. 计算填充后图像的高H_new、宽W_new。
+ 4. 构建大小为(H_new, W_new, 3)像素值为0的np.ndarray,
+ 并将原图的np.ndarray粘贴于左上角。
+
+ Args:
+ coarsest_stride (int): 填充后的图像长、宽为该参数的倍数,默认为1。
+ """
+
+ def __init__(self, coarsest_stride=1):
+ self.coarsest_stride = coarsest_stride
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+            im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+
+ if self.coarsest_stride == 1:
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+ if im_info is None:
+ im_info = dict()
+ if not isinstance(im, np.ndarray):
+ raise TypeError("Padding: image type is not numpy.")
+ if len(im.shape) != 3:
+ raise ValueError('Padding: image is not 3-dimensional.')
+ im_h, im_w, im_c = im.shape[:]
+ if self.coarsest_stride > 1:
+ padding_im_h = int(
+ np.ceil(im_h / self.coarsest_stride) * self.coarsest_stride)
+ padding_im_w = int(
+ np.ceil(im_w / self.coarsest_stride) * self.coarsest_stride)
+ padding_im = np.zeros((padding_im_h, padding_im_w, im_c),
+ dtype=np.float32)
+ padding_im[:im_h, :im_w, :] = im
+ if label_info is None:
+ return (padding_im, im_info)
+ else:
+ return (padding_im, im_info, label_info)
+
+
+class Resize:
+ """调整图像大小(resize)。
+
+ - 当目标大小(target_size)类型为int时,根据插值方式,
+ 将图像resize为[target_size, target_size]。
+ - 当目标大小(target_size)类型为list或tuple时,根据插值方式,
+ 将图像resize为target_size。
+ 注意:当插值方式为“RANDOM”时,则随机选取一种插值方式进行resize。
+
+ Args:
+        target_size (int/list/tuple): 目标大小。默认为608。
+ interp (str): resize的插值方式,与opencv的插值方式对应,取值范围为
+ ['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4', 'RANDOM']。默认为"LINEAR"。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 插值方式不在['NEAREST', 'LINEAR', 'CUBIC',
+ 'AREA', 'LANCZOS4', 'RANDOM']中。
+ """
+
+ # The interpolation mode
+ interp_dict = {
+ 'NEAREST': cv2.INTER_NEAREST,
+ 'LINEAR': cv2.INTER_LINEAR,
+ 'CUBIC': cv2.INTER_CUBIC,
+ 'AREA': cv2.INTER_AREA,
+ 'LANCZOS4': cv2.INTER_LANCZOS4
+ }
+
+ def __init__(self, target_size=608, interp='LINEAR'):
+ self.interp = interp
+ if not (interp == "RANDOM" or interp in self.interp_dict):
+ raise ValueError("interp should be one of {}".format(
+ self.interp_dict.keys()))
+ if isinstance(target_size, list) or isinstance(target_size, tuple):
+ if len(target_size) != 2:
+ raise TypeError(
+                    'when target_size is list or tuple, it should include 2 elements, but it is {}'
+ .format(target_size))
+ elif not isinstance(target_size, int):
+ raise TypeError(
+ "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+ .format(type(target_size)))
+
+ self.target_size = target_size
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ if im_info is None:
+ im_info = dict()
+ if not isinstance(im, np.ndarray):
+ raise TypeError("Resize: image type is not numpy.")
+ if len(im.shape) != 3:
+ raise ValueError('Resize: image is not 3-dimensional.')
+ if self.interp == "RANDOM":
+ interp = random.choice(list(self.interp_dict.keys()))
+ else:
+ interp = self.interp
+ im = resize(im, self.target_size, self.interp_dict[interp])
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class RandomHorizontalFlip:
+ """随机翻转图像、标注框、分割信息,模型训练时的数据增强操作。
+
+ 1. 随机采样一个0-1之间的小数,当小数小于水平翻转概率时,
+ 执行2-4步操作,否则直接返回。
+ 2. 水平翻转图像。
+ 3. 计算翻转后的真实标注框的坐标,更新label_info中的gt_bbox信息。
+ 4. 计算翻转后的真实分割区域的坐标,更新label_info中的gt_poly信息。
+
+ Args:
+ prob (float): 随机水平翻转的概率。默认为0.5。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+
+ def __init__(self, prob=0.5):
+ self.prob = prob
+ if not isinstance(self.prob, float):
+ raise TypeError("RandomHorizontalFlip: input type is invalid.")
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+                其中,label_info更新字段为:
+ - gt_bbox (np.ndarray): 水平翻转后的标注框坐标[x1, y1, x2, y2],形状为(n, 4),
+ 其中n代表真实标注框的个数。
+ - gt_poly (list): 水平翻转后的多边形分割区域的x、y坐标,长度为n,
+ 其中n代表真实标注框的个数。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ if not isinstance(im, np.ndarray):
+ raise TypeError(
+ "RandomHorizontalFlip: image is not a numpy array.")
+ if len(im.shape) != 3:
+ raise ValueError(
+ "RandomHorizontalFlip: image is not 3-dimensional.")
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do RandomHorizontalFlip! ' +
+                'Because the im_info and label_info cannot be None!')
+ if 'augment_shape' not in im_info:
+ raise TypeError('Cannot do RandomHorizontalFlip! ' + \
+                            'Because augment_shape is not in im_info!')
+ if 'gt_bbox' not in label_info:
+ raise TypeError('Cannot do RandomHorizontalFlip! ' + \
+                            'Because gt_bbox is not in label_info!')
+ augment_shape = im_info['augment_shape']
+ gt_bbox = label_info['gt_bbox']
+ height = augment_shape[0]
+ width = augment_shape[1]
+
+ if np.random.uniform(0, 1) < self.prob:
+ im = horizontal_flip(im)
+ if gt_bbox.shape[0] == 0:
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+ label_info['gt_bbox'] = box_horizontal_flip(gt_bbox, width)
+ if 'gt_poly' in label_info and \
+ len(label_info['gt_poly']) != 0:
+ label_info['gt_poly'] = segms_horizontal_flip(
+ label_info['gt_poly'], height, width)
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class Normalize:
+ """对图像进行标准化。
+
+ 1. 归一化图像到到区间[0.0, 1.0]。
+ 2. 对图像进行减均值除以标准差操作。
+
+ Args:
+ mean (list): 图像数据集的均值。默认为[0.485, 0.456, 0.406]。
+ std (list): 图像数据集的标准差。默认为[0.229, 0.224, 0.225]。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+
+ def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, list) and isinstance(self.std, list)):
+ raise TypeError("NormalizeImage: input type is invalid.")
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise TypeError('NormalizeImage: std is invalid!')
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+            im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+ """
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ im = normalize(im, mean, std)
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class RandomDistort:
+ """以一定的概率对图像进行随机像素内容变换,模型训练时的数据增强操作
+
+ 1. 对变换的操作顺序进行随机化操作。
+ 2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
+
+ Args:
+ brightness_range (float): 明亮度因子的范围。默认为0.5。
+ brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
+ contrast_range (float): 对比度因子的范围。默认为0.5。
+ contrast_prob (float): 随机调整对比度的概率。默认为0.5。
+ saturation_range (float): 饱和度因子的范围。默认为0.5。
+ saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
+ hue_range (int): 色调因子的范围。默认为18。
+ hue_prob (float): 随机调整色调的概率。默认为0.5。
+ is_order (bool): 是否按照固定顺序
+ [变换明亮度、变换对比度、变换饱和度、变换色彩]
+ 执行像素内容变换操作。默认为False。
+ """
+
+ def __init__(self,
+ brightness_range=0.5,
+ brightness_prob=0.5,
+ contrast_range=0.5,
+ contrast_prob=0.5,
+ saturation_range=0.5,
+ saturation_prob=0.5,
+ hue_range=18,
+ hue_prob=0.5,
+ is_order=False):
+ self.brightness_range = brightness_range
+ self.brightness_prob = brightness_prob
+ self.contrast_range = contrast_range
+ self.contrast_prob = contrast_prob
+ self.saturation_range = saturation_range
+ self.saturation_prob = saturation_prob
+ self.hue_range = hue_range
+ self.hue_prob = hue_prob
+ self.is_order = is_order
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+ """
+ brightness_lower = 1 - self.brightness_range
+ brightness_upper = 1 + self.brightness_range
+ contrast_lower = 1 - self.contrast_range
+ contrast_upper = 1 + self.contrast_range
+ saturation_lower = 1 - self.saturation_range
+ saturation_upper = 1 + self.saturation_range
+ hue_lower = -self.hue_range
+ hue_upper = self.hue_range
+ ops = [brightness, contrast, saturation, hue]
+ if self.is_order:
+ prob = np.random.uniform(0, 1)
+ if prob < 0.5:
+ ops = [
+ brightness,
+ saturation,
+ hue,
+ contrast,
+ ]
+ else:
+ random.shuffle(ops)
+ params_dict = {
+ 'brightness': {
+ 'brightness_lower': brightness_lower,
+ 'brightness_upper': brightness_upper
+ },
+ 'contrast': {
+ 'contrast_lower': contrast_lower,
+ 'contrast_upper': contrast_upper
+ },
+ 'saturation': {
+ 'saturation_lower': saturation_lower,
+ 'saturation_upper': saturation_upper
+ },
+ 'hue': {
+ 'hue_lower': hue_lower,
+ 'hue_upper': hue_upper
+ }
+ }
+ prob_dict = {
+ 'brightness': self.brightness_prob,
+ 'contrast': self.contrast_prob,
+ 'saturation': self.saturation_prob,
+ 'hue': self.hue_prob
+ }
+ im = im.astype('uint8')
+ im = Image.fromarray(im)
+ for id in range(4):
+ params = params_dict[ops[id].__name__]
+ prob = prob_dict[ops[id].__name__]
+ params['im'] = im
+ if np.random.uniform(0, 1) < prob:
+ im = ops[id](**params)
+ im = np.asarray(im).astype('float32')
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class MixupImage:
+ """对图像进行mixup操作,模型训练时的数据增强操作,目前仅YOLOv3模型支持该transform。
+
+    当im_info中不存在mixup字段时,直接返回,否则进行下述操作:
+ 1. 从随机beta分布中抽取出随机因子factor。
+ 2.
+ - 当factor>=1.0时,去除label_info中的mixup字段,直接返回。
+ - 当factor<=0.0时,直接返回label_info中的mixup字段,并在label_info中去除该字段。
+ - 其余情况,执行下述操作:
+ (1)原图像乘以factor,mixup图像乘以(1-factor),叠加2个结果。
+ (2)拼接原图像标注框和mixup图像标注框。
+ (3)拼接原图像标注框类别和mixup图像标注框类别。
+ (4)原图像标注框混合得分乘以factor,mixup图像标注框混合得分乘以(1-factor),叠加2个结果。
+ 3. 更新im_info中的augment_shape信息。
+
+ Args:
+ alpha (float): 随机beta分布的下限。默认为1.5。
+ beta (float): 随机beta分布的上限。默认为1.5。
+ mixup_epoch (int): 在前mixup_epoch轮使用mixup增强操作;当该参数为-1时,该策略不会生效。
+ 默认为-1。
+
+ Raises:
+ ValueError: 数据长度不匹配。
+ """
+
+ def __init__(self, alpha=1.5, beta=1.5, mixup_epoch=-1):
+ self.alpha = alpha
+ self.beta = beta
+ if self.alpha <= 0.0:
+            raise ValueError("alpha should be positive in MixupImage")
+ if self.beta <= 0.0:
+            raise ValueError("beta should be positive in MixupImage")
+ self.mixup_epoch = mixup_epoch
+
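+    # Pixel-level mixup: both images are placed on a zero canvas of the larger
+    # height/width and blended as factor * img1 + (1 - factor) * img2.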
+ def _mixup_img(self, img1, img2, factor):
+ h = max(img1.shape[0], img2.shape[0])
+ w = max(img1.shape[1], img2.shape[1])
+ img = np.zeros((h, w, img1.shape[2]), 'float32')
+ img[:img1.shape[0], :img1.shape[1], :] = \
+ img1.astype('float32') * factor
+ img[:img2.shape[0], :img2.shape[1], :] += \
+ img2.astype('float32') * (1.0 - factor)
+ return img.astype('uint8')
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+ 其中,im_info更新字段为:
+ - augment_shape (np.ndarray): mixup后的图像高、宽二者组成的np.ndarray,形状为(2,)。
+ im_info删除的字段:
+ - mixup (list): 与当前字段进行mixup的图像相关信息。
+ label_info更新字段为:
+ - gt_bbox (np.ndarray): mixup后真实标注框坐标,形状为(n, 4),
+ 其中n代表真实标注框的个数。
+ - gt_class (np.ndarray): mixup后每个真实标注框对应的类别序号,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ - gt_score (np.ndarray): mixup后每个真实标注框对应的混合得分,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+ if im_info is None:
+ raise TypeError('Cannot do MixupImage! ' +
+                            'Because the im_info cannot be None!')
+ if 'mixup' not in im_info:
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+ factor = np.random.beta(self.alpha, self.beta)
+ factor = max(0.0, min(1.0, factor))
+ if im_info['epoch'] > self.mixup_epoch \
+ or factor >= 1.0:
+ im_info.pop('mixup')
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+ if factor <= 0.0:
+ return im_info.pop('mixup')
+ im = self._mixup_img(im, im_info['mixup'][0], factor)
+ if label_info is None:
+ raise TypeError('Cannot do MixupImage! ' +
+                            'Because the label_info cannot be None!')
+ if 'gt_bbox' not in label_info or \
+ 'gt_class' not in label_info or \
+ 'gt_score' not in label_info:
+ raise TypeError('Cannot do MixupImage! ' + \
+                'Because gt_bbox/gt_class/gt_score is not in label_info!')
+ gt_bbox1 = label_info['gt_bbox']
+ gt_bbox2 = im_info['mixup'][2]['gt_bbox']
+ gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
+ gt_class1 = label_info['gt_class']
+ gt_class2 = im_info['mixup'][2]['gt_class']
+ gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
+
+ gt_score1 = label_info['gt_score']
+ gt_score2 = im_info['mixup'][2]['gt_score']
+ gt_score = np.concatenate(
+ (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
+ label_info['gt_bbox'] = gt_bbox
+ label_info['gt_score'] = gt_score
+ label_info['gt_class'] = gt_class
+ im_info['augment_shape'] = np.array([im.shape[0],
+ im.shape[1]]).astype('int32')
+ im_info.pop('mixup')
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class RandomExpand:
+ """随机扩张图像,模型训练时的数据增强操作。
+
+ 1. 随机选取扩张比例(扩张比例大于1时才进行扩张)。
+ 2. 计算扩张后图像大小。
+ 3. 初始化像素值为数据集均值的图像,并将原图像随机粘贴于该图像上。
+ 4. 根据原图像粘贴位置换算出扩张后真实标注框的位置坐标。
+
+ Args:
+ max_ratio (float): 图像扩张的最大比例。默认为4.0。
+ prob (float): 随机扩张的概率。默认为0.5。
+ mean (list): 图像数据集的均值(0-255)。默认为[127.5, 127.5, 127.5]。
+
+ """
+
+ def __init__(self, max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5]):
+ self.max_ratio = max_ratio
+ self.mean = mean
+ self.prob = prob
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+                当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+                存储与图像相关信息的字典、存储与标注框相关信息的字典。
+ 其中,im_info更新字段为:
+ - augment_shape (np.ndarray): 扩张后的图像高、宽二者组成的np.ndarray,形状为(2,)。
+ label_info更新字段为:
+ - gt_bbox (np.ndarray): 随机扩张后真实标注框坐标,形状为(n, 4),
+ 其中n代表真实标注框的个数。
+ - gt_class (np.ndarray): 随机扩张后每个真实标注框对应的类别序号,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do RandomExpand! ' +
+                'Because the im_info and label_info cannot be None!')
+ if 'augment_shape' not in im_info:
+ raise TypeError('Cannot do RandomExpand! ' + \
+                'Because augment_shape is not in im_info!')
+ if 'gt_bbox' not in label_info or \
+ 'gt_class' not in label_info:
+ raise TypeError('Cannot do RandomExpand! ' + \
+                'Because gt_bbox/gt_class is not in label_info!')
+ prob = np.random.uniform(0, 1)
+ augment_shape = im_info['augment_shape']
+ im_width = augment_shape[1]
+ im_height = augment_shape[0]
+ gt_bbox = label_info['gt_bbox']
+ gt_class = label_info['gt_class']
+
+ if prob < self.prob:
+ if self.max_ratio - 1 >= 0.01:
+ expand_ratio = np.random.uniform(1, self.max_ratio)
+ height = int(im_height * expand_ratio)
+ width = int(im_width * expand_ratio)
+ h_off = math.floor(np.random.uniform(0, height - im_height))
+ w_off = math.floor(np.random.uniform(0, width - im_width))
+ expand_bbox = [
+ -w_off / im_width, -h_off / im_height,
+ (width - w_off) / im_width, (height - h_off) / im_height
+ ]
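+                # expand_bbox is the expanded canvas expressed in coordinates
+                # normalized to the original image, so filter_and_process below
+                # effectively shifts every gt box by the paste offset (w_off, h_off).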
+ expand_im = np.ones((height, width, 3))
+ expand_im = np.uint8(expand_im * np.squeeze(self.mean))
+ expand_im = Image.fromarray(expand_im)
+ im = im.astype('uint8')
+ im = Image.fromarray(im)
+ expand_im.paste(im, (int(w_off), int(h_off)))
+ expand_im = np.asarray(expand_im)
+ for i in range(gt_bbox.shape[0]):
+ gt_bbox[i][0] = gt_bbox[i][0] / im_width
+ gt_bbox[i][1] = gt_bbox[i][1] / im_height
+ gt_bbox[i][2] = gt_bbox[i][2] / im_width
+ gt_bbox[i][3] = gt_bbox[i][3] / im_height
+ gt_bbox, gt_class, _ = filter_and_process(
+ expand_bbox, gt_bbox, gt_class)
+ for i in range(gt_bbox.shape[0]):
+ gt_bbox[i][0] = gt_bbox[i][0] * width
+ gt_bbox[i][1] = gt_bbox[i][1] * height
+ gt_bbox[i][2] = gt_bbox[i][2] * width
+ gt_bbox[i][3] = gt_bbox[i][3] * height
+ im = expand_im.astype('float32')
+ label_info['gt_bbox'] = gt_bbox
+ label_info['gt_class'] = gt_class
+ im_info['augment_shape'] = np.array([height,
+ width]).astype('int32')
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class RandomCrop:
+ """随机裁剪图像。
+
+ 1. 根据batch_sampler计算获取裁剪候选区域的位置。
+ (1) 根据min scale、max scale、min aspect ratio、max aspect ratio计算随机剪裁的高、宽。
+ (2) 根据随机剪裁的高、宽随机选取剪裁的起始点。
+ (3) 筛选出裁剪候选区域:
+ - 当satisfy_all为True时,需所有真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域才可保留。
+ - 当satisfy_all为False时,当有一个真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域就可保留。
+ 2. 遍历所有裁剪候选区域:
+ (1) 若真实标注框与候选裁剪区域不重叠,或其中心点不在候选裁剪区域,
+ 则将该真实标注框去除。
+ (2) 计算相对于该候选裁剪区域,真实标注框的位置,并筛选出对应的类别、混合得分。
+ (3) 若avoid_no_bbox为False,返回当前裁剪后的信息即可;
+ 反之,要找到一个裁剪区域中真实标注框个数不为0的区域,才返回裁剪后的信息。
+
+ Args:
+ batch_sampler (list): 随机裁剪参数的多种组合,每种组合包含8个值,如下:
+ - max sample (int):满足当前组合的裁剪区域的个数上限。
+ - max trial (int): 查找满足当前组合的次数。
+ - min scale (float): 裁剪面积相对原面积,每条边缩短比例的最小限制。
+ - max scale (float): 裁剪面积相对原面积,每条边缩短比例的最大限制。
+ - min aspect ratio (float): 裁剪后短边缩放比例的最小限制。
+ - max aspect ratio (float): 裁剪后短边缩放比例的最大限制。
+ - min overlap (float): 真实标注框与裁剪图像重叠面积的最小限制。
+ - max overlap (float): 真实标注框与裁剪图像重叠面积的最大限制。
+ 默认值为None,当为None时采用如下设置:
+ [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
+ satisfy_all (bool): 是否需要所有标注框满足条件,裁剪候选区域才保留。默认为False。
+ avoid_no_bbox (bool): 是否对裁剪图像不存在标注框的图像进行保留。默认为True。
+
+ """
+
+ def __init__(self,
+ batch_sampler=None,
+ satisfy_all=False,
+ avoid_no_bbox=True):
+ if batch_sampler is None:
+ batch_sampler = [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
+ [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
+ self.batch_sampler = batch_sampler
+ self.satisfy_all = satisfy_all
+ self.avoid_no_bbox = avoid_no_bbox
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当label_info为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label_info不为空时,返回的tuple为(im, im_info, label_info),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和存储与标注框相关信息的字典。
+ 其中,label_info更新字段为:
+ - gt_bbox (np.ndarray): 随机裁剪后真实标注框坐标,形状为(n, 4),
+ 其中n代表真实标注框的个数。
+ - gt_class (np.ndarray): 随机裁剪后每个真实标注框对应的类别序号,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+ - gt_score (np.ndarray): 随机裁剪后每个真实标注框对应的混合得分,形状为(n, 1),
+ 其中n代表真实标注框的个数。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ """
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do RandomCrop! ' +
+ 'Because the im_info and label_info cannot be None!')
+ if 'augment_shape' not in im_info:
+ raise TypeError('Cannot do RandomCrop! ' + \
+ 'Because augment_shape is not in im_info!')
+ if 'gt_bbox' not in label_info or \
+ 'gt_class' not in label_info:
+ raise TypeError('Cannot do RandomCrop! ' + \
+ 'Because gt_bbox/gt_class is not in label_info!')
+ augment_shape = im_info['augment_shape']
+ im_width = augment_shape[1]
+ im_height = augment_shape[0]
+ gt_bbox = label_info['gt_bbox']
+ gt_bbox_tmp = gt_bbox.copy()
+ for i in range(gt_bbox_tmp.shape[0]):
+ gt_bbox_tmp[i][0] = gt_bbox[i][0] / im_width
+ gt_bbox_tmp[i][1] = gt_bbox[i][1] / im_height
+ gt_bbox_tmp[i][2] = gt_bbox[i][2] / im_width
+ gt_bbox_tmp[i][3] = gt_bbox[i][3] / im_height
+ gt_class = label_info['gt_class']
+
+ gt_score = None
+ if 'gt_score' in label_info:
+ gt_score = label_info['gt_score']
+ sampled_bbox = []
+ gt_bbox_tmp = gt_bbox_tmp.tolist()
+ for sampler in self.batch_sampler:
+ found = 0
+ for i in range(sampler[1]):
+ if found >= sampler[0]:
+ break
+ sample_bbox = generate_sample_bbox(sampler)
+ if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox_tmp,
+ self.satisfy_all):
+ sampled_bbox.append(sample_bbox)
+ found = found + 1
+ im = np.array(im)
+ while sampled_bbox:
+ idx = int(np.random.uniform(0, len(sampled_bbox)))
+ sample_bbox = sampled_bbox.pop(idx)
+ sample_bbox = clip_bbox(sample_bbox)
+ crop_bbox, crop_class, crop_score = \
+ filter_and_process(sample_bbox, gt_bbox_tmp, gt_class, gt_score)
+ if self.avoid_no_bbox:
+ if len(crop_bbox) < 1:
+ continue
+ xmin = int(sample_bbox[0] * im_width)
+ xmax = int(sample_bbox[2] * im_width)
+ ymin = int(sample_bbox[1] * im_height)
+ ymax = int(sample_bbox[3] * im_height)
+ im = im[ymin:ymax, xmin:xmax]
+ for i in range(crop_bbox.shape[0]):
+ crop_bbox[i][0] = crop_bbox[i][0] * (xmax - xmin)
+ crop_bbox[i][1] = crop_bbox[i][1] * (ymax - ymin)
+ crop_bbox[i][2] = crop_bbox[i][2] * (xmax - xmin)
+ crop_bbox[i][3] = crop_bbox[i][3] * (ymax - ymin)
+ label_info['gt_bbox'] = crop_bbox
+ label_info['gt_class'] = crop_class
+ label_info['gt_score'] = crop_score
+ im_info['augment_shape'] = np.array([ymax - ymin,
+ xmax - xmin]).astype('int32')
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+ if label_info is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label_info)
+
+
+class ArrangeFasterRCNN:
+ """获取FasterRCNN模型训练/验证/预测所需信息。
+
+ Args:
+ mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+ Raises:
+ ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
+ """
+
+ def __init__(self, mode=None):
+ if mode not in ['train', 'eval', 'test', 'quant']:
+ raise ValueError(
+ "mode must be in ['train', 'eval', 'test', 'quant']!")
+ self.mode = mode
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当mode为'train'时,返回(im, im_resize_info, gt_bbox, gt_class, is_crowd),分别对应
+ 图像np.ndarray数据、图像相对于原图的resize信息、真实标注框、真实标注框对应的类别、真实标注框内是否是一组对象;
+ 当mode为'eval'时,返回(im, im_resize_info, im_id, im_shape, gt_bbox, gt_class, is_difficult),
+ 分别对应图像np.ndarray数据、图像相对于原图的resize信息、图像id、图像大小信息、真实标注框、真实标注框对应的类别、
+ 真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_resize_info, im_shape),分别对应图像np.ndarray数据、
+ 图像相对于原图的resize信息、图像大小信息。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ im = permute(im, False)
+ if self.mode == 'train':
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do ArrangeFasterRCNN! ' +
+ 'Because the im_info and label_info cannot be None!')
+ if len(label_info['gt_bbox']) != len(label_info['gt_class']):
+ raise ValueError("gt num mismatch: bbox and class.")
+ im_resize_info = im_info['im_resize_info']
+ gt_bbox = label_info['gt_bbox']
+ gt_class = label_info['gt_class']
+ is_crowd = label_info['is_crowd']
+ outputs = (im, im_resize_info, gt_bbox, gt_class, is_crowd)
+ elif self.mode == 'eval':
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do ArrangeFasterRCNN! ' +
+ 'Because the im_info and label_info cannot be None!')
+ im_resize_info = im_info['im_resize_info']
+ im_id = im_info['im_id']
+ im_shape = np.array(
+ (im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
+ dtype=np.float32)
+ gt_bbox = label_info['gt_bbox']
+ gt_class = label_info['gt_class']
+ is_difficult = label_info['difficult']
+ outputs = (im, im_resize_info, im_id, im_shape, gt_bbox, gt_class,
+ is_difficult)
+ else:
+ if im_info is None:
+ raise TypeError('Cannot do ArrangeFasterRCNN! ' +
+ 'Because the im_info cannot be None!')
+ im_resize_info = im_info['im_resize_info']
+ im_shape = np.array(
+ (im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
+ dtype=np.float32)
+ outputs = (im, im_resize_info, im_shape)
+ return outputs
+
+
+class ArrangeMaskRCNN:
+ """获取MaskRCNN模型训练/验证/预测所需信息。
+
+ Args:
+ mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+ Raises:
+ ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
+ """
+
+ def __init__(self, mode=None):
+ if mode not in ['train', 'eval', 'test', 'quant']:
+ raise ValueError(
+ "mode must be in ['train', 'eval', 'test', 'quant']!")
+ self.mode = mode
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当mode为'train'时,返回(im, im_resize_info, gt_bbox, gt_class, is_crowd, gt_masks),分别对应
+ 图像np.ndarray数据、图像相对于原图的resize信息、真实标注框、真实标注框对应的类别、真实标注框内是否是一组对象、
+ 真实分割区域;当mode为'eval'时,返回(im, im_resize_info, im_id, im_shape),分别对应图像np.ndarray数据、
+ 图像相对于原图的resize信息、图像id、图像大小信息;当mode为'test'或'quant'时,返回(im, im_resize_info, im_shape),
+ 分别对应图像np.ndarray数据、图像相对于原图的resize信息、图像大小信息。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ im = permute(im, False)
+ if self.mode == 'train':
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do ArrangeMaskRCNN! ' +
+ 'Because the im_info and label_info cannot be None!')
+ if len(label_info['gt_bbox']) != len(label_info['gt_class']):
+ raise ValueError("gt num mismatch: bbox and class.")
+ im_resize_info = im_info['im_resize_info']
+ gt_bbox = label_info['gt_bbox']
+ gt_class = label_info['gt_class']
+ is_crowd = label_info['is_crowd']
+ assert 'gt_poly' in label_info
+ segms = label_info['gt_poly']
+ if len(segms) != 0:
+ assert len(segms) == is_crowd.shape[0]
+ gt_masks = []
+ valid = True
+ for i in range(len(segms)):
+ segm = segms[i]
+ gt_segm = []
+ if is_crowd[i]:
+ gt_segm.append([[0, 0]])
+ else:
+ for poly in segm:
+ if len(poly) == 0:
+ valid = False
+ break
+ gt_segm.append(np.array(poly).reshape(-1, 2))
+ if (not valid) or len(gt_segm) == 0:
+ break
+ gt_masks.append(gt_segm)
+ outputs = (im, im_resize_info, gt_bbox, gt_class, is_crowd,
+ gt_masks)
+ else:
+ if im_info is None:
+ raise TypeError('Cannot do ArrangeMaskRCNN! ' +
+ 'Because the im_info cannot be None!')
+ im_resize_info = im_info['im_resize_info']
+ im_shape = np.array(
+ (im_info['augment_shape'][0], im_info['augment_shape'][1], 1),
+ dtype=np.float32)
+ if self.mode == 'eval':
+ im_id = im_info['im_id']
+ outputs = (im, im_resize_info, im_id, im_shape)
+ else:
+ outputs = (im, im_resize_info, im_shape)
+ return outputs
+
+
+class ArrangeYOLOv3:
+ """获取YOLOv3模型训练/验证/预测所需信息。
+
+ Args:
+ mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+ Raises:
+ ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内。
+ """
+
+ def __init__(self, mode=None):
+ if mode not in ['train', 'eval', 'test', 'quant']:
+ raise ValueError(
+ "mode must be in ['train', 'eval', 'test', 'quant']!")
+ self.mode = mode
+
+ def __call__(self, im, im_info=None, label_info=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict, 可选): 存储与图像相关的信息。
+ label_info (dict, 可选): 存储与标注框相关的信息。
+
+ Returns:
+ tuple: 当mode为'train'时,返回(im, gt_bbox, gt_class, gt_score, im_shape),分别对应
+ 图像np.ndarray数据、真实标注框、真实标注框对应的类别、真实标注框混合得分、图像大小信息;
+ 当mode为'eval'时,返回(im, im_shape, im_id, gt_bbox, gt_class, difficult),
+ 分别对应图像np.ndarray数据、图像大小信息、图像id、真实标注框、真实标注框对应的类别、
+ 真实标注框是否为难识别对象;当mode为'test'或'quant'时,返回(im, im_shape),
+ 分别对应图像np.ndarray数据、图像大小信息。
+
+ Raises:
+ TypeError: 形参数据类型不满足需求。
+ ValueError: 数据长度不匹配。
+ """
+ im = permute(im, False)
+ if self.mode == 'train':
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do ArrangeYOLOv3! ' +
+ 'Because the im_info and label_info cannot be None!')
+ im_shape = im_info['augment_shape']
+ if len(label_info['gt_bbox']) != len(label_info['gt_class']):
+ raise ValueError("gt num mismatch: bbox and class.")
+ if len(label_info['gt_bbox']) != len(label_info['gt_score']):
+ raise ValueError("gt num mismatch: bbox and score.")
+ gt_bbox = np.zeros((50, 4), dtype=im.dtype)
+ gt_class = np.zeros((50, ), dtype=np.int32)
+ gt_score = np.zeros((50, ), dtype=im.dtype)
+ gt_num = min(50, len(label_info['gt_bbox']))
+ if gt_num > 0:
+ label_info['gt_class'][:gt_num, 0] = label_info[
+ 'gt_class'][:gt_num, 0] - 1
+ gt_bbox[:gt_num, :] = label_info['gt_bbox'][:gt_num, :]
+ gt_class[:gt_num] = label_info['gt_class'][:gt_num, 0]
+ gt_score[:gt_num] = label_info['gt_score'][:gt_num, 0]
+ # parse [x1, y1, x2, y2] to [x, y, w, h]
+ gt_bbox[:, 2:4] = gt_bbox[:, 2:4] - gt_bbox[:, :2]
+ gt_bbox[:, :2] = gt_bbox[:, :2] + gt_bbox[:, 2:4] / 2.
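+ # e.g. a box [10, 20, 50, 80] (x1, y1, x2, y2) becomes
+ # [30, 50, 40, 60] (center x, center y, width, height)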
+ outputs = (im, gt_bbox, gt_class, gt_score, im_shape)
+ elif self.mode == 'eval':
+ if im_info is None or label_info is None:
+ raise TypeError(
+ 'Cannot do ArrangeYOLOv3! ' +
+ 'Because the im_info and label_info cannot be None!')
+ im_shape = im_info['augment_shape']
+ if len(label_info['gt_bbox']) != len(label_info['gt_class']):
+ raise ValueError("gt num mismatch: bbox and class.")
+ im_id = im_info['im_id']
+ gt_bbox = np.zeros((50, 4), dtype=im.dtype)
+ gt_class = np.zeros((50, ), dtype=np.int32)
+ difficult = np.zeros((50, ), dtype=np.int32)
+ gt_num = min(50, len(label_info['gt_bbox']))
+ if gt_num > 0:
+ label_info['gt_class'][:gt_num, 0] = label_info[
+ 'gt_class'][:gt_num, 0] - 1
+ gt_bbox[:gt_num, :] = label_info['gt_bbox'][:gt_num, :]
+ gt_class[:gt_num] = label_info['gt_class'][:gt_num, 0]
+ difficult[:gt_num] = label_info['difficult'][:gt_num, 0]
+ outputs = (im, im_shape, im_id, gt_bbox, gt_class, difficult)
+ else:
+ if im_info is None:
+ raise TypeError('Cannot do ArrangeYOLOv3! ' +
+ 'Because the im_info cannot be None!')
+ im_shape = im_info['augment_shape']
+ outputs = (im, im_shape)
+ return outputs
diff --git a/paddlex/cv/transforms/ops.py b/paddlex/cv/transforms/ops.py
new file mode 100644
index 0000000000000000000000000000000000000000..9af31e8b0cf631050623786bf86803dde8bd2b9b
--- /dev/null
+++ b/paddlex/cv/transforms/ops.py
@@ -0,0 +1,177 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import math
+import numpy as np
+from PIL import Image, ImageEnhance
+
+
+def normalize(im, mean, std):
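+ # Scale pixel values from [0, 255] to [0, 1], then apply channel-wise
+ # mean/std normalization; mean and std are therefore expected in the
+ # [0, 1] range, e.g. ImageNet-style values such as [0.485, 0.456, 0.406].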
+ im = im / 255.0
+ im -= mean
+ im /= std
+ return im
+
+
+def permute(im, to_bgr=False):
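+ # The two axis swaps below convert an HWC array, e.g. shape (224, 224, 3),
+ # into CHW, shape (3, 224, 224); to_bgr additionally reverses the channel
+ # order.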
+ im = np.swapaxes(im, 1, 2)
+ im = np.swapaxes(im, 1, 0)
+ if to_bgr:
+ im = im[[2, 1, 0], :, :]
+ return im
+
+
+def resize_long(im, long_size=224, interpolation=cv2.INTER_LINEAR):
+ value = max(im.shape[0], im.shape[1])
+ scale = float(long_size) / float(value)
+ resized_width = int(round(im.shape[1] * scale))
+ resized_height = int(round(im.shape[0] * scale))
+
+ im = cv2.resize(
+ im, (resized_width, resized_height), interpolation=interpolation)
+ return im
+
+
+def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
+ if isinstance(target_size, list) or isinstance(target_size, tuple):
+ w = target_size[0]
+ h = target_size[1]
+ else:
+ w = target_size
+ h = target_size
+ im = cv2.resize(im, (w, h), interpolation=interp)
+ return im
+
+
+def random_crop(im,
+ crop_size=224,
+ lower_scale=0.08,
+ lower_ratio=3. / 4,
+ upper_ratio=4. / 3):
+ scale = [lower_scale, 1.0]
+ ratio = [lower_ratio, upper_ratio]
+ aspect_ratio = math.sqrt(np.random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+ bound = min((float(im.shape[0]) / im.shape[1]) / (h**2),
+ (float(im.shape[1]) / im.shape[0]) / (w**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+ target_area = im.shape[0] * im.shape[1] * np.random.uniform(
+ scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+ i = np.random.randint(0, im.shape[0] - h + 1)
+ j = np.random.randint(0, im.shape[1] - w + 1)
+ im = im[i:i + h, j:j + w, :]
+ im = cv2.resize(im, (crop_size, crop_size))
+ return im
+
+
+def center_crop(im, crop_size=224):
+ height, width = im.shape[:2]
+ w_start = (width - crop_size) // 2
+ h_start = (height - crop_size) // 2
+ w_end = w_start + crop_size
+ h_end = h_start + crop_size
+ im = im[h_start:h_end, w_start:w_end, :]
+ return im
+
+
+def horizontal_flip(im):
+ if len(im.shape) == 3:
+ im = im[:, ::-1, :]
+ elif len(im.shape) == 2:
+ im = im[:, ::-1]
+ return im
+
+
+def vertical_flip(im):
+ if len(im.shape) == 3:
+ im = im[::-1, :, :]
+ elif len(im.shape) == 2:
+ im = im[::-1, :]
+ return im
+
+
+def bgr2rgb(im):
+ return im[:, :, ::-1]
+
+
+def brightness(im, brightness_lower, brightness_upper):
+ brightness_delta = np.random.uniform(brightness_lower, brightness_upper)
+ im = ImageEnhance.Brightness(im).enhance(brightness_delta)
+ return im
+
+
+def contrast(im, contrast_lower, contrast_upper):
+ contrast_delta = np.random.uniform(contrast_lower, contrast_upper)
+ im = ImageEnhance.Contrast(im).enhance(contrast_delta)
+ return im
+
+
+def saturation(im, saturation_lower, saturation_upper):
+ saturation_delta = np.random.uniform(saturation_lower, saturation_upper)
+ im = ImageEnhance.Color(im).enhance(saturation_delta)
+ return im
+
+
+def hue(im, hue_lower, hue_upper):
+ hue_delta = np.random.uniform(hue_lower, hue_upper)
+ im = np.array(im.convert('HSV'))
+ im[:, :, 0] = im[:, :, 0] + hue_delta
+ im = Image.fromarray(im, mode='HSV').convert('RGB')
+ return im
+
+
+def rotate(im, rotate_lower, rotate_upper):
+ rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
+ im = im.rotate(int(rotate_delta))
+ return im
+
+
+def resize_padding(im, max_side_len=2400):
+ '''
+ resize image to a size multiple of 32 which is required by the network
+ :param im: the resized image
+ :param max_side_len: limit of max image size to avoid out of memory in gpu
+ :return: the resized image and the resize ratio
+ '''
+ h, w, _ = im.shape
+
+ resize_w = w
+ resize_h = h
+
+ # limit the max side
+ if max(resize_h, resize_w) > max_side_len:
+ ratio = float(
+ max_side_len) / resize_h if resize_h > resize_w else float(
+ max_side_len) / resize_w
+ else:
+ ratio = 1.
+ resize_h = int(resize_h * ratio)
+ resize_w = int(resize_w * ratio)
+
+ resize_h = resize_h if resize_h % 32 == 0 else (resize_h // 32 - 1) * 32
+ resize_w = resize_w if resize_w % 32 == 0 else (resize_w // 32 - 1) * 32
+ resize_h = max(32, resize_h)
+ resize_w = max(32, resize_w)
+ im = cv2.resize(im, (int(resize_w), int(resize_h)))
+ #im = cv2.resize(im, (512, 512))
+ ratio_h = resize_h / float(h)
+ ratio_w = resize_w / float(w)
+ _ratio = np.array([ratio_h, ratio_w]).reshape(-1, 2)
+ return im, _ratio
diff --git a/paddlex/cv/transforms/seg_transforms.py b/paddlex/cv/transforms/seg_transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..79dbc2a9c769f9bfaa1018959f95eac062d919e0
--- /dev/null
+++ b/paddlex/cv/transforms/seg_transforms.py
@@ -0,0 +1,942 @@
+# coding: utf8
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .ops import *
+import random
+import os.path as osp
+import numpy as np
+from PIL import Image
+import cv2
+from collections import OrderedDict
+
+
+class Compose:
+ """根据数据预处理/增强算子对输入数据进行操作。
+ 所有操作的输入图像流形状均是[H, W, C],其中H为图像高,W为图像宽,C为图像通道数。
+
+ Args:
+ transforms (list): 数据预处理/增强算子。
+
+ Raises:
+ TypeError: transforms不是list对象
+ ValueError: transforms元素个数小于1。
+
+ """
+
+ def __init__(self, transforms):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ if len(transforms) < 1:
+ raise ValueError('The length of transforms ' + \
+ 'must be equal or larger than 1!')
+ self.transforms = transforms
+ self.to_rgb = False
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (str/np.ndarray): 图像路径/图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息,dict中的字段如下:
+ - shape_before_resize (tuple): 图像resize之前的大小(h, w)。
+ - shape_before_padding (tuple): 图像padding之前的大小(h, w)。
+ label (str/np.ndarray): 标注图像路径/标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 根据网络所需字段所组成的tuple;字段由transforms中的最后一个数据预处理操作决定。
+ """
+
+ if im_info is None:
+ im_info = dict()
+ im_path = im
+ im = cv2.imread(im_path)
+ if im is None:
+ raise ValueError('Can\'t read the image file {}!'.format(im_path))
+ im = im.astype('float32')
+ if self.to_rgb:
+ im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+ if label is not None:
+ label = np.asarray(Image.open(label))
+
+ for op in self.transforms:
+ outputs = op(im, im_info, label)
+ im = outputs[0]
+ if len(outputs) >= 2:
+ im_info = outputs[1]
+ if len(outputs) == 3:
+ label = outputs[2]
+ return outputs
+
+
+class RandomHorizontalFlip:
+ """以一定的概率对图像进行水平翻转。当存在标注图像时,则同步进行翻转。
+
+ Args:
+ prob (float): 随机水平翻转的概率。默认值为0.5。
+
+ """
+
+ def __init__(self, prob=0.5):
+ self.prob = prob
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if random.random() < self.prob:
+ im = horizontal_flip(im)
+ if label is not None:
+ label = horizontal_flip(label)
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomVerticalFlip:
+ """以一定的概率对图像进行垂直翻转。当存在标注图像时,则同步进行翻转。
+
+ Args:
+ prob (float): 随机垂直翻转的概率。默认值为0.1。
+ """
+
+ def __init__(self, prob=0.1):
+ self.prob = prob
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if random.random() < self.prob:
+ im = vertical_flip(im)
+ if label is not None:
+ label = vertical_flip(label)
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class Resize:
+ """调整图像大小(resize),当存在标注图像时,则同步进行处理。
+
+ - 当目标大小(target_size)类型为int时,根据插值方式,
+ 将图像resize为[target_size, target_size]。
+ - 当目标大小(target_size)类型为list或tuple时,根据插值方式,
+ 将图像resize为target_size。target_size的输入应为[w, h]或(w, h)。
+
+ Args:
+ target_size (int|list|tuple): 目标大小。
+ interp (str): resize的插值方式,与opencv的插值方式对应,
+ 可选的值为['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4'],默认为"LINEAR"。
+
+ Raises:
+ TypeError: target_size不是int/list/tuple。
+ ValueError: target_size为list/tuple时元素个数不等于2。
+ AssertionError: interp的取值不在['NEAREST', 'LINEAR', 'CUBIC', 'AREA', 'LANCZOS4']之内。
+ """
+
+ # The interpolation mode
+ interp_dict = {
+ 'NEAREST': cv2.INTER_NEAREST,
+ 'LINEAR': cv2.INTER_LINEAR,
+ 'CUBIC': cv2.INTER_CUBIC,
+ 'AREA': cv2.INTER_AREA,
+ 'LANCZOS4': cv2.INTER_LANCZOS4
+ }
+
+ def __init__(self, target_size, interp='LINEAR'):
+ self.interp = interp
+ assert interp in self.interp_dict, "interp should be one of {}".format(
+ self.interp_dict.keys())
+ if isinstance(target_size, list) or isinstance(target_size, tuple):
+ if len(target_size) != 2:
+ raise ValueError(
+ 'when target_size is list or tuple, it should include 2 elements, but it is {}'
+ .format(target_size))
+ elif not isinstance(target_size, int):
+ raise TypeError(
+ "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+ .format(type(target_size)))
+
+ self.target_size = target_size
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ 其中,im_info更新字段为:
+ -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+
+ Raises:
+ ZeroDivisionError: im的短边为0。
+ TypeError: im不是np.ndarray数据。
+ ValueError: im不是3维nd.ndarray。
+ """
+ if im_info is None:
+ im_info = OrderedDict()
+ im_info['shape_before_resize'] = im.shape[:2]
+
+ if not isinstance(im, np.ndarray):
+ raise TypeError("ResizeImage: image type is not np.ndarray.")
+ if len(im.shape) != 3:
+ raise ValueError('ResizeImage: image is not 3-dimensional.')
+ im_shape = im.shape
+ im_size_min = np.min(im_shape[0:2])
+ im_size_max = np.max(im_shape[0:2])
+ if float(im_size_min) == 0:
+ raise ZeroDivisionError('ResizeImage: min size of image is 0')
+
+ if isinstance(self.target_size, int):
+ resize_w = self.target_size
+ resize_h = self.target_size
+ else:
+ resize_w = self.target_size[0]
+ resize_h = self.target_size[1]
+ im_scale_x = float(resize_w) / float(im_shape[1])
+ im_scale_y = float(resize_h) / float(im_shape[0])
+
+ im = cv2.resize(
+ im,
+ None,
+ None,
+ fx=im_scale_x,
+ fy=im_scale_y,
+ interpolation=self.interp_dict[self.interp])
+ if label is not None:
+ label = cv2.resize(
+ label,
+ None,
+ None,
+ fx=im_scale_x,
+ fy=im_scale_y,
+ interpolation=self.interp_dict['NEAREST'])
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class ResizeByLong:
+ """对图像长边resize到固定值,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+ Args:
+ long_size (int): resize后图像的长边大小。
+ """
+
+ def __init__(self, long_size):
+ self.long_size = long_size
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ 其中,im_info新增字段为:
+ -shape_before_resize (tuple): 保存resize之前图像的形状(h, w)。
+ """
+ if im_info is None:
+ im_info = OrderedDict()
+
+ im_info['shape_before_resize'] = im.shape[:2]
+ im = resize_long(im, self.long_size)
+ if label is not None:
+ label = resize_long(label, self.long_size, cv2.INTER_NEAREST)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class ResizeRangeScaling:
+ """对图像长边随机resize到指定范围内,短边按比例进行缩放。当存在标注图像时,则同步进行处理。
+
+ Args:
+ min_value (int): 图像长边resize后的最小值。默认值400。
+ max_value (int): 图像长边resize后的最大值。默认值600。
+
+ Raises:
+ ValueError: min_value大于max_value
+ """
+
+ def __init__(self, min_value=400, max_value=600):
+ if min_value > max_value:
+ raise ValueError('min_value must be less than max_value, '
+ 'but they are {} and {}.'.format(
+ min_value, max_value))
+ self.min_value = min_value
+ self.max_value = max_value
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if self.min_value == self.max_value:
+ random_size = self.max_value
+ else:
+ random_size = int(
+ np.random.uniform(self.min_value, self.max_value) + 0.5)
+ im = resize_long(im, random_size, cv2.INTER_LINEAR)
+ if label is not None:
+ label = resize_long(label, random_size, cv2.INTER_NEAREST)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class ResizeStepScaling:
+ """对图像按照某一个比例resize,这个比例以scale_step_size为步长
+ 在[min_scale_factor, max_scale_factor]随机变动。当存在标注图像时,则同步进行处理。
+
+ Args:
+ min_scale_factor (float): resize最小尺度。默认值0.75。
+ max_scale_factor (float): resize最大尺度。默认值1.25。
+ scale_step_size (float): resize尺度范围间隔。默认值0.25。
+
+ Raises:
+ ValueError: min_scale_factor大于max_scale_factor
+ """
+
+ def __init__(self,
+ min_scale_factor=0.75,
+ max_scale_factor=1.25,
+ scale_step_size=0.25):
+ if min_scale_factor > max_scale_factor:
+ raise ValueError(
+ 'min_scale_factor must be less than max_scale_factor, '
+ 'but they are {} and {}.'.format(min_scale_factor,
+ max_scale_factor))
+ self.min_scale_factor = min_scale_factor
+ self.max_scale_factor = max_scale_factor
+ self.scale_step_size = scale_step_size
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if self.min_scale_factor == self.max_scale_factor:
+ scale_factor = self.min_scale_factor
+
+ elif self.scale_step_size == 0:
+ scale_factor = np.random.uniform(self.min_scale_factor,
+ self.max_scale_factor)
+
+ else:
+ num_steps = int((self.max_scale_factor - self.min_scale_factor) /
+ self.scale_step_size + 1)
+ scale_factors = np.linspace(self.min_scale_factor,
+ self.max_scale_factor,
+ num_steps).tolist()
+ np.random.shuffle(scale_factors)
+ scale_factor = scale_factors[0]
+
+ im = cv2.resize(
+ im, (0, 0),
+ fx=scale_factor,
+ fy=scale_factor,
+ interpolation=cv2.INTER_LINEAR)
+ if label is not None:
+ label = cv2.resize(
+ label, (0, 0),
+ fx=scale_factor,
+ fy=scale_factor,
+ interpolation=cv2.INTER_NEAREST)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class Normalize:
+ """对图像进行标准化。
+ 1.尺度缩放到 [0,1]。
+ 2.对图像进行减均值除以标准差操作。
+
+ Args:
+ mean (list): 图像数据集的均值。默认值[0.5, 0.5, 0.5]。
+ std (list): 图像数据集的标准差。默认值[0.5, 0.5, 0.5]。
+
+ Raises:
+ ValueError: mean或std不是list对象。std包含0。
+ """
+
+ def __init__(self, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, list) and isinstance(self.std, list)):
+ raise ValueError("{}: input type is invalid.".format(self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ im = normalize(im, mean, std)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class Padding:
+ """对图像或标注图像进行padding,padding方向为右和下。
+ 根据提供的值对图像或标注图像进行padding操作。
+
+ Args:
+ target_size (int|list|tuple): padding后图像的大小。
+ im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+ label_padding_value (int): 标注图像padding的值。默认值为255。
+
+ Raises:
+ TypeError: target_size不是int|list|tuple。
+ ValueError: target_size为list|tuple时元素个数不等于2。
+ """
+
+ def __init__(self,
+ target_size,
+ im_padding_value=[127.5, 127.5, 127.5],
+ label_padding_value=255):
+ if isinstance(target_size, list) or isinstance(target_size, tuple):
+ if len(target_size) != 2:
+ raise ValueError(
+ 'when target_size is list or tuple, it should include 2 elements, but it is {}'
+ .format(target_size))
+ elif not isinstance(target_size, int):
+ raise TypeError(
+ "Type of target_size is invalid. Must be Integer or List or tuple, now is {}"
+ .format(type(target_size)))
+ self.target_size = target_size
+ self.im_padding_value = im_padding_value
+ self.label_padding_value = label_padding_value
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ 其中,im_info新增字段为:
+ -shape_before_padding (tuple): 保存padding之前图像的形状(h, w)。
+
+ Raises:
+ ValueError: 输入图像im或label的形状大于目标值
+ """
+ if im_info is None:
+ im_info = OrderedDict()
+ im_info['shape_before_padding'] = im.shape[:2]
+
+ im_height, im_width = im.shape[0], im.shape[1]
+ if isinstance(self.target_size, int):
+ target_height = self.target_size
+ target_width = self.target_size
+ else:
+ target_height = self.target_size[1]
+ target_width = self.target_size[0]
+ pad_height = target_height - im_height
+ pad_width = target_width - im_width
+ if pad_height < 0 or pad_width < 0:
+ raise ValueError(
+ 'the size of the image should be no larger than target_size, but the image size ({}, {}) is larger than target_size ({}, {})'
+ .format(im_width, im_height, target_width, target_height))
+ else:
+ im = cv2.copyMakeBorder(
+ im,
+ 0,
+ pad_height,
+ 0,
+ pad_width,
+ cv2.BORDER_CONSTANT,
+ value=self.im_padding_value)
+ if label is not None:
+ label = cv2.copyMakeBorder(
+ label,
+ 0,
+ pad_height,
+ 0,
+ pad_width,
+ cv2.BORDER_CONSTANT,
+ value=self.label_padding_value)
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomPaddingCrop:
+ """对图像和标注图进行随机裁剪,当所需要的裁剪尺寸大于原图时,则进行padding操作。
+
+ Args:
+ crop_size (int|list|tuple): 裁剪图像大小。默认为512。
+ im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+ label_padding_value (int): 标注图像padding的值。默认值为255。
+
+ Raises:
+ TypeError: crop_size不是int/list/tuple。
+ ValueError: crop_size为list/tuple时元素个数不等于2。
+ """
+
+ def __init__(self,
+ crop_size=512,
+ im_padding_value=[127.5, 127.5, 127.5],
+ label_padding_value=255):
+ if isinstance(crop_size, list) or isinstance(crop_size, tuple):
+ if len(crop_size) != 2:
+ raise ValueError(
+ 'when crop_size is list or tuple, it should include 2 elements, but it is {}'
+ .format(crop_size))
+ elif not isinstance(crop_size, int):
+ raise TypeError(
+ "Type of crop_size is invalid. Must be Integer or List or tuple, now is {}"
+ .format(type(crop_size)))
+ self.crop_size = crop_size
+ self.im_padding_value = im_padding_value
+ self.label_padding_value = label_padding_value
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if isinstance(self.crop_size, int):
+ crop_width = self.crop_size
+ crop_height = self.crop_size
+ else:
+ crop_width = self.crop_size[0]
+ crop_height = self.crop_size[1]
+
+ img_height = im.shape[0]
+ img_width = im.shape[1]
+
+ if img_height == crop_height and img_width == crop_width:
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+ else:
+ pad_height = max(crop_height - img_height, 0)
+ pad_width = max(crop_width - img_width, 0)
+ if (pad_height > 0 or pad_width > 0):
+ im = cv2.copyMakeBorder(
+ im,
+ 0,
+ pad_height,
+ 0,
+ pad_width,
+ cv2.BORDER_CONSTANT,
+ value=self.im_padding_value)
+ if label is not None:
+ label = cv2.copyMakeBorder(
+ label,
+ 0,
+ pad_height,
+ 0,
+ pad_width,
+ cv2.BORDER_CONSTANT,
+ value=self.label_padding_value)
+ img_height = im.shape[0]
+ img_width = im.shape[1]
+
+ if crop_height > 0 and crop_width > 0:
+ h_off = np.random.randint(img_height - crop_height + 1)
+ w_off = np.random.randint(img_width - crop_width + 1)
+
+ im = im[h_off:(crop_height + h_off), w_off:(
+ w_off + crop_width), :]
+ if label is not None:
+ label = label[h_off:(crop_height + h_off), w_off:(
+ w_off + crop_width)]
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomBlur:
+ """以一定的概率对图像进行高斯模糊。
+
+ Args:
+ prob (float): 图像模糊概率。默认为0.1。
+ """
+
+ def __init__(self, prob=0.1):
+ self.prob = prob
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if self.prob <= 0:
+ n = 0
+ elif self.prob >= 1:
+ n = 1
+ else:
+ n = int(1.0 / self.prob)
+ if n > 0:
+ if np.random.randint(0, n) == 0:
+ radius = np.random.randint(3, 10)
+ if radius % 2 != 1:
+ radius = radius + 1
+ if radius > 9:
+ radius = 9
+ im = cv2.GaussianBlur(im, (radius, radius), 0, 0)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomRotate:
+ """对图像进行随机旋转, 模型训练时的数据增强操作。
+ 在旋转区间[-rotate_range, rotate_range]内,对图像进行随机旋转,当存在标注图像时,同步进行,
+ 并对旋转后的图像和标注图像进行相应的padding。
+
+ Args:
+ rotate_range (float): 最大旋转角度。默认为15度。
+ im_padding_value (list): 图像padding的值。默认为[127.5, 127.5, 127.5]。
+ label_padding_value (int): 标注图像padding的值。默认为255。
+
+ """
+
+ def __init__(self,
+ rotate_range=15,
+ im_padding_value=[127.5, 127.5, 127.5],
+ label_padding_value=255):
+ self.rotate_range = rotate_range
+ self.im_padding_value = im_padding_value
+ self.label_padding_value = label_padding_value
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if self.rotate_range > 0:
+ (h, w) = im.shape[:2]
+ do_rotation = np.random.uniform(-self.rotate_range,
+ self.rotate_range)
+ pc = (w // 2, h // 2)
+ r = cv2.getRotationMatrix2D(pc, do_rotation, 1.0)
+ cos = np.abs(r[0, 0])
+ sin = np.abs(r[0, 1])
+
+ nw = int((h * sin) + (w * cos))
+ nh = int((h * cos) + (w * sin))
+
+ (cx, cy) = pc
+ r[0, 2] += (nw / 2) - cx
+ r[1, 2] += (nh / 2) - cy
+ dsize = (nw, nh)
+ im = cv2.warpAffine(
+ im,
+ r,
+ dsize=dsize,
+ flags=cv2.INTER_LINEAR,
+ borderMode=cv2.BORDER_CONSTANT,
+ borderValue=self.im_padding_value)
+ if label is not None:
+ label = cv2.warpAffine(
+ label,
+ r,
+ dsize=dsize,
+ flags=cv2.INTER_NEAREST,
+ borderMode=cv2.BORDER_CONSTANT,
+ borderValue=self.label_padding_value)
+
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomScaleAspect:
+ """裁剪并resize回原始尺寸的图像和标注图像。
+ 按照一定的面积比和宽高比对图像进行裁剪,并resize回原始图像大小,当存在标注图像时,同步进行。
+
+ Args:
+ min_scale (float):裁取图像占原始图像的面积比,取值[0,1],为0时则返回原图。默认为0.5。
+ aspect_ratio (float): 裁取图像的宽高比范围,非负值,为0时返回原图。默认为0.33。
+ """
+
+ def __init__(self, min_scale=0.5, aspect_ratio=0.33):
+ self.min_scale = min_scale
+ self.aspect_ratio = aspect_ratio
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ if self.min_scale != 0 and self.aspect_ratio != 0:
+ img_height = im.shape[0]
+ img_width = im.shape[1]
+ for i in range(0, 10):
+ area = img_height * img_width
+ target_area = area * np.random.uniform(self.min_scale, 1.0)
+ aspectRatio = np.random.uniform(self.aspect_ratio,
+ 1.0 / self.aspect_ratio)
+
+ dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
+ dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
+ if (np.random.randint(10) < 5):
+ tmp = dw
+ dw = dh
+ dh = tmp
+
+ if (dh < img_height and dw < img_width):
+ h1 = np.random.randint(0, img_height - dh)
+ w1 = np.random.randint(0, img_width - dw)
+
+ im = im[h1:(h1 + dh), w1:(w1 + dw), :]
+ im = cv2.resize(
+ im, (img_width, img_height),
+ interpolation=cv2.INTER_LINEAR)
+ if label is not None:
+ label = label[h1:(h1 + dh), w1:(w1 + dw)]
+ label = cv2.resize(
+ label, (img_width, img_height),
+ interpolation=cv2.INTER_NEAREST)
+ break
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class RandomDistort:
+ """对图像进行随机失真。
+
+ 1. 对变换的操作顺序进行随机化操作。
+ 2. 按照1中的顺序以一定的概率对图像进行随机像素内容变换。
+
+ Args:
+ brightness_range (float): 明亮度因子的范围。默认为0.5。
+ brightness_prob (float): 随机调整明亮度的概率。默认为0.5。
+ contrast_range (float): 对比度因子的范围。默认为0.5。
+ contrast_prob (float): 随机调整对比度的概率。默认为0.5。
+ saturation_range (float): 饱和度因子的范围。默认为0.5。
+ saturation_prob (float): 随机调整饱和度的概率。默认为0.5。
+ hue_range (int): 色调因子的范围。默认为18。
+ hue_prob (float): 随机调整色调的概率。默认为0.5。
+ """
+
+ def __init__(self,
+ brightness_range=0.5,
+ brightness_prob=0.5,
+ contrast_range=0.5,
+ contrast_prob=0.5,
+ saturation_range=0.5,
+ saturation_prob=0.5,
+ hue_range=18,
+ hue_prob=0.5):
+ self.brightness_range = brightness_range
+ self.brightness_prob = brightness_prob
+ self.contrast_range = contrast_range
+ self.contrast_prob = contrast_prob
+ self.saturation_range = saturation_range
+ self.saturation_prob = saturation_prob
+ self.hue_range = hue_range
+ self.hue_prob = hue_prob
+
+ def __call__(self, im, im_info=None, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当label为空时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;
+ 当label不为空时,返回的tuple为(im, im_info, label),分别对应图像np.ndarray数据、
+ 存储与图像相关信息的字典和标注图像np.ndarray数据。
+ """
+ brightness_lower = 1 - self.brightness_range
+ brightness_upper = 1 + self.brightness_range
+ contrast_lower = 1 - self.contrast_range
+ contrast_upper = 1 + self.contrast_range
+ saturation_lower = 1 - self.saturation_range
+ saturation_upper = 1 + self.saturation_range
+ hue_lower = -self.hue_range
+ hue_upper = self.hue_range
+ ops = [brightness, contrast, saturation, hue]
+ random.shuffle(ops)
+ params_dict = {
+ 'brightness': {
+ 'brightness_lower': brightness_lower,
+ 'brightness_upper': brightness_upper
+ },
+ 'contrast': {
+ 'contrast_lower': contrast_lower,
+ 'contrast_upper': contrast_upper
+ },
+ 'saturation': {
+ 'saturation_lower': saturation_lower,
+ 'saturation_upper': saturation_upper
+ },
+ 'hue': {
+ 'hue_lower': hue_lower,
+ 'hue_upper': hue_upper
+ }
+ }
+ prob_dict = {
+ 'brightness': self.brightness_prob,
+ 'contrast': self.contrast_prob,
+ 'saturation': self.saturation_prob,
+ 'hue': self.hue_prob
+ }
+ im = im.astype('uint8')
+ im = Image.fromarray(im)
+ for id in range(4):
+ params = params_dict[ops[id].__name__]
+ prob = prob_dict[ops[id].__name__]
+ params['im'] = im
+ if np.random.uniform(0, 1) < prob:
+ im = ops[id](**params)
+ im = np.asarray(im).astype('float32')
+ if label is None:
+ return (im, im_info)
+ else:
+ return (im, im_info, label)
+
+
+class ArrangeSegmenter:
+ """获取训练/验证/预测所需的信息。
+
+ Args:
+ mode (str): 指定数据用于何种用途,取值范围为['train', 'eval', 'test', 'quant']。
+
+ Raises:
+ ValueError: mode的取值不在['train', 'eval', 'test', 'quant']之内
+ """
+
+ def __init__(self, mode):
+ if mode not in ['train', 'eval', 'test', 'quant']:
+ raise ValueError(
+ "mode should be defined as one of ['train', 'eval', 'test', 'quant']!"
+ )
+ self.mode = mode
+
+ def __call__(self, im, im_info, label=None):
+ """
+ Args:
+ im (np.ndarray): 图像np.ndarray数据。
+ im_info (dict): 存储与图像相关的信息。
+ label (np.ndarray): 标注图像np.ndarray数据。
+
+ Returns:
+ tuple: 当mode为'train'或'eval'时,返回的tuple为(im, label),分别对应图像np.ndarray数据、标注图像np.ndarray数据;
+ 当mode为'test'时,返回的tuple为(im, im_info),分别对应图像np.ndarray数据、存储与图像相关信息的字典;当mode为
+ 'quant'时,返回的tuple为(im,),为图像np.ndarray数据。
+ """
+ im = permute(im, False)
+ if self.mode == 'train' or self.mode == 'eval':
+ label = label[np.newaxis, :, :]
+ return (im, label)
+ elif self.mode == 'test':
+ return (im, im_info)
+ else:
+ return (im, )
diff --git a/paddlex/det.py b/paddlex/det.py
new file mode 100644
index 0000000000000000000000000000000000000000..a69d78f9c117a19dc412ee20f023bf7ad7e8684f
--- /dev/null
+++ b/paddlex/det.py
@@ -0,0 +1,22 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from . import cv
+
+FasterRCNN = cv.models.FasterRCNN
+YOLOv3 = cv.models.YOLOv3
+MaskRCNN = cv.models.MaskRCNN
+transforms = cv.transforms.det_transforms
+visualize = cv.models.utils.visualize.visualize_detection
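+
+# These aliases keep user code short, e.g. (arguments are illustrative):
+#
+#     import paddlex as pdx
+#     model = pdx.det.YOLOv3(num_classes=20)
+#     train_transforms = pdx.det.transforms.Compose([...])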
diff --git a/paddlex/seg.py b/paddlex/seg.py
new file mode 100644
index 0000000000000000000000000000000000000000..0f92813d45b4e7f5e08ee64fbd6cfa675087ba4a
--- /dev/null
+++ b/paddlex/seg.py
@@ -0,0 +1,21 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
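+ # Round each side down to a multiple of 32 (the stride the network expects);
+ # sides already divisible by 32 are kept as-is, and both sides are clamped
+ # to at least 32 further below.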
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from . import cv
+
+UNet = cv.models.UNet
+DeepLabv3p = cv.models.DeepLabv3p
+transforms = cv.transforms.seg_transforms
+visualize = cv.models.utils.visualize.visualize_segmentation
diff --git a/paddlex/slim.py b/paddlex/slim.py
new file mode 100644
index 0000000000000000000000000000000000000000..57fc104d75307ac13ead57d12717490eb8154acf
--- /dev/null
+++ b/paddlex/slim.py
@@ -0,0 +1,34 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from .cv.models.slim import prune
+from .cv.models.slim import visualize
+
+cal_params_sensitivities = prune.cal_params_sensitivities
+visualize = visualize.visualize
+
+
+def export_quant_model(model,
+ test_dataset,
+ batch_size=1,
+ batch_num=10,
+ save_dir='./quant_model',
+ cache_dir='./temp'):
+ model.export_quant_model(
+ dataset=test_dataset,
+ batch_size=batch_size,
+ batch_num=batch_num,
+ save_dir=save_dir,
+ cache_dir=cache_dir)
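+
+
+# A minimal usage sketch (model and test_dataset are assumed to be an already
+# trained PaddleX model object and a PaddleX dataset; values are illustrative):
+#
+#     import paddlex as pdx
+#     pdx.slim.export_quant_model(model, test_dataset, batch_size=1,
+#                                 batch_num=10, save_dir='./quant_model')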
diff --git a/paddlex/utils/__init__.py b/paddlex/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..ff774c985feb6ffc24a3e8c67237cdff0a074ee4
--- /dev/null
+++ b/paddlex/utils/__init__.py
@@ -0,0 +1,22 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from . import logging
+from . import utils
+from . import save
+from .utils import seconds_to_hms
+from .download import download
+from .download import decompress
+from .download import download_and_decompress
diff --git a/paddlex/utils/download.py b/paddlex/utils/download.py
new file mode 100644
index 0000000000000000000000000000000000000000..fafd5082ac182c9986b564a1f2b26b4b85d1dd55
--- /dev/null
+++ b/paddlex/utils/download.py
@@ -0,0 +1,156 @@
+import os
+import os.path as osp
+import shutil
+import requests
+import tqdm
+import time
+import hashlib
+import tarfile
+import zipfile
+from . import logging
+
+DOWNLOAD_RETRY_LIMIT = 3
+
+
+def md5check(fullname, md5sum=None):
+ if md5sum is None:
+ return True
+
+ logger.info("File {} md5 checking...".format(fullname))
+ md5 = hashlib.md5()
+ with open(fullname, 'rb') as f:
+ for chunk in iter(lambda: f.read(4096), b""):
+ md5.update(chunk)
+ calc_md5sum = md5.hexdigest()
+
+ if calc_md5sum != md5sum:
+ logger.info("File {} md5 check failed, {}(calc) != "
+ "{}(base)".format(fullname, calc_md5sum, md5sum))
+ return False
+ return True
+
+
+def move_and_merge_tree(src, dst):
+ """
+ Move the src directory to dst; if dst already exists,
+ merge src into dst
+ """
+ if not osp.exists(dst):
+ shutil.move(src, dst)
+ else:
+ for fp in os.listdir(src):
+ src_fp = osp.join(src, fp)
+ dst_fp = osp.join(dst, fp)
+ if osp.isdir(src_fp):
+ if osp.isdir(dst_fp):
+ move_and_merge_tree(src_fp, dst_fp)
+ else:
+ shutil.move(src_fp, dst_fp)
+ elif osp.isfile(src_fp) and \
+ not osp.isfile(dst_fp):
+ shutil.move(src_fp, dst_fp)
+
+
+def download(url, path, md5sum=None):
+ """
+ Download from url, save to path.
+
+ url (str): download url
+ path (str): download to given path
+ """
+ if not osp.exists(path):
+ os.makedirs(path)
+
+ fname = osp.split(url)[-1]
+ fullname = osp.join(path, fname)
+ retry_cnt = 0
+ while not (osp.exists(fullname) and md5check(fullname, md5sum)):
+ if retry_cnt < DOWNLOAD_RETRY_LIMIT:
+ retry_cnt += 1
+ else:
+ logging.debug("{} download failed.".format(fname))
+ raise RuntimeError("Download from {} failed. "
+ "Retry limit reached".format(url))
+
+ logging.info("Downloading {} from {}".format(fname, url))
+
+ req = requests.get(url, stream=True)
+ if req.status_code != 200:
+ raise RuntimeError("Downloading from {} failed with code "
+ "{}!".format(url, req.status_code))
+
+ # To protect against an interrupted download, write to
+ # tmp_fullname first and move tmp_fullname to fullname
+ # once the download has finished
+ tmp_fullname = fullname + "_tmp"
+ total_size = req.headers.get('content-length')
+ with open(tmp_fullname, 'wb') as f:
+ if total_size:
+ download_size = 0
+ current_time = time.time()
+ for chunk in tqdm.tqdm(
+ req.iter_content(chunk_size=1024),
+ total=(int(total_size) + 1023) // 1024,
+ unit='KB'):
+ f.write(chunk)
+ download_size += 1024
+ if download_size % 524288 == 0:
+ total_size_m = round(
+ int(total_size) / 1024.0 / 1024.0, 2)
+ download_size_m = round(
+ download_size / 1024.0 / 1024.0, 2)
+ speed = int(
+ 524288 / (time.time() - current_time + 0.01) /
+ 1024.0)
+ current_time = time.time()
+ logging.debug(
+ "Downloading: TotalSize={}M, DownloadSize={}M, Speed={}KB/s"
+ .format(total_size_m, download_size_m, speed))
+ else:
+ for chunk in req.iter_content(chunk_size=1024):
+ if chunk:
+ f.write(chunk)
+ shutil.move(tmp_fullname, fullname)
+ logging.debug("{} download completed.".format(fname))
+
+ return fullname
+
+
+def decompress(fname):
+ """
+ Decompress for zip and tar file
+ """
+ logging.info("Decompressing {}...".format(fname))
+
+ # To protect against an interrupted decompression,
+ # extract to the fpath_tmp directory first; if extraction
+ # succeeds, move the extracted files to fpath and then
+ # delete fpath_tmp.
+ fpath = osp.split(fname)[0]
+ fpath_tmp = osp.join(fpath, 'tmp')
+ if osp.isdir(fpath_tmp):
+ shutil.rmtree(fpath_tmp)
+ os.makedirs(fpath_tmp)
+
+ if fname.find('tar') >= 0 or fname.find('tgz') >= 0:
+ with tarfile.open(fname) as tf:
+ tf.extractall(path=fpath_tmp)
+ elif fname.find('zip') >= 0:
+ with zipfile.ZipFile(fname) as zf:
+ zf.extractall(path=fpath_tmp)
+ else:
+ raise TypeError("Unsupport compress file type {}".format(fname))
+
+ for f in os.listdir(fpath_tmp):
+ src_dir = osp.join(fpath_tmp, f)
+ dst_dir = osp.join(fpath, f)
+ move_and_merge_tree(src_dir, dst_dir)
+
+ shutil.rmtree(fpath_tmp)
+ logging.debug("{} decompressed.".format(fname))
+
+
+def download_and_decompress(url, path='.'):
+ download(url, path)
+ fname = osp.split(url)[-1]
+ decompress(osp.join(path, fname))
diff --git a/paddlex/utils/logging.py b/paddlex/utils/logging.py
new file mode 100644
index 0000000000000000000000000000000000000000..42aa3e400bb748d58809109216e1aed28522a0a8
--- /dev/null
+++ b/paddlex/utils/logging.py
@@ -0,0 +1,55 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import time
+import os
+import sys
+import colorama
+from colorama import init
+import paddlex
+
+levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
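+# A message is printed only when paddlex.log_level is greater than or equal to
+# the message's level; e.g. with log_level=2, ERROR/WARNING/INFO are shown and
+# DEBUG is suppressed.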
+
+
+def log(level=2, message="", use_color=False):
+ current_time = time.time()
+ time_array = time.localtime(current_time)
+ current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
+ if paddlex.log_level >= level:
+ if use_color:
+ init(autoreset=True)
+ print("\033[1;31;40m{} [{}]\t{}\033[0m".format(
+ current_time, levels[level],
+ message).encode("utf-8").decode("latin1"))
+ else:
+ print(
+ "{} [{}]\t{}".format(current_time, levels[level],
+ message).encode("utf-8").decode("latin1"))
+ sys.stdout.flush()
+
+
+def debug(message="", use_color=False):
+ log(level=3, message=message, use_color=use_color)
+
+
+def info(message="", use_color=False):
+ log(level=2, message=message, use_color=use_color)
+
+
+def warning(message="", use_color=False):
+ log(level=1, message=message, use_color=use_color)
+
+
+def error(message="", use_color=False):
+ log(level=0, message=message, use_color=use_color)
diff --git a/paddlex/utils/save.py b/paddlex/utils/save.py
new file mode 100644
index 0000000000000000000000000000000000000000..397022d3c1e2d2110e900051a666f820de523204
--- /dev/null
+++ b/paddlex/utils/save.py
@@ -0,0 +1,628 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import six
+import os
+import errno
+import warnings
+
+import numpy as np
+
+import paddle
+from paddle.fluid import layers
+from paddle.fluid import core
+from paddle.fluid import unique_name
+from paddle.fluid.executor import global_scope
+from paddle.fluid.compiler import CompiledProgram
+from paddle.fluid.framework import Program, Parameter, default_main_program, default_startup_program, Variable, \
+ program_guard
+
+__all__ = ["save_mask_inference_model"]
+
+
+def _get_valid_program(main_program):
+ if main_program is None:
+ main_program = default_main_program()
+ elif isinstance(main_program, CompiledProgram):
+ main_program = main_program._program
+ if main_program is None:
+ raise TypeError("program should be as Program type or None")
+ warnings.warn(
+ "The input is a CompiledProgram, this is not recommended.")
+ if not isinstance(main_program, Program):
+ raise TypeError("program should be as Program type or None")
+ return main_program
+
+
+def prepend_feed_ops(inference_program,
+ feed_target_names,
+ feed_holder_name='feed'):
+ if len(feed_target_names) == 0:
+ return
+
+ global_block = inference_program.global_block()
+ feed_var = global_block.create_var(
+ name=feed_holder_name,
+ type=core.VarDesc.VarType.FEED_MINIBATCH,
+ persistable=True)
+
+ for i, name in enumerate(feed_target_names):
+ out = global_block.var(name)
+ global_block._prepend_op(
+ type='feed',
+ inputs={'X': [feed_var]},
+ outputs={'Out': [out]},
+ attrs={'col': i})
+
+
+def append_fetch_ops(inference_program,
+ fetch_target_names,
+ fetch_holder_name='fetch'):
+ global_block = inference_program.global_block()
+ fetch_var = global_block.create_var(
+ name=fetch_holder_name,
+ type=core.VarDesc.VarType.FETCH_LIST,
+ persistable=True)
+
+ for i, name in enumerate(fetch_target_names):
+ global_block.append_op(
+ type='fetch',
+ inputs={'X': [name]},
+ outputs={'Out': [fetch_var]},
+ attrs={'col': i})
+
+
+def _clone_var_in_block_(block, var):
+ assert isinstance(var, Variable)
+ if var.desc.type() == core.VarDesc.VarType.LOD_TENSOR:
+ return block.create_var(
+ name=var.name,
+ shape=var.shape,
+ dtype=var.dtype,
+ type=var.type,
+ lod_level=var.lod_level,
+ persistable=True)
+ else:
+ return block.create_var(
+ name=var.name,
+ shape=var.shape,
+ dtype=var.dtype,
+ type=var.type,
+ persistable=True)
+
+
+def save_vars(executor,
+ dirname,
+ main_program=None,
+ vars=None,
+ predicate=None,
+ filename=None):
+ """
+ This API saves specific variables in the `Program` to files.
+
+ There are two ways to specify the variables to be saved: set variables in
+ a list and assign it to the `vars`, or use the `predicate` function to select
+ variables that make `predicate(variable) == True`. The first way has a higher priority.
+
+ The `dirname` is used to specify the folder where to save variables.
+ If you prefer to save variables in separate files in the `dirname` folder,
+ do not set `filename`. If you prefer to save all variables in a single file,
+ use `filename` to specify it.
+
+ Args:
+ executor(Executor): The executor to run for saving variables.
+ dirname(str, optional): The folder where to save variables.
+ When you need to save the parameter to the memory, set it to None.
+ main_program(Program, optional): The program whose variables will be saved.
+ If it is None, the default main program will
+ be used automatically.
+ Default: None
+ vars(list[Variable], optional): The list contains all variables to be saved.
+ Default: None
+ predicate(function, optional): The function selects the variables that make
+ `predicate(variable) == True`.
+ Default: None
+ filename(str, optional): If you prefer to save all variables in a single file,
+ use `filename` to specify it. Otherwise, let `filename` be None.
+ Default: None
+
+ Returns:
+ str: When saving parameters to a file, returns None.
+ When saving parameters to memory, returns a binary string containing parameters.
+
+ Raises:
+ TypeError: If `main_program` is not an instance of Program nor None.
+
+ Examples:
+ .. code-block:: python
+
+ import paddle.fluid as fluid
+
+ main_prog = fluid.Program()
+ startup_prog = fluid.Program()
+ with fluid.program_guard(main_prog, startup_prog):
+ data = fluid.layers.data(name="img", shape=[64, 784], append_batch_size=False)
+ w = fluid.layers.create_parameter(shape=[784, 200], dtype='float32', name='fc_w')
+ b = fluid.layers.create_parameter(shape=[200], dtype='float32', name='fc_b')
+ hidden_w = fluid.layers.matmul(x=data, y=w)
+ hidden_b = fluid.layers.elementwise_add(hidden_w, b)
+ place = fluid.CPUPlace()
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ # The first usage: use `vars` to set the saved variables.
+ var_list = [w, b]
+ path = "./my_paddle_vars"
+ fluid.io.save_vars(executor=exe, dirname=path, vars=var_list,
+ filename="vars_file")
+ # w and b will be saved in a file named "vars_file".
+
+ # The second usage: use `predicate` to select the saved variable.
+ def name_has_fc(var):
+ res = "fc" in var.name
+ return res
+ param_path = "./my_paddle_model"
+ fluid.io.save_vars(executor=exe, dirname=param_path, main_program=main_prog, vars=None, predicate = name_has_fc)
+ # all variables whose names contain "fc" are saved.
+ """
+ save_to_memory = False
+ if dirname is None and filename is None:
+ save_to_memory = True
+
+ main_program = _get_valid_program(main_program)
+
+ if vars is None:
+ return save_vars(
+ executor,
+ main_program=main_program,
+ dirname=dirname,
+ vars=list(filter(predicate, main_program.list_vars())),
+ filename=filename)
+ else:
+ params_var_name = unique_name.generate("saved_params")
+ # give warning when there is no var in model
+ if len(list(vars)) == 0:
+ warnings.warn(
+ "no variable in your model, please ensure there are any variables in your model to save"
+ )
+ return None
+
+ save_program = Program()
+ save_block = save_program.global_block()
+
+ save_var_map = {}
+ for each_var in vars:
+ # NOTE: don't save the variable which type is RAW
+ if each_var.type == core.VarDesc.VarType.RAW:
+ continue
+ new_var = _clone_var_in_block_(save_block, each_var)
+ if filename is None and save_to_memory is False:
+ save_file_path = os.path.join(
+ os.path.normpath(dirname), new_var.name)
+ save_block.append_op(
+ type='save',
+ inputs={'X': [new_var]},
+ outputs={},
+ attrs={'file_path': os.path.normpath(save_file_path)})
+ else:
+ save_var_map[new_var.name] = new_var
+
+ if filename is not None or save_to_memory:
+ save_var_list = []
+ for name in sorted(save_var_map.keys()):
+ save_var_list.append(save_var_map[name])
+
+ save_path = str()
+ if save_to_memory is False:
+ save_path = os.path.join(os.path.normpath(dirname), filename)
+
+ saved_params = save_block.create_var(
+ type=core.VarDesc.VarType.RAW, name=params_var_name)
+ saved_params.desc.set_persistable(True)
+ save_block.append_op(
+ type='save_combine',
+ inputs={'X': save_var_list},
+ outputs={'Y': saved_params},
+ attrs={
+ 'file_path': save_path,
+ 'save_to_memory': save_to_memory
+ })
+
+ #NOTE(zhiqiu): save op will add variable kLookupTablePath in save_program.desc,
+ # which leads to diff on save_program and its desc. Call _sync_with_cpp
+ # to keep consistency.
+ save_program._sync_with_cpp()
+ executor.run(save_program)
+ if save_to_memory:
+ return global_scope().find_var(params_var_name).get_bytes()
+
+
+def _save_distributed_persistables(executor, dirname, main_program):
+ """
+ save_persistables for distributed training.
+ the method will do things listed below:
+ 1.save part of persistable variables on trainer.
+ 2.receive "remote prefetch variables" from parameter servers and merge them.
+ 3.save "distributed lookup table" on parameter servers.
+ 4.receive "optimizer variables" from parameter servers and merge them.
+
+ Args:
+ executor(Executor): The executor to run for saving parameters.
+ dirname(str): The saving directory path.
+ main_program(Program): The program whose parameters will be
+ saved. the main_program must be the trainer_program
+ get after transpiler.
+
+ Returns:
+ None
+
+ Examples:
+ .. code-block:: python
+
+ import paddle.fluid as fluid
+ exe = fluid.Executor(fluid.CPUPlace())
+ param_path = "./my_paddle_model"
+ t = distribute_transpiler.DistributeTranspiler()
+ t.transpile(...)
+ train_program = t.get_trainer_program()
+ _save_distributed_persistables(executor=exe, dirname=param_path, main_program=train_program)
+ """
+
+ def __save_remote_params(executor, dirname, remote_params_map):
+ """
+ Receive params from the pserver through RPC.
+ If the params are sliced, concat them into one tensor, then save it.
+ """
+ if not remote_params_map:
+ return
+
+ prog = Program()
+ block = prog.global_block()
+
+ # recv optimize vars from pserver
+ for name, remote_params in remote_params_map.items():
+ origin = remote_params[0].origin
+ is_slice = remote_params[0].is_slice
+
+ slices = [None] * len(remote_params)
+ slice_varnames = [None] * len(remote_params)
+ remote_varnames = [None] * len(remote_params)
+ endpoints = [None] * len(remote_params)
+
+ for idx, optimizer in enumerate(remote_params):
+ block_id = optimizer.block_id
+ slice = optimizer.slice
+ endpoint = optimizer.endpoint
+
+ index = block_id if is_slice else idx
+ slices[index] = slice
+ slice_varnames[index] = "{}.slice.{}".format(slice.name, idx)
+ remote_varnames[index] = slice.name
+ endpoints[index] = endpoint
+
+ slice_shapes = []
+ for slice in slices:
+ tmp = [str(dim) for dim in slice.shape]
+ slice_shapes.append(",".join(tmp))
+
+ block.append_op(
+ type='recv_save',
+ attrs={
+ "trainer_id": 0,
+ "shape": origin.shape,
+ "slice_shapes": slice_shapes,
+ "slice_varnames": slice_varnames,
+ "remote_varnames": remote_varnames,
+ "endpoints": endpoints,
+ "file_path": os.path.join(dirname, origin.name)
+ })
+
+ executor.run(prog)
+
+
+def is_persistable(var):
+ """
+ Check whether the given variable is persistable.
+
+ Args:
+ var(Variable): The variable to be checked.
+
+ Returns:
+ bool: True if the given `var` is persistable
+ False if not.
+
+ Examples:
+ .. code-block:: python
+
+ import paddle.fluid as fluid
+ param = fluid.default_main_program().global_block().var('fc.b')
+ res = fluid.io.is_persistable(param)
+ """
+ if var.desc.type() == core.VarDesc.VarType.FEED_MINIBATCH or \
+ var.desc.type() == core.VarDesc.VarType.FETCH_LIST or \
+ var.desc.type() == core.VarDesc.VarType.READER:
+ return False
+ return var.persistable
+
+
+def save_persistables(executor, dirname, main_program=None, filename=None):
+ """
+ This operator saves all persistable variables from :code:`main_program` to
+ the folder :code:`dirname` or the file :code:`filename`. You can refer to
+ :ref:`api_guide_model_save_reader_en` for more details.
+
+ The :code:`dirname` is used to specify the folder where persistable variables
+ are going to be saved. If you would like to save variables in separate
+ files, set :code:`filename` to None; if you would like to save all variables in a
+ single file, use :code:`filename` to specify the file name.
+
+ Args:
+ executor(Executor): The executor to run for saving persistable variables.
+ You can refer to :ref:`api_guide_executor_en` for
+ more details.
+ dirname(str, optional): The saving directory path.
+ When you need to save the parameter to the memory, set it to None.
+ main_program(Program, optional): The program whose persistable variables will
+ be saved. You can refer to
+ :ref:`api_guide_Program_en` for more details.
+ If it is None, the default main program will
+ be used.
+ Default: None.
+ filename(str, optional): The file to save all variables. If you prefer to
+ save variables in different files, set it to None.
+ Default: None.
+
+ Returns:
+ str: When saving parameters to a file, returns None.
+ When saving parameters to memory, returns a binary string containing parameters.
+
+ Examples:
+ .. code-block:: python
+
+ import paddle.fluid as fluid
+
+ dir_path = "./my_paddle_model"
+ file_name = "persistables"
+ image = fluid.data(name='img', shape=[None, 28, 28], dtype='float32')
+ label = fluid.data(name='label', shape=[None, 1], dtype='int64')
+ feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
+
+ predict = fluid.layers.fc(input=image, size=10, act='softmax')
+ loss = fluid.layers.cross_entropy(input=predict, label=label)
+ avg_loss = fluid.layers.mean(loss)
+ exe = fluid.Executor(fluid.CPUPlace())
+ exe.run(fluid.default_startup_program())
+ fluid.io.save_persistables(executor=exe, dirname=dir_path, filename=file_name)
+ # The persistables variables weights and bias in the fc layer of the network
+ # are going to be saved in the same file named "persistables" in the path
+ # "./my_paddle_model"
+ """
+ if main_program and main_program._is_distributed:
+ return _save_distributed_persistables(
+ executor, dirname=dirname, main_program=main_program)
+ else:
+ return save_vars(
+ executor,
+ dirname=dirname,
+ main_program=main_program,
+ vars=None,
+ predicate=is_persistable,
+ filename=filename)
+
+
+def save_mask_inference_model(dirname,
+ feeded_var_names,
+ target_vars,
+ executor,
+ main_program=None,
+ model_filename=None,
+ params_filename=None,
+ export_for_deployment=True,
+ program_only=False):
+ """
+ Prune the given `main_program` to build a new program especially for inference,
+ and then save it and all related parameters to the given `dirname`.
+ If you just want to save parameters of your trained model, please use the
+ :ref:`api_fluid_io_save_params` . You can refer to :ref:`api_guide_model_save_reader_en`
+ for more details.
+
+ Note:
+ The :code:`dirname` is used to specify the folder where inference model
+ structure and parameters are going to be saved. If you would like to save params of
+ Program in separate files, set `params_filename` to None; if you would like to save all
+ params of Program in a single file, use `params_filename` to specify the file name.
+
+ Args:
+ dirname(str): The directory path to save the inference model.
+ feeded_var_names(list[str]): list of string. Names of variables that need to be fed
+ data during inference.
+ target_vars(list[Variable]): list of Variable. Variables from which we can get
+ inference results.
+ executor(Executor): The executor that saves the inference model. You can refer
+ to :ref:`api_guide_executor_en` for more details.
+ main_program(Program, optional): The original program, which will be pruned to
+ build the inference model. If it is set to None,
+ the global default :code:`_main_program_` will be used.
+ Default: None.
+ model_filename(str, optional): The name of file to save the inference program
+ itself. If it is set to None, a default filename
+ :code:`__model__` will be used.
+ params_filename(str, optional): The name of file to save all related parameters.
+ If it is set to None, parameters will be saved
+ in separate files.
+ export_for_deployment(bool): If True, programs are modified to only support
+ direct inference deployment. Otherwise,
+ more information will be stored for flexible
+ optimization and re-training. Currently, only
+ True is supported.
+ Default: True.
+ program_only(bool, optional): If True, it will save the inference program only, and will not
+ save the parameters of the Program.
+ Default: False.
+
+ Returns:
+ The fetch variables' name list
+
+ Return Type:
+ list
+
+ Raises:
+ ValueError: If `feeded_var_names` is not a list of strings, an exception is thrown.
+ ValueError: If `target_vars` is not a list of Variable, an exception is thrown.
+
+ Examples:
+ .. code-block:: python
+
+ import paddle.fluid as fluid
+
+ path = "./infer_model"
+
+ # User defined network, here a softmax regression example
+ image = fluid.data(name='img', shape=[None, 28, 28], dtype='float32')
+ label = fluid.data(name='label', shape=[None, 1], dtype='int64')
+ feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
+ predict = fluid.layers.fc(input=image, size=10, act='softmax')
+
+ loss = fluid.layers.cross_entropy(input=predict, label=label)
+ avg_loss = fluid.layers.mean(loss)
+
+ exe = fluid.Executor(fluid.CPUPlace())
+ exe.run(fluid.default_startup_program())
+
+ # Feed data and train process
+
+ # Save inference model. Note we don't save label and loss in this example
+ fluid.io.save_inference_model(dirname=path,
+ feeded_var_names=['img'],
+ target_vars=[predict],
+ executor=exe)
+
+ # In this example, save_inference_model will prune the default
+ # main program according to the network's input node (img) and output node(predict).
+ # The pruned inference program is going to be saved in the "./infer_model/__model__"
+ # and parameters are going to be saved in separate files under folder
+ # "./infer_model".
+
+ """
+ if isinstance(feeded_var_names, six.string_types):
+ feeded_var_names = [feeded_var_names]
+ elif export_for_deployment:
+ if len(feeded_var_names) > 0:
+ # TODO(paddle-dev): polish these code blocks
+ if not (bool(feeded_var_names) and all(
+ isinstance(name, six.string_types)
+ for name in feeded_var_names)):
+ raise ValueError("'feed_var_names' should be a list of str.")
+
+ if isinstance(target_vars, Variable):
+ target_vars = [target_vars]
+ elif export_for_deployment:
+ if not (bool(target_vars)
+ and all(isinstance(var, Variable) for var in target_vars)):
+ raise ValueError("'target_vars' should be a list of Variable.")
+
+ main_program = _get_valid_program(main_program)
+
+ # remind user to set auc_states to zeros if the program contains auc op
+ all_ops = main_program.global_block().ops
+ for op in all_ops:
+ if op.type == 'auc':
+ warnings.warn(
+ "please ensure that you have set the auc states to zeros before saving inference model"
+ )
+ break
+
+ # fix the bug that the activation op's output as target will be pruned.
+ # will affect the inference performance.
+ # TODO(Superjomn) add an IR pass to remove 1-scale op.
+ with program_guard(main_program):
+ uniq_target_vars = []
+ for i, var in enumerate(target_vars):
+ if isinstance(var, Variable):
+ var = layers.scale(
+ var, 1., name="save_infer_model/scale_{}".format(i))
+ uniq_target_vars.append(var)
+ target_vars = uniq_target_vars
+ target_var_name_list = [var.name for var in target_vars]
+
+ # when a pserver and a trainer running on the same machine, mkdir may conflict
+ save_dirname = dirname
+ try:
+ save_dirname = os.path.normpath(dirname)
+ os.makedirs(save_dirname)
+ except OSError as e:
+ if e.errno != errno.EEXIST:
+ raise
+
+ if model_filename is not None:
+ model_basename = os.path.basename(model_filename)
+ else:
+ model_basename = "__model__"
+ model_basename = os.path.join(save_dirname, model_basename)
+
+ # When export_for_deployment is true, we modify the program online so that
+ # it can only be loaded for inference directly. If it's false, the whole
+ # original program and related meta are saved so that future usage can be
+ # more flexible.
+
+ origin_program = main_program.clone()
+
+ if export_for_deployment:
+ main_program = main_program.clone()
+ global_block = main_program.global_block()
+ need_to_remove_op_index = []
+ for i, op in enumerate(global_block.ops):
+ op.desc.set_is_target(False)
+ if op.type == "feed" or op.type == "fetch":
+ need_to_remove_op_index.append(i)
+
+ for index in need_to_remove_op_index[::-1]:
+ global_block._remove_op(index)
+
+ main_program.desc.flush()
+
+ main_program = main_program._prune_with_input(
+ feeded_var_names=feeded_var_names, targets=target_vars)
+ main_program = main_program._inference_optimize(prune_read_op=True)
+ fetch_var_names = [v.name for v in target_vars]
+
+ prepend_feed_ops(main_program, feeded_var_names)
+ append_fetch_ops(main_program, fetch_var_names)
+
+ main_program.desc._set_version()
+ paddle.fluid.core.save_op_compatible_info(main_program.desc)
+ with open(model_basename, "wb") as f:
+ f.write(main_program.desc.serialize_to_string())
+ else:
+ # TODO(panyx0718): Save more information so that it can also be used
+ # for training and more flexible post-processing.
+ with open(model_basename + ".main_program", "wb") as f:
+ f.write(main_program.desc.serialize_to_string())
+
+ if program_only:
+ warnings.warn(
+ "save_inference_model specified the param `program_only` to True, It will not save params of Program."
+ )
+ return target_var_name_list
+
+ main_program._copy_dist_param_info_from(origin_program)
+
+ if params_filename is not None:
+ params_filename = os.path.basename(params_filename)
+
+ save_persistables(executor, save_dirname, main_program, params_filename)
+ return target_var_name_list
diff --git a/paddlex/utils/utils.py b/paddlex/utils/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..27dc6dea16b24085e286f2948907310c6b4ff4de
--- /dev/null
+++ b/paddlex/utils/utils.py
@@ -0,0 +1,222 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import time
+import os
+import os.path as osp
+import numpy as np
+import six
+import yaml
+import math
+from . import logging
+
+
+def seconds_to_hms(seconds):
+ h = math.floor(seconds / 3600)
+ m = math.floor((seconds - h * 3600) / 60)
+ s = int(seconds - h * 3600 - m * 60)
+ hms_str = "{}:{}:{}".format(h, m, s)
+ return hms_str
+
+
+def setting_environ_flags():
+ if 'FLAGS_eager_delete_tensor_gb' not in os.environ:
+ os.environ['FLAGS_eager_delete_tensor_gb'] = '0.0'
+ if 'FLAGS_allocator_strategy' not in os.environ:
+ os.environ['FLAGS_allocator_strategy'] = 'auto_growth'
+ if "CUDA_VISIBLE_DEVICES" in os.environ:
+ if os.environ["CUDA_VISIBLE_DEVICES"].count("-1") > 0:
+ os.environ["CUDA_VISIBLE_DEVICES"] = ""
+
+
+def get_environ_info():
+ setting_environ_flags()
+ import paddle.fluid as fluid
+ info = dict()
+ info['place'] = 'cpu'
+ info['num'] = int(os.environ.get('CPU_NUM', 1))
+ if os.environ.get('CUDA_VISIBLE_DEVICES', None) != "":
+ if hasattr(fluid.core, 'get_cuda_device_count'):
+ gpu_num = 0
+ try:
+ gpu_num = fluid.core.get_cuda_device_count()
+ except:
+ os.environ['CUDA_VISIBLE_DEVICES'] = ''
+ pass
+ if gpu_num > 0:
+ info['place'] = 'cuda'
+ info['num'] = fluid.core.get_cuda_device_count()
+ return info
+
+
+def parse_param_file(param_file, return_shape=True):
+ from paddle.fluid.proto.framework_pb2 import VarType
+ f = open(param_file, 'rb')
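+ # A saved parameter file is read below as: lod-tensor version (int32),
+ # lod level (int64) followed by its lod data, tensor version (int32),
+ # TensorDesc size (int32), the serialized TensorDesc proto, then the raw weights.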
+ version = np.fromstring(f.read(4), dtype='int32')
+ lod_level = np.fromstring(f.read(8), dtype='int64')
+ for i in range(int(lod_level)):
+ _size = np.fromstring(f.read(8), dtype='int64')
+ _ = f.read(_size)
+ version = np.fromstring(f.read(4), dtype='int32')
+ tensor_desc = VarType.TensorDesc()
+ tensor_desc_size = np.fromstring(f.read(4), dtype='int32')
+ tensor_desc.ParseFromString(f.read(int(tensor_desc_size)))
+ tensor_shape = tuple(tensor_desc.dims)
+ if return_shape:
+ f.close()
+ return tuple(tensor_desc.dims)
+ if tensor_desc.data_type != 5:
+ raise Exception(
+ "Unexpected data type while parse {}".format(param_file))
+ data_size = 4
+ for i in range(len(tensor_shape)):
+ data_size *= tensor_shape[i]
+ weight = np.fromstring(f.read(data_size), dtype='float32')
+ f.close()
+ return np.reshape(weight, tensor_shape)
+
+
+def fuse_bn_weights(exe, main_prog, weights_dir):
+ import paddle.fluid as fluid
+ logging.info("Try to fuse weights of batch_norm...")
+ bn_vars = list()
+ for block in main_prog.blocks:
+ ops = list(block.ops)
+ for op in ops:
+ if op.type == 'affine_channel':
+ scale_name = op.input('Scale')[0]
+ bias_name = op.input('Bias')[0]
+ prefix = scale_name[:-5]
+ mean_name = prefix + 'mean'
+ variance_name = prefix + 'variance'
+ if not osp.exists(osp.join(
+ weights_dir, mean_name)) or not osp.exists(
+ osp.join(weights_dir, variance_name)):
+ logging.info(
+ "There's no batch_norm weight found to fuse, skip fuse_bn."
+ )
+ return
+
+ bias = block.var(bias_name)
+ pretrained_shape = parse_param_file(
+ osp.join(weights_dir, bias_name))
+ actual_shape = tuple(bias.shape)
+ if pretrained_shape != actual_shape:
+ continue
+ bn_vars.append(
+ [scale_name, bias_name, mean_name, variance_name])
+ eps = 1e-5
+ for names in bn_vars:
+ scale_name, bias_name, mean_name, variance_name = names
+ scale = parse_param_file(
+ osp.join(weights_dir, scale_name), return_shape=False)
+ bias = parse_param_file(
+ osp.join(weights_dir, bias_name), return_shape=False)
+ mean = parse_param_file(
+ osp.join(weights_dir, mean_name), return_shape=False)
+ variance = parse_param_file(
+ osp.join(weights_dir, variance_name), return_shape=False)
+ bn_std = np.sqrt(np.add(variance, eps))
+ new_scale = np.float32(np.divide(scale, bn_std))
+ new_bias = bias - mean * new_scale
+ scale_tensor = fluid.global_scope().find_var(scale_name).get_tensor()
+ bias_tensor = fluid.global_scope().find_var(bias_name).get_tensor()
+ scale_tensor.set(new_scale, exe.place)
+ bias_tensor.set(new_bias, exe.place)
+ if len(bn_vars) == 0:
+ logging.info(
+ "There's no batch_norm weight found to fuse, skip fuse_bn.")
+ else:
+ logging.info("There's {} batch_norm ops been fused.".format(
+ len(bn_vars)))
+
+
+def load_pdparams(exe, main_prog, model_dir):
+ import paddle.fluid as fluid
+ from paddle.fluid.proto.framework_pb2 import VarType
+ from paddle.fluid.framework import Program
+
+ vars_to_load = list()
+ import pickle
+ with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
+ params_dict = pickle.load(f) if six.PY2 else pickle.load(
+ f, encoding='latin1')
+ unused_vars = list()
+ for var in main_prog.list_vars():
+ if not isinstance(var, fluid.framework.Parameter):
+ continue
+ if var.name not in params_dict:
+ raise Exception("{} is not in saved paddlex model".format(
+ var.name))
+ if var.shape != params_dict[var.name].shape:
+ unused_vars.append(var.name)
+ logging.warning(
+ "[SKIP] Shape of pretrained weight {} doesn't match.(Pretrained: {}, Actual: {})"
+ .format(var.name, params_dict[var.name].shape, var.shape))
+ continue
+ vars_to_load.append(var)
+ logging.debug("Weight {} will be load".format(var.name))
+ for var_name in unused_vars:
+ del params_dict[var_name]
+ fluid.io.set_program_state(main_prog, params_dict)
+
+ if len(vars_to_load) == 0:
+ logging.warning(
+ "There is no pretrain weights loaded, maybe you should check you pretrain model!"
+ )
+ else:
+ logging.info("There are {} varaibles in {} are loaded.".format(
+ len(vars_to_load), model_dir))
+
+
+def load_pretrain_weights(exe, main_prog, weights_dir, fuse_bn=False):
+ if not osp.exists(weights_dir):
+ raise Exception("Path {} not exists.".format(weights_dir))
+ if osp.exists(osp.join(weights_dir, "model.pdparams")):
+ return load_pdparams(exe, main_prog, weights_dir)
+ import paddle.fluid as fluid
+ vars_to_load = list()
+ for var in main_prog.list_vars():
+ if not isinstance(var, fluid.framework.Parameter):
+ continue
+ if not osp.exists(osp.join(weights_dir, var.name)):
+ logging.debug(
+ "[SKIP] Pretrained weight {}/{} doesn't exist".format(
+ weights_dir, var.name))
+ continue
+ pretrained_shape = parse_param_file(osp.join(weights_dir, var.name))
+ actual_shape = tuple(var.shape)
+ if pretrained_shape != actual_shape:
+ logging.warning(
+ "[SKIP] Shape of pretrained weight {}/{} doesn't match.(Pretrained: {}, Actual: {})"
+ .format(weights_dir, var.name, pretrained_shape, actual_shape))
+ continue
+ vars_to_load.append(var)
+ logging.debug("Weight {} will be load".format(var.name))
+
+ fluid.io.load_vars(
+ executor=exe,
+ dirname=weights_dir,
+ main_program=main_prog,
+ vars=vars_to_load)
+ if len(vars_to_load) == 0:
+ logging.warning(
+ "There is no pretrain weights loaded, maybe you should check you pretrain model!"
+ )
+ else:
+ logging.info("There are {} varaibles in {} are loaded.".format(
+ len(vars_to_load), weights_dir))
+ if fuse_bn:
+ fuse_bn_weights(exe, main_prog, weights_dir)
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2e89bb73ea213f928666e799be8a15e16b7091da
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,8 @@
+pyyaml
+tqdm
+colorama
+sklearn
+cython
+pycocotools
+visualdl==1.3.0
+paddleslim==1.0.1
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000000000000000000000000000000000000..eb73b47aac3629b47009ed17e8bf33939f6bcdaf
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,43 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import setuptools
+import sys
+
+long_description = "PaddleX. An end-to-end deep learning model development toolkit based on PaddlePaddle\n\n"
+
+setuptools.setup(
+ name="paddlex",
+ version='0.1.0',
+ author="paddlex",
+ author_email="paddlex@baidu.com",
+ description=long_description,
+ long_description=long_description,
+ long_description_content_type="text/plain",
+ url="https://github.com/PaddlePaddle/PaddleX",
+ packages=setuptools.find_packages(),
+ setup_requires=['cython', 'numpy', 'sklearn'],
+ install_requires=[
+ 'pycocotools', 'pyyaml', 'colorama', 'tqdm', 'visualdl==1.3.0',
+ 'paddleslim==1.0.1', 'paddlehub>=1.6.2'
+ ],
+ classifiers=[
+ "Programming Language :: Python :: 3",
+ "License :: OSI Approved :: Apache Software License",
+ "Operating System :: OS Independent",
+ ],
+ license='Apache 2.0',
+ entry_points={'console_scripts': [
+ 'paddlex=paddlex.command:main',
+ ]})
diff --git a/tutorials/compress/README.md b/tutorials/compress/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5b7470d211689a5fd82154af2840ce15020275d6
--- /dev/null
+++ b/tutorials/compress/README.md
@@ -0,0 +1,33 @@
+# Tutorial: Model Compression
+This directory collects example code for pruning-based training with PaddleX. Each script downloads its dataset automatically and trains on a single GPU card.
+
+PaddleX provides two pruning workflows:
+1. Compute the pruning configuration yourself (recommended). The overall flow is:
+> 1. Train the original model on your data;
+> 2. With the model trained in step 1, compute the sensitivity of every model parameter on the validation set and save the sensitivity information to a local file;
+> 3. Train the original model on your data again, and pass the sensitivity file from step 2 to the `train` API;
+> 4. During this training run, the model structure is pruned according to the sensitivity file and training then continues on the pruned model.
+>
+2. Use the parameter-sensitivity file precomputed by PaddleX. The overall flow is:
+> 1. When calling the `train` API, set the `sensitivities_file` argument to the string `DEFAULT`;
+> 2. During training, the sensitivity file precomputed by PaddleX is downloaded automatically, the model structure is pruned accordingly, and training continues on the pruned model.
+
+Method 2 skips two of the steps above (training the original model and computing the parameter sensitivities yourself). In practice, however, method 1 gives higher accuracy and a better pruned model, so it is recommended whenever your time and compute budget allow.
+
+
+## Start pruning training
+
+1. Method 1: compute the pruning configuration yourself
+```
+# Train the model
+python classification/mobilenet.py
+# Compute the parameter sensitivities
+python classification/cal_sensitivities_file.py --model_dir=output/mobilenet/epoch_10 --save_file=./sensitivities.data
+# Pruning training
+python classification/mobilenet.py --model_dir=output/mobilenet/epoch_10 --sensitivities_file=./sensitivities.data --eval_metric_loss=0.05
+```
+2. Method 2: use the parameter-sensitivity file precomputed by PaddleX
+```
+# The sensitivity file precomputed by PaddleX on ImageNet is downloaded automatically
+python classification/mobilenet.py --sensitivities_file=DEFAULT --eval_metric_loss=0.05
+```
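+
+For reference, the commands of method 1 are thin wrappers around the PaddleX Python API. A rough sketch of step 2 (the paths and dataset below are illustrative and follow the classification example in this directory):
+```
+import os
+import paddlex as pdx
+
+# Load the model trained in step 1
+model = pdx.load_model('output/mobilenet/epoch_10')
+
+# Evaluation dataset on which the parameter sensitivities are measured
+eval_dataset = pdx.datasets.ImageNet(
+    data_dir='vegetables_cls',
+    file_list=os.path.join('vegetables_cls', 'val_list.txt'),
+    label_list=os.path.join('vegetables_cls', 'labels.txt'),
+    transforms=model.eval_transforms)
+
+# Compute and save the sensitivity of every prunable parameter
+pdx.slim.cal_params_sensitivities(
+    model, './sensitivities.data', eval_dataset, batch_size=8)
+```
+Passing `--sensitivities_file` (and optionally `--eval_metric_loss`) to `classification/mobilenet.py` then forwards the file to `model.train(..., sensitivities_file=..., eval_metric_loss=...)`, which prunes the model structure before continuing training (steps 3 and 4).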
diff --git a/tutorials/compress/classification/cal_sensitivities_file.py b/tutorials/compress/classification/cal_sensitivities_file.py
new file mode 100644
index 0000000000000000000000000000000000000000..bfe17385c41dd4b4d49d0efa968ea8ccf1cfa4a9
--- /dev/null
+++ b/tutorials/compress/classification/cal_sensitivities_file.py
@@ -0,0 +1,57 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+import paddlex as pdx
+
+
+def cal_sensitivities_file(model_dir, dataset, save_file):
+ # Load the model
+ model = pdx.load_model(model_dir)
+
+ # Define the dataset used for evaluation
+ eval_dataset = pdx.datasets.ImageNet(
+ data_dir=dataset,
+ file_list=os.path.join(dataset, 'val_list.txt'),
+ label_list=os.path.join(dataset, 'labels.txt'),
+ transforms=model.eval_transforms)
+
+ pdx.slim.cal_params_sensitivities(
+ model, save_file, eval_dataset, batch_size=8)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir",
+ default="./output/mobilenet/best_model",
+ type=str,
+ help="The model path.")
+ parser.add_argument(
+ "--dataset",
+ default="./vegetables_cls",
+ type=str,
+ help="The dataset path.")
+ parser.add_argument(
+ "--save_file",
+ default="./sensitivities.data",
+ type=str,
+ help="The sensitivities file path.")
+
+ args = parser.parse_args()
+ cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
diff --git a/tutorials/compress/classification/mobilenet.py b/tutorials/compress/classification/mobilenet.py
new file mode 100644
index 0000000000000000000000000000000000000000..4c4acb604ec07cbe4849cf4ca3f461c51fdb047a
--- /dev/null
+++ b/tutorials/compress/classification/mobilenet.py
@@ -0,0 +1,105 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.cls import transforms
+import paddlex as pdx
+
+
+def train(model_dir=None, sensitivities_file=None, eval_metric_loss=0.05):
+ # Download and extract the vegetable classification dataset
+ veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
+ pdx.utils.download_and_decompress(veg_dataset, path='./')
+
+ # Define the transforms for training and evaluation
+ train_transforms = transforms.Compose([
+ transforms.RandomCrop(crop_size=224),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+ ])
+ eval_transforms = transforms.Compose([
+ transforms.ResizeByShort(short_size=256),
+ transforms.CenterCrop(crop_size=224),
+ transforms.Normalize()
+ ])
+
+ # Define the datasets used for training and evaluation
+ train_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+ eval_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/val_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=eval_transforms)
+
+ num_classes = len(train_dataset.labels)
+ model = pdx.cls.MobileNetV2(num_classes=num_classes)
+
+ if model_dir is None:
+ # Use weights pretrained on the ImageNet dataset
+ pretrain_weights = "IMAGENET"
+ else:
+ # Use the given model_dir as the pretrained weights
+ assert os.path.isdir(model_dir), "Path {} is not a directory".format(
+ model_dir)
+ pretrain_weights = model_dir
+
+ save_dir = './output/mobilenet'
+ if sensitivities_file is not None:
+ # DEFAULT means pruning with the parameter sensitivity information shipped with the model
+ if sensitivities_file != "DEFAULT":
+ assert os.path.exists(
+ sensitivities_file), "Path {} not exist".format(
+ sensitivities_file)
+ save_dir = './output/mobilenet_prune'
+
+ model.train(
+ num_epochs=10,
+ train_dataset=train_dataset,
+ train_batch_size=32,
+ eval_dataset=eval_dataset,
+ lr_decay_epochs=[4, 6, 8],
+ learning_rate=0.025,
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ use_vdl=True,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir", default=None, type=str, help="The model path.")
+ parser.add_argument(
+ "--sensitivities_file",
+ default=None,
+ type=str,
+ help="The sensitivities file path.")
+ parser.add_argument(
+ "--eval_metric_loss",
+ default=0.05,
+ type=float,
+ help="The loss threshold.")
+
+ args = parser.parse_args()
+ train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
diff --git a/tutorials/compress/detection/cal_sensitivities_file.py b/tutorials/compress/detection/cal_sensitivities_file.py
new file mode 100644
index 0000000000000000000000000000000000000000..d1111a434d8e669bc23b3cf86f245b64c1bbb9a1
--- /dev/null
+++ b/tutorials/compress/detection/cal_sensitivities_file.py
@@ -0,0 +1,53 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+
+
+def cal_sensitivities_file(model_dir, dataset, save_file):
+ # Load the model
+ model = pdx.load_model(model_dir)
+
+ # Define the dataset used for evaluation
+ eval_dataset = pdx.datasets.VOCDetection(
+ data_dir=dataset,
+ file_list=os.path.join(dataset, 'val_list.txt'),
+ label_list=os.path.join(dataset, 'labels.txt'),
+ transforms=model.eval_transforms)
+
+ pdx.slim.cal_params_sensitivities(
+ model, save_file, eval_dataset, batch_size=8)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir",
+ default="./output/yolov3_mobilenet/best_model",
+ type=str,
+ help="The model path.")
+ parser.add_argument(
+ "--dataset", default="./insect_det", type=str, help="The model path.")
+ parser.add_argument(
+ "--save_file",
+ default="./sensitivities.data",
+ type=str,
+ help="The sensitivities file path.")
+
+ args = parser.parse_args()
+ cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
diff --git a/tutorials/compress/detection/yolov3_mobilenet.py b/tutorials/compress/detection/yolov3_mobilenet.py
new file mode 100644
index 0000000000000000000000000000000000000000..8c125d0980757180453b912999f10ae13c978c18
--- /dev/null
+++ b/tutorials/compress/detection/yolov3_mobilenet.py
@@ -0,0 +1,104 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+
+def train(model_dir, sensitivities_file, eval_metric_loss):
+ # Download and extract the insect detection dataset
+ insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
+ pdx.utils.download_and_decompress(insect_dataset, path='./')
+
+ # Define the transforms for training and evaluation
+ train_transforms = transforms.Compose([
+ transforms.MixupImage(mixup_epoch=250),
+ transforms.RandomDistort(),
+ transforms.RandomExpand(),
+ transforms.RandomCrop(),
+ transforms.Resize(target_size=608, interp='RANDOM'),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+ ])
+ eval_transforms = transforms.Compose([
+ transforms.Resize(target_size=608, interp='CUBIC'),
+ transforms.Normalize()
+ ])
+
+ # Define the datasets used for training and evaluation
+ train_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/train_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+ eval_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/val_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=eval_transforms)
+
+ if model_dir is None:
+ # Use weights pretrained on the ImageNet dataset
+ pretrain_weights = "IMAGENET"
+ else:
+ assert os.path.isdir(model_dir), "Path {} is not a directory".format(
+ model_dir)
+ pretrain_weights = model_dir
+ save_dir = "output/yolov3_mobile"
+ if sensitivities_file is not None:
+ if sensitivities_file != 'DEFAULT':
+ assert os.path.exists(
+ sensitivities_file), "Path {} not exist".format(
+ sensitivities_file)
+ save_dir = "output/yolov3_mobile_prune"
+
+ num_classes = len(train_dataset.labels)
+ model = pdx.det.YOLOv3(num_classes=num_classes)
+ model.train(
+ num_epochs=270,
+ train_dataset=train_dataset,
+ train_batch_size=8,
+ eval_dataset=eval_dataset,
+ learning_rate=0.000125,
+ lr_decay_epochs=[210, 240],
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ use_vdl=True,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir", default=None, type=str, help="The model path.")
+ parser.add_argument(
+ "--sensitivities_file",
+ default=None,
+ type=str,
+ help="The sensitivities file path.")
+ parser.add_argument(
+ "--eval_metric_loss",
+ default=0.05,
+ type=float,
+ help="The loss threshold.")
+
+ args = parser.parse_args()
+ train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
diff --git a/tutorials/compress/segmentation/cal_sensitivities_file.py b/tutorials/compress/segmentation/cal_sensitivities_file.py
new file mode 100644
index 0000000000000000000000000000000000000000..542488afe902ef02f82cab3ef9b58f9f65dd53ba
--- /dev/null
+++ b/tutorials/compress/segmentation/cal_sensitivities_file.py
@@ -0,0 +1,57 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+import paddlex as pdx
+
+
+def cal_sensitivities_file(model_dir, dataset, save_file):
+ # Load the model
+ model = pdx.load_model(model_dir)
+
+ # Define the dataset used for evaluation
+ eval_dataset = pdx.datasets.SegDataset(
+ data_dir=dataset,
+ file_list=os.path.join(dataset, 'val_list.txt'),
+ label_list=os.path.join(dataset, 'labels.txt'),
+ transforms=model.eval_transforms)
+
+ pdx.slim.cal_params_sensitivities(
+ model, save_file, eval_dataset, batch_size=4)
+ pdx.slim.visualize(model, save_file, save_dir='./')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir",
+ default="./output/unet/best_model",
+ type=str,
+ help="The model path.")
+ parser.add_argument(
+ "--dataset",
+ default="./optic_disc_seg",
+ type=str,
+ help="The dataset path.")
+ parser.add_argument(
+ "--save_file",
+ default="./sensitivities.data",
+ type=str,
+ help="The sensitivities file path.")
+
+ args = parser.parse_args()
+ cal_sensitivities_file(args.model_dir, args.dataset, args.save_file)
diff --git a/tutorials/compress/segmentation/unet.py b/tutorials/compress/segmentation/unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..7895443d59e483bedd9e5a5cf267d5278c33770f
--- /dev/null
+++ b/tutorials/compress/segmentation/unet.py
@@ -0,0 +1,101 @@
+#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+import argparse
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.seg import transforms
+import paddlex as pdx
+
+
+def train(model_dir, sensitivities_file, eval_metric_loss):
+ # Download and extract the optic disc segmentation dataset
+ optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
+ pdx.utils.download_and_decompress(optic_dataset, path='./')
+
+ # Define the transforms for training and evaluation
+ train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512),
+ transforms.Normalize()
+ ])
+ eval_transforms = transforms.Compose([
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512),
+ transforms.Normalize()
+ ])
+
+ # Define the datasets used for training and evaluation
+ train_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/train_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+ eval_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/val_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=eval_transforms)
+
+ if model_dir is None:
+ # Use weights pretrained on the COCO dataset
+ pretrain_weights = "COCO"
+ else:
+ assert os.path.isdir(model_dir), "Path {} is not a directory".format(
+ model_dir)
+ pretrain_weights = model_dir
+ save_dir = "output/unet"
+ if sensitivities_file is not None:
+ if sensitivities_file != 'DEFAULT':
+ assert os.path.exists(
+ sensitivities_file), "Path {} not exist".format(
+ sensitivities_file)
+ save_dir = "output/unet_prune"
+
+ num_classes = len(train_dataset.labels)
+ model = pdx.seg.UNet(num_classes=num_classes)
+ model.train(
+ num_epochs=20,
+ train_dataset=train_dataset,
+ train_batch_size=4,
+ eval_dataset=eval_dataset,
+ learning_rate=0.01,
+ pretrain_weights=pretrain_weights,
+ save_dir=save_dir,
+ use_vdl=True,
+ sensitivities_file=sensitivities_file,
+ eval_metric_loss=eval_metric_loss)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ "--model_dir", default=None, type=str, help="The model path.")
+ parser.add_argument(
+ "--sensitivities_file",
+ default=None,
+ type=str,
+ help="The sensitivities file path.")
+ parser.add_argument(
+ "--eval_metric_loss",
+ default=0.05,
+ type=float,
+ help="The loss threshold.")
+
+ args = parser.parse_args()
+ train(args.model_dir, args.sensitivities_file, args.eval_metric_loss)
diff --git a/tutorials/train/README.md b/tutorials/train/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1900143bceb3435da8ffa04a7fed7b0205e04477
--- /dev/null
+++ b/tutorials/train/README.md
@@ -0,0 +1,18 @@
+# Tutorial: Model Training
+
+This directory collects example code for training models with PaddleX. Each script downloads its example dataset automatically and trains on a single GPU card.
+
+|Code | Model / task | Data |
+|------|--------|---------|
+|classification/mobilenetv2.py | Image classification, MobileNetV2 | Vegetable classification |
+|classification/resnet50.py | Image classification, ResNet50 | Vegetable classification |
+|detection/faster_rcnn_r50_fpn.py | Object detection, FasterRCNN | Insect detection |
+|detection/yolov3_mobilenetv1.py | Object detection, YOLOv3 | Insect detection |
+|detection/mask_rcnn_r50_fpn.py | Instance segmentation, MaskRCNN | Garbage sorting |
+|segmentation/deeplabv3p.py | Semantic segmentation, DeepLabv3+ | Optic disc segmentation |
+|segmentation/unet.py | Semantic segmentation, UNet | Optic disc segmentation |
+
+## Start training
+After installing PaddleX, start training with, for example:
+```
+python classification/mobilenetv2.py
+```
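+
+All scripts follow the same pattern: define the transforms, build the training and evaluation datasets, create a model and call its `train` method. A condensed sketch of that flow (abridged from classification/mobilenetv2.py; the other scripts differ only in the transforms, dataset class and model):
+```
+from paddlex.cls import transforms
+import paddlex as pdx
+
+# Download and extract the example dataset, as the scripts in this directory do
+pdx.utils.download_and_decompress(
+    'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz', path='./')
+
+# Transforms applied to the training and evaluation images
+train_transforms = transforms.Compose([
+    transforms.RandomCrop(crop_size=224),
+    transforms.RandomHorizontalFlip(),
+    transforms.Normalize()
+])
+eval_transforms = transforms.Compose([
+    transforms.ResizeByShort(short_size=256),
+    transforms.CenterCrop(crop_size=224),
+    transforms.Normalize()
+])
+
+# Datasets in PaddleX's ImageNet-style list format
+train_dataset = pdx.datasets.ImageNet(
+    data_dir='vegetables_cls',
+    file_list='vegetables_cls/train_list.txt',
+    label_list='vegetables_cls/labels.txt',
+    transforms=train_transforms,
+    shuffle=True)
+eval_dataset = pdx.datasets.ImageNet(
+    data_dir='vegetables_cls',
+    file_list='vegetables_cls/val_list.txt',
+    label_list='vegetables_cls/labels.txt',
+    transforms=eval_transforms)
+
+# Build the model and train; metrics are written for VisualDL when use_vdl=True
+model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
+model.train(
+    num_epochs=10,
+    train_dataset=train_dataset,
+    train_batch_size=32,
+    eval_dataset=eval_dataset,
+    lr_decay_epochs=[4, 6, 8],
+    learning_rate=0.025,
+    save_dir='output/mobilenetv2',
+    use_vdl=True)
+```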
diff --git a/tutorials/train/classification/mobilenetv2.py b/tutorials/train/classification/mobilenetv2.py
new file mode 100644
index 0000000000000000000000000000000000000000..3f637125b760de6d992d6a062e4d456bf5038426
--- /dev/null
+++ b/tutorials/train/classification/mobilenetv2.py
@@ -0,0 +1,51 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.cls import transforms
+import paddlex as pdx
+
+# Download and extract the vegetable classification dataset
+veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
+pdx.utils.download_and_decompress(veg_dataset, path='./')
+
+# Define the transforms for training and evaluation
+train_transforms = transforms.Compose([
+ transforms.RandomCrop(crop_size=224),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize()
+])
+eval_transforms = transforms.Compose([
+ transforms.ResizeByShort(short_size=256),
+ transforms.CenterCrop(crop_size=224),
+ transforms.Normalize()
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/val_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL
+# Start VisualDL with: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# (0.0.0.0 is for local access; for a remote server, replace it with that machine's IP)
+model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
+model.train(
+ num_epochs=10,
+ train_dataset=train_dataset,
+ train_batch_size=32,
+ eval_dataset=eval_dataset,
+ lr_decay_epochs=[4, 6, 8],
+ learning_rate=0.025,
+ save_dir='output/mobilenetv2',
+ use_vdl=True)
diff --git a/tutorials/train/classification/resnet50.py b/tutorials/train/classification/resnet50.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e5a9b4820c7e66a83abaca0b13e057b15ceb830
--- /dev/null
+++ b/tutorials/train/classification/resnet50.py
@@ -0,0 +1,58 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+import paddle.fluid as fluid
+from paddlex.cls import transforms
+import paddlex as pdx
+
+# Download and extract the vegetable classification dataset
+veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
+pdx.utils.download_and_decompress(veg_dataset, path='./')
+
+# Define the transforms for training and evaluation
+train_transforms = transforms.Compose(
+ [transforms.RandomCrop(crop_size=224),
+ transforms.Normalize()])
+eval_transforms = transforms.Compose([
+ transforms.ResizeByShort(short_size=256),
+ transforms.CenterCrop(crop_size=224),
+ transforms.Normalize()
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/train_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.ImageNet(
+ data_dir='vegetables_cls',
+ file_list='vegetables_cls/val_list.txt',
+ label_list='vegetables_cls/labels.txt',
+ transforms=eval_transforms)
+
+# PaddleX supports building a custom optimizer
+step_each_epoch = train_dataset.num_samples // 32
+learning_rate = fluid.layers.cosine_decay(
+ learning_rate=0.025, step_each_epoch=step_each_epoch, epochs=10)
+optimizer = fluid.optimizer.Momentum(
+ learning_rate=learning_rate,
+ momentum=0.9,
+ regularization=fluid.regularizer.L2Decay(4e-5))
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL
+# Start VisualDL with: visualdl --logdir output/resnet50/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# (0.0.0.0 is for local access; for a remote server, replace it with that machine's IP)
+model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
+model.train(
+ num_epochs=10,
+ train_dataset=train_dataset,
+ train_batch_size=32,
+ eval_dataset=eval_dataset,
+ optimizer=optimizer,
+ save_dir='output/resnet50',
+ use_vdl=True)
diff --git a/tutorials/train/detection/faster_rcnn_r50_fpn.py b/tutorials/train/detection/faster_rcnn_r50_fpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..cbe6dabe535b5972418349ac31576b344652e69d
--- /dev/null
+++ b/tutorials/train/detection/faster_rcnn_r50_fpn.py
@@ -0,0 +1,55 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+# Download and extract the insect detection dataset
+insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
+pdx.utils.download_and_decompress(insect_dataset, path='./')
+
+# Define the transforms for training and evaluation
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32),
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/train_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/val_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL
+# Start VisualDL with: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# (0.0.0.0 is for local access; for a remote server, replace it with that machine's IP)
+# num_classes must include the background class, i.e. number of object classes + 1
+num_classes = len(train_dataset.labels) + 1
+model = pdx.det.FasterRCNN(num_classes=num_classes)
+model.train(
+ num_epochs=12,
+ train_dataset=train_dataset,
+ train_batch_size=2,
+ eval_dataset=eval_dataset,
+ learning_rate=0.0025,
+ lr_decay_epochs=[8, 11],
+ save_dir='output/faster_rcnn_r50_fpn',
+ use_vdl=True)
diff --git a/tutorials/train/detection/mask_rcnn_r50_fpn.py b/tutorials/train/detection/mask_rcnn_r50_fpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..47d11f6a957cbf302fe9eaad4915f1f320168a66
--- /dev/null
+++ b/tutorials/train/detection/mask_rcnn_r50_fpn.py
@@ -0,0 +1,53 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+# Download and extract the garbage sorting dataset
+garbage_dataset = 'https://bj.bcebos.com/paddlex/datasets/garbage_ins_det.tar.gz'
+pdx.utils.download_and_decompress(garbage_dataset, path='./')
+
+# Define the transforms for training and evaluation
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Normalize(),
+ transforms.ResizeByShort(short_size=800, max_size=1333),
+ transforms.Padding(coarsest_stride=32)
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.CocoDetection(
+ data_dir='garbage_ins_det/JPEGImages',
+ ann_file='garbage_ins_det/train.json',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.CocoDetection(
+ data_dir='garbage_ins_det/JPEGImages',
+ ann_file='garbage_ins_det/val.json',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL
+# Start VisualDL with: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# (0.0.0.0 is for local access; for a remote server, replace it with that machine's IP)
+# num_classes must include the background class, i.e. number of object classes + 1
+num_classes = len(train_dataset.labels) + 1
+model = pdx.det.MaskRCNN(num_classes=num_classes)
+model.train(
+ num_epochs=12,
+ train_dataset=train_dataset,
+ train_batch_size=1,
+ eval_dataset=eval_dataset,
+ learning_rate=0.00125,
+ lr_decay_epochs=[8, 11],
+ save_dir='output/mask_rcnn_r50_fpn',
+ use_vdl=True)
diff --git a/tutorials/train/detection/yolov3_mobilenetv1.py b/tutorials/train/detection/yolov3_mobilenetv1.py
new file mode 100644
index 0000000000000000000000000000000000000000..116c7e72b9e05dd94eeabd911cd83e20b17234e4
--- /dev/null
+++ b/tutorials/train/detection/yolov3_mobilenetv1.py
@@ -0,0 +1,56 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+# Download and extract the insect detection dataset
+insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
+pdx.utils.download_and_decompress(insect_dataset, path='./')
+
+# Define the transforms for training and evaluation
+train_transforms = transforms.Compose([
+ transforms.MixupImage(mixup_epoch=250),
+ transforms.RandomDistort(),
+ transforms.RandomExpand(),
+ transforms.RandomCrop(),
+ transforms.Resize(target_size=608, interp='RANDOM'),
+ transforms.RandomHorizontalFlip(),
+ transforms.Normalize(),
+])
+
+eval_transforms = transforms.Compose([
+ transforms.Resize(target_size=608, interp='CUBIC'),
+ transforms.Normalize(),
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/train_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.VOCDetection(
+ data_dir='insect_det',
+ file_list='insect_det/val_list.txt',
+ label_list='insect_det/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be viewed with VisualDL
+# Start VisualDL with: visualdl --logdir output/yolov3_mobilenetv1/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# (0.0.0.0 is for local access; for a remote server, replace it with that machine's IP)
+num_classes = len(train_dataset.labels)
+model = pdx.det.YOLOv3(num_classes=num_classes)
+model.train(
+ num_epochs=270,
+ train_dataset=train_dataset,
+ train_batch_size=8,
+ eval_dataset=eval_dataset,
+ learning_rate=0.000125,
+ lr_decay_epochs=[210, 240],
+ save_dir='output/yolov3_mobilenetv1',
+ use_vdl=True)
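+
+# A possible follow-up once training finishes (a sketch, not part of the original
+# script): reload the best checkpoint and predict on one image; a higher threshold
+# filters out low-confidence boxes. 'test.jpg' is a placeholder path.
+# model = pdx.load_model('output/yolov3_mobilenetv1/best_model')
+# result = model.predict('test.jpg')
+# pdx.det.visualize('test.jpg', result, threshold=0.3, save_dir='./output/yolov3_mobilenetv1')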
diff --git a/tutorials/train/segmentation/deeplabv3p.py b/tutorials/train/segmentation/deeplabv3p.py
new file mode 100644
index 0000000000000000000000000000000000000000..346a229a358a76830112acfd596740c070822874
--- /dev/null
+++ b/tutorials/train/segmentation/deeplabv3p.py
@@ -0,0 +1,50 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+# Download and extract the optic disc segmentation dataset
+optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
+pdx.utils.download_and_decompress(optic_dataset, path='./')
+
+# Define the transforms used during training and evaluation
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.Resize(target_size=512),
+ transforms.RandomPaddingCrop(crop_size=500),
+ transforms.Normalize()
+])
+
+eval_transforms = transforms.Compose(
+ [transforms.Resize(512), transforms.Normalize()])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/train_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/val_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be monitored with VisualDL
+# Start VisualDL with: visualdl --logdir output/deeplab/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# 0.0.0.0 works for local access; for a remote server, use that machine's IP instead
+num_classes = len(train_dataset.labels)
+model = pdx.seg.DeepLabv3p(num_classes=num_classes)
+model.train(
+ num_epochs=40,
+ train_dataset=train_dataset,
+ train_batch_size=4,
+ eval_dataset=eval_dataset,
+ learning_rate=0.01,
+ save_dir='output/deeplab',
+ use_vdl=True)
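+
+# A possible follow-up once training finishes (a sketch, not part of the original
+# script): reload the best checkpoint and predict on one image; the result is a
+# dict with 'label_map' and 'score_map'. 'test.jpg' is a placeholder path.
+# model = pdx.load_model('output/deeplab/best_model')
+# result = model.predict('test.jpg')
+# pdx.seg.visualize('test.jpg', result, weight=0.6, save_dir='./output/deeplab')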
diff --git a/tutorials/train/segmentation/unet.py b/tutorials/train/segmentation/unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..a683af98322eacb9d0775b3a5256d900f5743bb2
--- /dev/null
+++ b/tutorials/train/segmentation/unet.py
@@ -0,0 +1,53 @@
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+# Download and extract the optic disc segmentation dataset
+optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
+pdx.utils.download_and_decompress(optic_dataset, path='./')
+
+# Define the transforms used during training and evaluation
+train_transforms = transforms.Compose([
+ transforms.RandomHorizontalFlip(),
+ transforms.ResizeRangeScaling(),
+ transforms.RandomPaddingCrop(crop_size=512),
+ transforms.Normalize()
+])
+
+eval_transforms = transforms.Compose([
+ transforms.ResizeByLong(long_size=512),
+ transforms.Padding(target_size=512),
+ transforms.Normalize()
+])
+
+# Define the datasets used for training and evaluation
+train_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/train_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=train_transforms,
+ shuffle=True)
+eval_dataset = pdx.datasets.SegDataset(
+ data_dir='optic_disc_seg',
+ file_list='optic_disc_seg/val_list.txt',
+ label_list='optic_disc_seg/labels.txt',
+ transforms=eval_transforms)
+
+# Initialize the model and start training
+# Training metrics can be monitored with VisualDL
+# Start VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
+# then open http://0.0.0.0:8001 in a browser
+# 0.0.0.0 works for local access; for a remote server, use that machine's IP instead
+num_classes = len(train_dataset.labels)
+model = pdx.seg.UNet(num_classes=num_classes)
+model.train(
+ num_epochs=20,
+ train_dataset=train_dataset,
+ train_batch_size=4,
+ eval_dataset=eval_dataset,
+ learning_rate=0.01,
+ save_dir='output/unet',
+ use_vdl=True)
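+
+# A possible follow-up once training finishes (a sketch, not part of the original
+# script): reload the best checkpoint and re-evaluate it on the validation set
+# defined above; for segmentation models the returned metrics include mean IoU.
+# model = pdx.load_model('output/unet/best_model')
+# metrics = model.evaluate(eval_dataset, batch_size=4)
+# print(metrics)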