
diff --git a/docs/annotation/jingling_demo/aa63d7e6db0d03137883772c246c6761fc201059.jpg b/docs/annotation/jingling_demo/jingling.jpg
similarity index 100%
rename from docs/annotation/jingling_demo/aa63d7e6db0d03137883772c246c6761fc201059.jpg
rename to docs/annotation/jingling_demo/jingling.jpg
diff --git a/docs/annotation/jingling_demo/outputs/aa63d7e6db0d03137883772c246c6761fc201059.json b/docs/annotation/jingling_demo/outputs/aa63d7e6db0d03137883772c246c6761fc201059.json
deleted file mode 100644
index 69d80205de92afc9cffa304b32ff0e3e95502687..0000000000000000000000000000000000000000
--- a/docs/annotation/jingling_demo/outputs/aa63d7e6db0d03137883772c246c6761fc201059.json
+++ /dev/null
@@ -1 +0,0 @@
-{"path":"/Users/dataset/aa63d7e6db0d03137883772c246c6761fc201059.jpg","outputs":{"object":[{"name":"person","polygon":{"x1":321.99,"y1":63,"x2":293,"y2":98.00999999999999,"x3":245.01,"y3":141.01,"x4":221,"y4":194,"x5":231.99,"y5":237,"x6":231.99,"y6":348.01,"x7":191,"y7":429,"x8":197,"y8":465.01,"x9":193,"y9":586,"x10":151,"y10":618.01,"x11":124,"y11":622,"x12":100,"y12":703,"x13":121.99,"y13":744,"x14":141.99,"y14":724,"x15":163,"y15":658.01,"x16":238.01,"y16":646,"x17":259,"y17":627,"x18":313,"y18":618.01,"x19":416,"y19":639,"x20":464,"y20":606,"x21":454,"y21":555.01,"x22":404,"y22":508.01,"x23":430,"y23":489,"x24":407,"y24":464,"x25":397,"y25":365.01,"x26":407,"y26":290,"x27":361.99,"y27":252,"x28":376,"y28":215.01,"x29":391.99,"y29":189,"x30":388.01,"y30":135.01,"x31":340,"y31":120,"x32":313,"y32":161.01,"x33":307,"y33":188.01,"x34":311,"y34":207,"x35":277,"y35":186,"x36":293,"y36":137,"x37":308.01,"y37":117,"x38":361,"y38":93}}]},"time_labeled":1568101256852,"labeled":true,"size":{"width":706,"height":1000,"depth":3}}
\ No newline at end of file
diff --git a/docs/annotation/jingling_demo/outputs/annotations/aa63d7e6db0d03137883772c246c6761fc201059.png b/docs/annotation/jingling_demo/outputs/annotations/aa63d7e6db0d03137883772c246c6761fc201059.png
deleted file mode 100644
index 8dfbff7b73bcfff7ef79b904667241731641d4a4..0000000000000000000000000000000000000000
Binary files a/docs/annotation/jingling_demo/outputs/annotations/aa63d7e6db0d03137883772c246c6761fc201059.png and /dev/null differ
diff --git a/docs/annotation/jingling_demo/outputs/annotations/jingling.png b/docs/annotation/jingling_demo/outputs/annotations/jingling.png
new file mode 100644
index 0000000000000000000000000000000000000000..526acefdcdd8317c5778a5d47495d7049a46269d
Binary files /dev/null and b/docs/annotation/jingling_demo/outputs/annotations/jingling.png differ
diff --git a/docs/annotation/jingling_demo/outputs/jingling.json b/docs/annotation/jingling_demo/outputs/jingling.json
new file mode 100644
index 0000000000000000000000000000000000000000..0021522487a26f66dadc979a96ea631c0314adab
--- /dev/null
+++ b/docs/annotation/jingling_demo/outputs/jingling.json
@@ -0,0 +1 @@
+{"path":"/Users/dataset/jingling.jpg","outputs":{"object":[{"name":"person","polygon":{"x1":321.99,"y1":63,"x2":293,"y2":98.00999999999999,"x3":245.01,"y3":141.01,"x4":221,"y4":194,"x5":231.99,"y5":237,"x6":231.99,"y6":348.01,"x7":191,"y7":429,"x8":197,"y8":465.01,"x9":193,"y9":586,"x10":151,"y10":618.01,"x11":124,"y11":622,"x12":100,"y12":703,"x13":121.99,"y13":744,"x14":141.99,"y14":724,"x15":163,"y15":658.01,"x16":238.01,"y16":646,"x17":259,"y17":627,"x18":313,"y18":618.01,"x19":416,"y19":639,"x20":464,"y20":606,"x21":454,"y21":555.01,"x22":404,"y22":508.01,"x23":430,"y23":489,"x24":407,"y24":464,"x25":397,"y25":365.01,"x26":407,"y26":290,"x27":361.99,"y27":252,"x28":376,"y28":215.01,"x29":391.99,"y29":189,"x30":388.01,"y30":135.01,"x31":340,"y31":120,"x32":313,"y32":161.01,"x33":307,"y33":188.01,"x34":311,"y34":207,"x35":277,"y35":186,"x36":293,"y36":137,"x37":308.01,"y37":117,"x38":361,"y38":93}}]},"time_labeled":1568101256852,"labeled":true,"size":{"width":706,"height":1000,"depth":3}}
\ No newline at end of file
diff --git a/docs/annotation/labelme2seg.md b/docs/annotation/labelme2seg.md
index a270591d06131ec48f4ebb0d25ec206031956a24..235e3c41b6a79ece0b7512955aba04fe06faabe3 100644
--- a/docs/annotation/labelme2seg.md
+++ b/docs/annotation/labelme2seg.md
@@ -47,7 +47,7 @@ git clone https://github.com/wkentaro/labelme
(3) 图片中所有目标的标注都完成后,点击`Save`保存json文件,**请将json文件和图片放在同一个文件夹里**,点击`Next Image`标注下一张图片。
-LableMe产出的真值文件可参考我们给出的文件夹`docs/annotation/labelme_demo`。
+LabelMe产出的真值文件可参考我们给出的文件夹[docs/annotation/labelme_demo](labelme_demo)。

@@ -64,6 +64,7 @@ LableMe产出的真值文件可参考我们给出的文件夹`docs/annotation/la
## 3 数据格式转换
+最后用我们提供的数据转换脚本将上述标注工具产出的数据格式转换为模型训练时所需的数据格式。
* 经过数据格式转换后的数据集目录结构如下:
@@ -94,12 +95,18 @@ pip install pillow
* 运行以下代码,将标注后的数据转换成满足以上格式的数据集:
```
- python pdseg/tools/labelme2seg.py
+ python pdseg/tools/labelme2seg.py <文件夹路径>
```
-其中,``为图片以及LabelMe产出的json文件所在文件夹的目录,同时也是转换后的标注集所在文件夹的目录。
+其中,`<文件夹路径>`为图片以及LabelMe产出的json文件所在文件夹的目录,同时也是转换后的标注集所在文件夹的目录。
-转换得到的数据集可参考我们给出的文件夹`docs/annotation/labelme_demo`。其中,文件`class_names.txt`是数据集中所有标注类别的名称,包含背景类;文件夹`annotations`保存的是各图片的像素级别的真值信息,背景类`_background_`对应为0,其它目标类别从1开始递增,至多为255。
+我们已内置了一个标注的示例,可运行以下代码进行体验:
+
+```
+python pdseg/tools/labelme2seg.py docs/annotation/labelme_demo/
+```
+
+转换得到的数据集可参考我们给出的文件夹[docs/annotation/labelme_demo](labelme_demo)。其中,文件`class_names.txt`是数据集中所有标注类别的名称,包含背景类;文件夹`annotations`保存的是各图片的像素级别的真值信息,背景类`_background_`对应为0,其它目标类别从1开始递增,至多为255。

diff --git a/docs/annotation/labelme_demo/annotations/2011_000025.png b/docs/annotation/labelme_demo/annotations/2011_000025.png
index dcf7c96517d4870f6e83293cef62e3285e5b37e3..0b5a56dda153c92f4411ac7d71665aaf93111e10 100644
Binary files a/docs/annotation/labelme_demo/annotations/2011_000025.png and b/docs/annotation/labelme_demo/annotations/2011_000025.png differ
diff --git a/docs/benchmark.md b/docs/benchmark.md
deleted file mode 100644
index c1e6de2fcee971437c29e370e9410f9d00c9145f..0000000000000000000000000000000000000000
--- a/docs/benchmark.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# PaddleSeg 性能Benchmark
-
-## 训练性能
-
-### 多GPU加速比
-
-### 显存开销对比
-
-## 预测性能对比
-
-### Windows
-
-### Linux
-
-#### Naive
-
-#### Analysis
diff --git a/docs/check.md b/docs/check.md
index fac9520f11ef46d3628ecab3fcc4127a468a3ca5..20dc87f7e10d856f050a80554adb9c93d0ff05e3 100644
--- a/docs/check.md
+++ b/docs/check.md
@@ -55,7 +55,7 @@ Doing label pixel statistics:
- 当`AUG.AUG_METHOD`为stepscaling时,`EVAL_CROP_SIZE`的宽高应不小于原图中最大的宽高。
-- 当`AUG.AUG_METHOD`为rangscaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最大的宽高。
+- 当`AUG.AUG_METHOD`为rangescaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最大的宽高。
### 11 数据增强参数`AUG.INF_RESIZE_VALUE`校验
验证`AUG.INF_RESIZE_VALUE`是否在[`AUG.MIN_RESIZE_VALUE`~`AUG.MAX_RESIZE_VALUE`]范围内。若在范围内,则通过校验。
diff --git a/docs/config.md b/docs/config.md
index 387af4d4e18dc0e5b8cee7baa96ecf5b713f03ab..67e1353a7d88994b584d5bd3da4dd36d81430a59 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -1,18 +1,281 @@
-# PaddleSeg 分割库配置说明
+# 脚本使用和配置说明
-PaddleSeg提供了提供了统一的配置用于 训练/评估/可视化/导出模型
+PaddleSeg提供了 **训练**/**评估**/**可视化**/**模型导出** 等4个功能的使用脚本。所有脚本都支持通过不同的Flags来开启特定功能,也支持通过Options来修改默认的训练配置。它们的使用方式非常接近,如下:
+
+```shell
+# 训练
+python pdseg/train.py ${FLAGS} ${OPTIONS}
+# 评估
+python pdseg/eval.py ${FLAGS} ${OPTIONS}
+# 可视化
+python pdseg/vis.py ${FLAGS} ${OPTIONS}
+# 模型导出
+python pdseg/export_model.py ${FLAGS} ${OPTIONS}
+```
+
+**Note:** FLAGS必须位于OPTIONS之前,否则将会遇到报错,如下例所示:
+
+```shell
+# FLAGS "--cfg configs/unet_optic.yaml" 必须在 OPTIONS "BATCH_SIZE 1" 之前
+python pdseg/train.py BATCH_SIZE 1 --cfg configs/unet_optic.yaml
+```
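+
+对应的正确用法是将FLAGS写在OPTIONS之前,例如(命令仅为示例,配置文件路径与取值请按实际情况填写):
+
+```shell
+# 正确:FLAGS "--cfg configs/unet_optic.yaml" 位于 OPTIONS "BATCH_SIZE 1" 之前
+python pdseg/train.py --cfg configs/unet_optic.yaml BATCH_SIZE 1
+```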
+
+## 命令行FLAGS
+
+|FLAG|用途|支持脚本|默认值|备注|
+|-|-|-|-|-|
+|--cfg|配置文件路径|ALL|None||
+|--use_gpu|是否使用GPU进行训练|train/eval/vis|False||
+|--use_mpio|是否使用多进程进行IO处理|train/eval|False|打开该开关会占用一定量的CPU内存,但是可以提高训练速度。 **NOTE:** Windows平台下不支持该功能。建议初次使用自定义数据训练时不要打开,否则数据读取异常将不可见。 |
+|--use_tb|是否使用TensorBoard记录训练数据|train|False||
+|--log_steps|训练日志的打印周期(单位为step)|train|10||
+|--debug|是否打印debug信息|train|False|IOU等指标涉及到混淆矩阵的计算,会降低训练速度|
+|--tb_log_dir |TensorBoard的日志路径|train|None||
+|--do_eval|是否在保存模型时进行效果评估 |train|False||
+|--vis_dir|保存可视化图片的路径|vis|"visual"||
+
+## OPTIONS
+
+PaddleSeg提供了统一的配置用于 训练/评估/可视化/导出模型。一共存在三套配置方案:
+* 命令行窗口传递的参数。
+* configs目录下的yaml文件。
+* 默认参数,位于pdseg/utils/config.py。
+
+三者的优先级顺序为 命令行窗口 > yaml > 默认配置。
配置包含以下Group:
-* [通用](./configs/basic_group.md)
-* [DATASET](./configs/dataset_group.md)
-* [DATALOADER](./configs/dataloader_group.md)
-* [FREEZE](./configs/freeze_group.md)
-* [MODEL](./configs/model_group.md)
-* [SOLVER](./configs/solver_group.md)
-* [TRAIN](./configs/train_group.md)
-* [TEST](./configs/test_group.md)
-
-`Note`:
-
- 代码详见pdseg/utils/config.py
+|OPTIONS|用途|支持脚本|
+|-|-|-|
+|[BASIC](./configs/basic_group.md)|通用配置|ALL|
+|[DATASET](./configs/dataset_group.md)|数据集相关|train/eval/vis|
+|[MODEL](./configs/model_group.md)|模型相关|ALL|
+|[TRAIN](./configs/train_group.md)|训练相关|train|
+|[SOLVER](./configs/solver_group.md)|训练优化相关|train|
+|[TEST](./configs/test_group.md)|测试模型相关|eval/vis/export_model|
+|[AUG](./data_aug.md)|数据增强|ALL|
+|[FREEZE](./configs/freeze_group.md)|模型导出相关|export_model|
+|[DATALOADER](./configs/dataloader_group.md)|数据加载相关|ALL|
+
+在进行自定义的分割任务之前,您需要准备一份yaml文件,建议参照[configs目录下的示例yaml](../configs)进行修改。
+
+以下是PaddleSeg的默认配置,供查询使用。
+
+```yaml
+########################## 基本配置 ###########################################
+# 批处理大小
+BATCH_SIZE: 1
+# 验证时图像裁剪尺寸(宽,高)
+EVAL_CROP_SIZE: tuple()
+# 训练时图像裁剪尺寸(宽,高)
+TRAIN_CROP_SIZE: tuple()
+
+########################## 数据集配置 #########################################
+DATASET:
+    # 数据集主目录
+ DATA_DIR: './dataset/cityscapes/'
+ # 训练集列表
+ TRAIN_FILE_LIST: './dataset/cityscapes/train.list'
+ # 验证集列表
+ VAL_FILE_LIST: './dataset/cityscapes/val.list'
+ # 测试数据列表
+ TEST_FILE_LIST: './dataset/cityscapes/test.list'
+ # Tensorboard 可视化的数据集
+ VIS_FILE_LIST: None
+ # 类别数(需包括背景类)
+ NUM_CLASSES: 19
+ # 输入图像类型, 支持三通道'rgb',四通道'rgba',单通道灰度图'gray'
+ IMAGE_TYPE: 'rgb'
+ # 输入图片的通道数
+ DATA_DIM: 3
+ # 数据列表分割符, 默认为空格
+ SEPARATOR: ' '
+ # 忽略的像素标签值, 默认为255,一般无需改动
+ IGNORE_INDEX: 255
+
+########################## 模型通用配置 #######################################
+MODEL:
+ # 模型名称, 已支持deeplabv3p, unet, icnet,pspnet,hrnet
+ MODEL_NAME: ''
+ # BatchNorm类型: bn、gn(group_norm)
+ DEFAULT_NORM_TYPE: 'bn'
+ # 多路损失加权值
+ MULTI_LOSS_WEIGHT: [1.0]
+ # DEFAULT_NORM_TYPE为gn时group数
+ DEFAULT_GROUP_NUMBER: 32
+ # 极小值, 防止分母除0溢出,一般无需改动
+ DEFAULT_EPSILON: 1e-5
+ # BatchNorm动量, 一般无需改动
+ BN_MOMENTUM: 0.99
+ # 是否使用FP16训练
+ FP16: False
+
+ ########################## DeepLab模型配置 ####################################
+ DEEPLAB:
+ # DeepLab backbone 配置, 可选项xception_65, mobilenetv2
+ BACKBONE: "xception_65"
+ # DeepLab output stride
+ OUTPUT_STRIDE: 16
+ # MobileNet v2 backbone scale 设置
+ DEPTH_MULTIPLIER: 1.0
+    # 是否在编码器中使用ASPP模块
+ ENCODER_WITH_ASPP: True
+    # 是否使用Decoder模块进行解码
+ ENABLE_DECODER: True
+ # ASPP是否使用可分离卷积
+ ASPP_WITH_SEP_CONV: True
+ # 解码器是否使用可分离卷积
+ DECODER_USE_SEP_CONV: True
+
+ ########################## UNET模型配置 #######################################
+ UNET:
+ # 上采样方式, 默认为双线性插值
+ UPSAMPLE_MODE: 'bilinear'
+
+ ########################## ICNET模型配置 ######################################
+ ICNET:
+ # RESNET backbone scale 设置
+ DEPTH_MULTIPLIER: 0.5
+ # RESNET 层数 设置
+ LAYERS: 50
+
+ ########################## PSPNET模型配置 ######################################
+ PSPNET:
+ # RESNET backbone scale 设置
+ DEPTH_MULTIPLIER: 1
+ # RESNET backbone 层数 设置
+ LAYERS: 50
+
+ ########################## HRNET模型配置 ######################################
+ HRNET:
+ # HRNET STAGE2 设置
+ STAGE2:
+ NUM_MODULES: 1
+ NUM_CHANNELS: [40, 80]
+ # HRNET STAGE3 设置
+ STAGE3:
+ NUM_MODULES: 4
+ NUM_CHANNELS: [40, 80, 160]
+ # HRNET STAGE4 设置
+ STAGE4:
+ NUM_MODULES: 3
+ NUM_CHANNELS: [40, 80, 160, 320]
+
+########################### 训练配置 ##########################################
+TRAIN:
+ # 模型保存路径
+ MODEL_SAVE_DIR: ''
+ # 预训练模型路径
+ PRETRAINED_MODEL_DIR: ''
+    # 恢复训练的模型路径,非空时从该路径恢复并继续训练(resume)
+ RESUME_MODEL_DIR: ''
+ # 是否使用多卡间同步BatchNorm均值和方差
+ SYNC_BATCH_NORM: False
+ # 模型参数保存的epoch间隔数,可用来继续训练中断的模型
+ SNAPSHOT_EPOCH: 10
+
+########################### 模型优化相关配置 ##################################
+SOLVER:
+ # 初始学习率
+ LR: 0.1
+ # 学习率下降方法, 支持poly piecewise cosine 三种
+ LR_POLICY: "poly"
+ # 优化算法, 支持SGD和Adam两种算法
+ OPTIMIZER: "sgd"
+ # 动量参数
+ MOMENTUM: 0.9
+ # 二阶矩估计的指数衰减率
+ MOMENTUM2: 0.999
+ # 学习率Poly下降指数
+ POWER: 0.9
+ # step下降指数
+ GAMMA: 0.1
+ # step下降间隔
+ DECAY_EPOCH: [10, 20]
+    # 权重衰减(weight decay)系数,0-1
+ WEIGHT_DECAY: 0.00004
+ # 训练开始epoch数,默认为1
+ BEGIN_EPOCH: 1
+ # 训练epoch数,正整数
+ NUM_EPOCHS: 30
+ # loss的选择,支持softmax_loss, bce_loss, dice_loss
+ LOSS: ["softmax_loss"]
+ # 是否开启warmup学习策略
+ LR_WARMUP: False
+ # warmup的迭代次数
+ LR_WARMUP_STEPS: 2000
+
+########################## 测试配置 ###########################################
+TEST:
+ # 测试模型路径
+ TEST_MODEL: ''
+
+########################### 数据增强配置 ######################################
+AUG:
+ # 图像resize的方式有三种:
+ # unpadding(固定尺寸),stepscaling(按比例resize),rangescaling(长边对齐)
+ AUG_METHOD: 'unpadding'
+
+ # 图像resize的固定尺寸(宽,高),非负
+ FIX_RESIZE_SIZE: (500, 500)
+
+ # 图像resize方式为stepscaling,resize最小尺度,非负
+ MIN_SCALE_FACTOR: 0.5
+ # 图像resize方式为stepscaling,resize最大尺度,不小于MIN_SCALE_FACTOR
+ MAX_SCALE_FACTOR: 2.0
+ # 图像resize方式为stepscaling,resize尺度范围间隔,非负
+ SCALE_STEP_SIZE: 0.25
+
+ # 图像resize方式为rangescaling,训练时长边resize的范围最小值,非负
+ MIN_RESIZE_VALUE: 400
+ # 图像resize方式为rangescaling,训练时长边resize的范围最大值,
+ # 不小于MIN_RESIZE_VALUE
+ MAX_RESIZE_VALUE: 600
+ # 图像resize方式为rangescaling, 测试验证可视化模式下长边resize的长度,
+ # 在MIN_RESIZE_VALUE到MAX_RESIZE_VALUE范围内
+ INF_RESIZE_VALUE: 500
+
+ # 图像镜像左右翻转
+ MIRROR: True
+ # 图像上下翻转开关,True/False
+ FLIP: False
+ # 图像启动上下翻转的概率,0-1
+ FLIP_RATIO: 0.5
+
+ RICH_CROP:
+ # RichCrop数据增广开关,用于提升模型鲁棒性
+ ENABLE: False
+ # 图像旋转最大角度,0-90
+ MAX_ROTATION: 15
+ # 裁取图像与原始图像面积比,0-1
+ MIN_AREA_RATIO: 0.5
+ # 裁取图像宽高比范围,非负
+ ASPECT_RATIO: 0.33
+ # 亮度调节范围,0-1
+ BRIGHTNESS_JITTER_RATIO: 0.5
+ # 饱和度调节范围,0-1
+ SATURATION_JITTER_RATIO: 0.5
+ # 对比度调节范围,0-1
+ CONTRAST_JITTER_RATIO: 0.5
+ # 图像模糊开关,True/False
+ BLUR: False
+ # 图像启动模糊百分比,0-1
+ BLUR_RATIO: 0.1
+
+########################## 预测部署模型配置 ###################################
+FREEZE:
+ # 预测保存的模型名称
+ MODEL_FILENAME: '__model__'
+ # 预测保存的参数名称
+ PARAMS_FILENAME: '__params__'
+ # 预测模型参数保存的路径
+ SAVE_DIR: 'freeze_model'
+
+########################## 数据载入配置 #######################################
+DATALOADER:
+ # 数据载入时的并发数, 建议值8
+ NUM_WORKERS: 8
+ # 数据载入时缓存队列大小, 建议值256
+ BUF_SIZE: 256
+```
+
diff --git a/docs/configs/basic_group.md b/docs/configs/basic_group.md
index c66752f38e153084601c89e0aeb0c9385f02885b..dbe22b91da0632ad6b0b435582495b784aa2b276 100644
--- a/docs/configs/basic_group.md
+++ b/docs/configs/basic_group.md
@@ -2,70 +2,58 @@
BASIC Group存放所有通用配置
-## `MEAN`
+## `BATCH_SIZE`
-图像预处理减去的均值(格式为 *[R, G, B]* )
+训练、评估、可视化时所用的BATCH大小
### 默认值
-[0.5, 0.5, 0.5]
+1(需要根据实际需求填写)
-
-
+### 注意事项
-## `STD`
+* 当指定了多卡运行时,PaddleSeg会将数据平分到每张卡上运行,因此每张卡单次运行的数量为 BATCH_SIZE // dev_count
-图像预处理所除的标准差(格式为 *[R, G, B]* )
+* 多卡运行时,请确保BATCH_SIZE可被dev_count整除
-### 默认值
+* 增大BATCH_SIZE有利于模型训练时的收敛速度,但是会带来显存的开销。请根据实际情况评估后填写合适的值
-[0.5, 0.5, 0.5]
+* 目前PaddleSeg提供的很多预训练模型都有BN层,如果BATCH SIZE设置为1,则此时训练可能不稳定导致nan
-## `EVAL_CROP_SIZE`
+## `TRAIN_CROP_SIZE`
-评估时所对图片裁剪的大小(格式为 *[宽, 高]* )
+训练时对图片进行裁剪的大小(格式为 *[宽, 高]* )
### 默认值
无(需要用户自己填写)
### 注意事项
-* 裁剪的大小不能小于原图,请将该字段的值填写为评估数据中最长的宽和高
+`TRAIN_CROP_SIZE`可以设置任意大小,具体如何设置根据数据集而定。
-## `TRAIN_CROP_SIZE`
+## `EVAL_CROP_SIZE`
-训练时所对图片裁剪的大小(格式为 *[宽, 高]* )
+评估时对图片进行裁剪的大小(格式为 *[宽, 高]* )
### 默认值
无(需要用户自己填写)
-
-
-
-## `BATCH_SIZE`
-
-训练、评估、可视化时所用的BATCH大小
-
-### 默认值
-
-1(需要根据实际需求填写)
-
### 注意事项
+`EVAL_CROP_SIZE`的设置需要满足以下条件,共有3种情形:
+- 当`AUG.AUG_METHOD`为unpadding时,`EVAL_CROP_SIZE`的宽高应不小于`AUG.FIX_RESIZE_SIZE`的宽高。
+- 当`AUG.AUG_METHOD`为stepscaling时,`EVAL_CROP_SIZE`的宽高应不小于原图中最长的宽高。
+- 当`AUG.AUG_METHOD`为rangescaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最长的宽高。
-* 当指定了多卡运行时,PaddleSeg会将数据平分到每张卡上运行,因此每张卡单次运行的数量为 BATCH_SIZE // dev_count
+
+
-* 多卡运行时,请确保BATCH_SIZE可被dev_count整除
-* 增大BATCH_SIZE有利于模型训练时的收敛速度,但是会带来显存的开销。请根据实际情况评估后填写合适的值
-* 目前PaddleSeg提供的很多预训练模型都有BN层,如果BATCH SIZE设置为1,则此时训练可能不稳定导致nan
-
-
diff --git a/docs/configs/model_group.md b/docs/configs/model_group.md
index e11b769de7d8aabbd14583e6666045de6cfc5b42..ca8758cdf2e93337da9bcd4400d572e88f006445 100644
--- a/docs/configs/model_group.md
+++ b/docs/configs/model_group.md
@@ -5,11 +5,12 @@ MODEL Group存放所有和模型相关的配置,该Group还包含三个子Grou
* [DeepLabv3p](./model_deeplabv3p_group.md)
* [UNet](./model_unet_group.md)
* [ICNet](./model_icnet_group.md)
+* [PSPNet](./model_pspnet_group.md)
* [HRNet](./model_hrnet_group.md)
## `MODEL_NAME`
-所选模型,支持`deeplabv3p` `unet` `icnet` `hrnet`四种模型
+所选模型,支持`deeplabv3p` `unet` `icnet` `pspnet` `hrnet`五种模型
### 默认值
@@ -20,7 +21,13 @@ MODEL Group存放所有和模型相关的配置,该Group还包含三个子Grou
## `DEFAULT_NORM_TYPE`
-模型所用norm类型,支持`bn` [`gn`]()
+模型所用norm类型,支持`bn`(Batch Norm)、`gn`(Group Norm)
+
+
+
+关于Group Norm的介绍可以参考论文:https://arxiv.org/abs/1803.08494
+
+GN 把通道分为组,并计算每一组之内的均值和方差,以进行归一化。GN 的计算与批量大小无关,其精度在各种批量大小下保持稳定,适用于显存有限、只能使用较小batch size的大模型(比如DeepLabv3+),可以在小batch下取得较好的训练效果。
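+
+下面给出一个基于NumPy的Group Norm计算示意(仅为帮助理解的示意实现,省略了可学习的缩放和平移参数,并非PaddleSeg内部实现):
+
+```python
+import numpy as np
+
+def group_norm(x, groups=32, eps=1e-5):
+    # x形状为(N, C, H, W):把C个通道分为groups组,在每组内计算均值和方差进行归一化
+    n, c, h, w = x.shape
+    x = x.reshape(n, groups, c // groups, h, w)
+    mean = x.mean(axis=(2, 3, 4), keepdims=True)
+    var = x.var(axis=(2, 3, 4), keepdims=True)
+    x = (x - mean) / np.sqrt(var + eps)
+    return x.reshape(n, c, h, w)
+```
+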
### 默认值
@@ -111,4 +118,3 @@ loss = 1.0 * loss1 + 0.4 * loss2 + 0.16 * loss3
-
diff --git a/docs/configs/model_pspnet_group.md b/docs/configs/model_pspnet_group.md
new file mode 100644
index 0000000000000000000000000000000000000000..c1acd31b296b8b64ac05730e0e92b840264a4f23
--- /dev/null
+++ b/docs/configs/model_pspnet_group.md
@@ -0,0 +1,25 @@
+# cfg.MODEL.PSPNET
+
+MODEL.PSPNET 子Group存放所有和PSPNet模型相关的配置
+
+## `DEPTH_MULTIPLIER`
+
+ResNet backbone的depth multiplier(深度缩放系数)
+
+### 默认值
+
+1
+
+
+
+
+## `LAYERS`
+
+ResNet backbone的层数,支持`18` `34` `50` `101` `152`等五种
+
+### 默认值
+
+50
+
+
+
diff --git a/docs/configs/train_group.md b/docs/configs/train_group.md
index 6c8a0d79c79af665d8c7bf54a2b7555aa024bb8d..2fc8806c457d561978379589f6e05657e62a6e86 100644
--- a/docs/configs/train_group.md
+++ b/docs/configs/train_group.md
@@ -5,7 +5,7 @@ TRAIN Group存放所有和训练相关的配置
## `MODEL_SAVE_DIR`
在训练周期内定期保存模型的主目录
-## 默认值
+### 默认值
无(需要用户自己填写)
@@ -14,10 +14,10 @@ TRAIN Group存放所有和训练相关的配置
## `PRETRAINED_MODEL_DIR`
预训练模型路径
-## 默认值
+### 默认值
无
-## 注意事项
+### 注意事项
* 若未指定该字段,则模型会随机初始化所有的参数,从头开始训练
@@ -31,10 +31,10 @@ TRAIN Group存放所有和训练相关的配置
## `RESUME_MODEL_DIR`
从指定路径中恢复参数并继续训练
-## 默认值
+### 默认值
无
-## 注意事项
+### 注意事项
* 当`RESUME_MODEL_DIR`存在时,PaddleSeg会恢复到上一次训练的最近一个epoch,并且恢复训练过程中的临时变量(如已经衰减过的学习率,Optimizer的动量数据等),`PRETRAINED_MODEL`路径的最后一个目录必须为int数值或者字符串final,PaddleSeg会将int数值作为当前起始EPOCH继续训练,若目录为final,则不会继续训练。若目录不满足上述条件,PaddleSeg会抛出错误。
@@ -42,12 +42,17 @@ TRAIN Group存放所有和训练相关的配置
## `SYNC_BATCH_NORM`
-是否在多卡间同步BN的均值和方差
+是否在多卡间同步BN的均值和方差。
-## 默认值
+Synchronized Batch Norm跨GPU批归一化策略最早在[MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240)
+论文中提出,在[Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf)论文中以Yolov3验证了这一策略的有效性,[PaddleCV/yolov3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/yolov3)实现了这一系列策略并比Darknet框架版本在COCO17数据上mAP高5.9.
+
+PaddleSeg基于PaddlePaddle框架的sync_batch_norm策略,可以支持通过多卡实现大batch size的分割模型训练,可以得到更高的mIoU精度。
+
+### 默认值
False
-## 注意事项
+### 注意事项
* 打开该选项会带来一定的性能消耗(多卡间同步数据导致)
diff --git a/docs/data_aug.md b/docs/data_aug.md
index 2865d413b7090f414eb44c0681562837de21f19a..ed1b5c3c2dc66fa94f6dc1067bdec19161cda431 100644
--- a/docs/data_aug.md
+++ b/docs/data_aug.md
@@ -7,67 +7,108 @@
## Resize
-resize步骤是指将输入图像按照某种规则讲图片重新缩放到某一个尺寸,PaddleSeg支持以下3种resize方式:
+Resize步骤是指按照某种规则将输入图像重新缩放到某一个尺寸,PaddleSeg支持以下3种resize方式:

-- Un-padding
-将输入图像直接resize到某一个固定大小下,送入到网络中间训练,对应参数为AUG.FIX_RESIZE_SIZE。预测时同样操作。
+- Unpadding
+将输入图像直接resize到某一个固定大小,再送入网络中进行训练。预测时同样操作。
- Step-Scaling
-将输入图像按照某一个比例resize,这个比例以某一个步长在一定范围内随机变动。设定最小比例参数为`AUG.MIN_SCALE_FACTOR`, 最大比例参数`AUG.MAX_SCALE_FACTOR`,步长参数为`AUG.SCALE_STEP_SIZE`。预测时不对输入图像做处理。
+将输入图像按照某一个比例resize,这个比例以某一个步长在一定范围内随机变动。预测时不对输入图像做处理。
- Range-Scaling
-固定长宽比resize,即图像长边对齐到某一个固定大小,短边随同样的比例变化。设定最小大小参数为`AUG.MIN_RESIZE_VALUE`,设定最大大小参数为`AUG.MAX_RESIZE_VALUE`。预测时需要将长边对齐到`AUG.INF_RESIZE_VALUE`所指定的大小,其中`AUG.INF_RESIZE_VALUE`在`AUG.MIN_RESIZE_VALUE`和`AUG.MAX_RESIZE_VALUE`范围内。
+将输入图像按照长边变化进行resize,即图像长边对齐到某一长度,该长度在一定范围内随机变动,短边随同样的比例变化。
+预测时需要将长边对齐到另外指定的固定长度。
Range-Scaling示意图如下:

+|Resize方式|配置参数|含义|备注|
+|-|-|-|-|
+|Unpadding|AUG.FIX_RESIZE_SIZE|Resize的固定尺寸|
+|Step-Scaling|AUG.MIN_SCALE_FACTOR|Resize最小比例|
+||AUG.MAX_SCALE_FACTOR|Resize最大比例|
+||AUG.SCALE_STEP_SIZE|Resize比例选取的步长|
+|Range-Scaling|AUG.MIN_RESIZE_VALUE|图像长边变动范围的最小值|
+||AUG.MAX_RESIZE_VALUE|图像长边变动范围的最大值|
+| |AUG.INF_RESIZE_VALUE|预测时长边对齐时所指定的固定长度|取值必须在[AUG.MIN_RESIZE_VALUE, AUG.MAX_RESIZE_VALUE]范围内。|
+
+**注:本文所有配置参数可在configs目录下您的yaml文件中进行设置。**
+
## 图像翻转
PaddleSeg支持以下2种翻转方式:
- 左右翻转(Mirror)
-使用开关`AUG.MIRROR`,为True时该项功能开启,为False时该项功能关闭。
+以50%概率对图像进行左右翻转。
- 上下翻转(Flip)
-使用开关`AUG.FLIP`,为True时该项功能开启,`AUG.FLIP_RATIO`控制是否上下翻转的概率。为False时该项功能关闭。
+以一定概率对图像进行上下翻转。
以上2种开关独立运作,可组合使用。故图像翻转一共有如下4种可能的情况:

+|图像翻转方式|配置参数|含义|备注|
+|-|-|-|-|
+|Mirror|AUG.MIRROR|左右翻转开关|为True时开启,为False时关闭|
+|Flip|AUG.FLIP|上下翻转开关|为True时开启,为False时关闭|
+||AUG.FLIP_RATIO|控制是否上下翻转的概率|当AUG.FLIP为False时无效|
+
+
## Rich Crop
Rich Crop是PaddleSeg结合实际业务经验开放的一套数据增强策略,面向标注数据少,测试数据情况繁杂的分割业务场景使用的数据增强策略。流程如下图所示:

-rich crop是指对图像进行多种变换,保证在训练过程中数据的丰富多样性,PaddleSeg支持以下几种变换。`AUG.RICH_CROP.ENABLE`为False时会直接跳过该步骤。
+Rich Crop是指对图像进行多种变换,保证在训练过程中数据的丰富多样性,包含以下4种变换:
+
+- Blur
+使用高斯模糊对图像进行平滑。
+
+- Rotation
+图像旋转,旋转角度在一定范围内随机选取,旋转产生的多余的区域使用`DATASET.PADDING_VALUE`值进行填充。
-- blur
-图像加模糊,使用开关`AUG.RICH_CROP.BLUR`,为False时该项功能关闭。`AUG.RICH_CROP.BLUR_RATIO`控制加入模糊的概率。
+- Aspect
+图像长宽比调整,从图像中按一定大小和宽高比裁取一定区域出来之后进行resize。
-- rotation
-图像旋转,`AUG.RICH_CROP.MAX_ROTATION`控制最大旋转角度。旋转产生的多余的区域的填充值为均值。
+- Color jitter
+图像颜色抖动,共进行亮度、饱和度和对比度三种颜色属性的调节。
-- aspect
-图像长宽比调整,从图像中crop一定区域出来之后在某一长宽比内进行resize。控制参数`AUG.RICH_CROP.MIN_AREA_RATIO`和`AUG.RICH_CROP.ASPECT_RATIO`。
+|Rich crop方式|配置参数|含义|备注|
+|-|-|-|-|
+|Rich crop|AUG.RICH_CROP.ENABLE|Rich crop总开关|为True时开启,为False时关闭所有变换|
+|Blur|AUG.RICH_CROP.BLUR|图像模糊开关|为True时开启,为False时关闭|
+||AUG.RICH_CROP.BLUR_RATIO|控制进行模糊的概率|当AUG.RICH_CROP.BLUR为False时无效|
+|Rotation|AUG.RICH_CROP.MAX_ROTATION|图像正向旋转的最大角度|取值0~90°,实际旋转角度在\[-AUG.RICH_CROP.MAX_ROTATION, AUG.RICH_CROP.MAX_ROTATION]范围内随机选取|
+|Aspect|AUG.RICH_CROP.MIN_AREA_RATIO|裁取图像与原始图像面积比最小值|取值0~1,取值越小则变化范围越大,若为0则不进行调节|
+||AUG.RICH_CROP.ASPECT_RATIO|裁取图像宽高比范围|取值非负,越小则变化范围越大,若为0则不进行调节|
+|Color jitter|AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO|亮度调节因子|取值0~1,取值越大则变化范围越大,若为0则不进行调节|
+||AUG.RICH_CROP.SATURATION_JITTER_RATIO|饱和度调节因子|取值0~1,取值越大则变化范围越大,若为0则不进行调节|
+| |AUG.RICH_CROP.CONTRAST_JITTER_RATIO|对比度调节因子 |取值0~1,取值越大则变化范围越大,若为0则不进行调节|
-- color jitter
-图像颜色调整,控制参数`AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO`、`AUG.RICH_CROP.SATURATION_JITTER_RATIO`、`AUG.RICH_CROP.CONTRAST_JITTER_RATIO`。
## Random Crop
-该步骤主要是通过crop的方式使得输入到网络中的图像在某一个固定大小,控制该大小的参数为TRAIN_CROP_SIZE,类型为tuple,格式为(width, height). 当输入图像大小小于CROP_SIZE的时候会对输入图像进行padding,padding值为均值。
-
-- 输入图片格式
- - 原图
- - 图片格式:RGB三通道图片和RGBA四通道图片两种类型的图片进行训练,但是在一次训练过程只能存在一种格式。
- - 图片转换:灰度图片经过预处理后之后会转变成三通道图片
- - 图片参数设置:当图片为三通道图片时IMAGE_TYPE设置为rgb, 对应MEAN和STD也必须是一个长度为3的list,当图片为四通道图片时IMAGE_TYPE设置为rgba,对应的MEAN和STD必须是一个长度为4的list。
- - 标注图
- - 图片格式:标注图片必须为png格式的单通道多值图,元素值代表的是这个元素所属于的类别。
- - 图片转换:在datalayer层对label图片进行的任何resize,以及旋转的操作,都必须采用最近邻的插值方式。
- - 图片ignore:设置TRAIN.IGNORE_INDEX 参数可以选择性忽略掉属于某一个类别的所有像素点。这个参数一般设置为255
+随机裁剪图片和标签图,该步骤主要是通过裁剪的方式使得输入到网络中的图像为某一个固定大小。
+
+Random crop过程分为3种情形:
+- 当输入图像尺寸等于CROP_SIZE时,返回原图。
+- 当输入图像尺寸大于CROP_SIZE时,直接裁剪。
+- 当输入图像尺寸小于CROP_SIZE时,分别使用`DATASET.PADDING_VALUE`值和`DATASET.IGNORE_INDEX`值对图像和标签图进行填充,再进行裁剪。
+
+|Random crop方式|配置参数|含义|备注|
+|-|-|-|-|
+|Train crop|TRAIN_CROP_SIZE|训练过程进行random crop后的图像尺寸|类型为tuple,格式为(width, height)
+|Eval crop |EVAL_CROP_SIZE|除训练外的过程进行random crop后的图像尺寸|类型为tuple,格式为(width, height)
+
+`TRAIN_CROP_SIZE`可以设置任意大小,具体如何设置根据数据集而定。
+
+`EVAL_CROP_SIZE`的设置需要满足以下条件,共有3种情形:
+- 当`AUG.AUG_METHOD`为unpadding时,`EVAL_CROP_SIZE`的宽高应不小于`AUG.FIX_RESIZE_SIZE`的宽高。
+- 当`AUG.AUG_METHOD`为stepscaling时,`EVAL_CROP_SIZE`的宽高应不小于原图中最长的宽高。
+- 当`AUG.AUG_METHOD`为rangescaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最长的宽高。
+
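+上述Random crop的3种情形可以用如下NumPy代码示意(仅为帮助理解的示意,函数签名与默认取值为假设,实际逻辑请以pdseg/data_aug.py中的rand_crop为准):
+
+```python
+import numpy as np
+
+def rand_crop(img, label, crop_w, crop_h, padding_value=127, ignore_index=255):
+    # 原图小于crop尺寸时,分别用padding_value和ignore_index填充图像与标签图
+    h, w = img.shape[:2]
+    pad_h, pad_w = max(crop_h - h, 0), max(crop_w - w, 0)
+    if pad_h > 0 or pad_w > 0:
+        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=padding_value)
+        label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=ignore_index)
+        h, w = img.shape[:2]
+    # 尺寸等于crop尺寸时相当于返回原图;大于时随机选取起点后裁剪
+    y0 = np.random.randint(0, h - crop_h + 1)
+    x0 = np.random.randint(0, w - crop_w + 1)
+    return img[y0:y0 + crop_h, x0:x0 + crop_w], label[y0:y0 + crop_h, x0:x0 + crop_w]
+```
+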
diff --git a/docs/data_prepare.md b/docs/data_prepare.md
index 50864a730a534c4a0e5eba84fb11dfb1bb9c542d..de1fd7965cf74efe22b5c126b94ae063ac8a52ca 100644
--- a/docs/data_prepare.md
+++ b/docs/data_prepare.md
@@ -2,6 +2,45 @@
## 数据标注
+### 标注协议
+PaddleSeg采用单通道的标注图片,每一种像素值代表一种类别,像素标注类别需要从0开始递增,例如0,1,2,3表示有4种类别。
+
+**NOTE:** 标注图像请使用PNG无损压缩格式的图片。标注类别最多为256类。
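+
+可以用如下Python代码快速检查某张标注图是否符合上述协议(图片路径仅为示例):
+
+```python
+import numpy as np
+from PIL import Image
+
+label = np.asarray(Image.open('annotations/train/0001.png'))
+# 期望输出:二维的(高, 宽)形状,以及从0开始递增的类别取值
+print(label.shape, np.unique(label))
+```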
+
+### 灰度标注vs伪彩色标注
+一般的分割库使用单通道灰度图作为标注图片,往往显示出来是全黑的效果。灰度标注图的弊端:
+1. 对图像标注后,无法直接观察标注是否正确。
+2. 模型测试过程无法直接判断分割的实际效果。
+
+**PaddleSeg支持伪彩色图作为标注图片,在原来的单通道图片基础上,注入调色板。在基本不增加图片大小的基础上,却可以显示出彩色的效果。**
+
+同时PaddleSeg也兼容灰度图标注,用户原来的灰度数据集可以不做修改,直接使用。
+
+
+### 灰度标注转换为伪彩色标注
+如果用户需要转换成伪彩色标注图,可使用我们的转换工具。适用于以下两种常见的情况:
+1. 如果您希望将指定目录下的所有灰度标注图转换为伪彩色标注图,则执行以下命令,指定灰度标注所在的目录即可。
+```shell
+python pdseg/tools/gray2pseudo_color.py <dir_or_file> <output_dir>
+```
+
+|参数|用途|
+|-|-|
+|dir_or_file|指定灰度标注所在目录|
+|output_dir|彩色标注图片的输出目录|
+
+2. 如果您仅希望将指定数据集中的部分灰度标注图转换为伪彩色标注图,则执行以下命令,需要已有文件列表,按列表读取指定图片。
+```shell
+python pdseg/tools/gray2pseudo_color.py <dir_or_file> <output_dir> --dataset_dir <dataset_dir> --file_separator <file_separator>
+```
+|参数|用途|
+|-|-|
+|dir_or_file|指定文件列表路径|
+|output_dir|彩色标注图片的输出目录|
+|--dataset_dir|数据集所在根目录|
+|--file_separator|文件列表分隔符|
+
+### 标注教程
用户需预先采集好用于训练、评估和测试的图片,然后使用数据标注工具完成数据标注。
PaddleSeg已支持2种标注工具:LabelMe、精灵数据标注工具。标注教程如下:
@@ -9,63 +48,32 @@ PddleSeg已支持2种标注工具:LabelMe、精灵数据标注工具。标注
- [LabelMe标注教程](annotation/labelme2seg.md)
- [精灵数据标注工具教程](annotation/jingling2seg.md)
-最后用我们提供的数据转换脚本将上述标注工具产出的数据格式转换为模型训练时所需的数据格式。
## 文件列表
### 文件列表规范
-PaddleSeg采用通用的文件列表方式组织训练集、验证集和测试集。像素标注类别需要从0开始递增。
-
-**NOTE:** 标注图像请使用PNG无损压缩格式的图片
-
-以Cityscapes数据集为例, 我们需要整理出训练集、验证集、测试集对应的原图和标注文件列表用于PaddleSeg训练即可。
-
-其中`DATASET.DATA_DIR`为数据根目录,文件列表的路径以数据集根目录作为相对路径起始点。
-
-```
-./cityscapes/ # 数据集根目录
-├── gtFine # 标注目录
-│ ├── test
-│ │ ├── berlin
-│ │ └── ...
-│ ├── train
-│ │ ├── aachen
-│ │ └── ...
-│ └── val
-│ ├── frankfurt
-│ └── ...
-└── leftImg8bit # 原图目录
- ├── test
- │ ├── berlin
- │ └── ...
- ├── train
- │ ├── aachen
- │ └── ...
- └── val
- ├── frankfurt
- └── ...
-```
+PaddleSeg采用通用的文件列表方式组织训练集、验证集和测试集。在训练、评估、可视化过程前必须准备好相应的文件列表。
文件列表组织形式如下
```
原始图片路径 [SEP] 标注图片路径
```
+其中`[SEP]`是文件路径分割符,可以在`DATASET.SEPARATOR`配置项中修改, 默认为空格。文件列表的路径以数据集根目录作为相对路径起始点,`DATASET.DATA_DIR`即为数据集根目录。
+
+如下图所示,左边为原图的图片路径,右边为图片对应的标注路径。
-其中`[SEP]`是文件路径分割符,可以在`DATASET.SEPARATOR`配置项中修改, 默认为空格。
+
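+
+例如,一个使用空格作为分隔符的训练集文件列表可能如下(文件名仅为示例):
+
+```
+images/train/0001.jpg annotations/train/0001.png
+images/train/0002.jpg annotations/train/0002.png
+```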
**注意事项**
-* 务必保证分隔符在文件列表中每行只存在一次, 如文件名中存在空格,请使用'|'等文件名不可用字符进行切分
+* 务必保证分隔符在文件列表中每行只存在一次, 如文件名中存在空格,请使用"|"等文件名不可用字符进行切分
* 文件列表请使用**UTF-8**格式保存, PaddleSeg默认使用UTF-8编码读取file_list文件
-如下图所示,左边为原图的图片路径,右边为图片对应的标注路径。
-
-
-
若数据集缺少标注图片,则文件列表不用包含分隔符和标注图片路径,如下图所示。
+

**注意事项**
@@ -75,32 +83,14 @@ PaddleSeg采用通用的文件列表方式组织训练集、验证集和测试
不可在`DATASET.TRAIN_FILE_LIST`和`DATASET.VAL_FILE_LIST`配置项中使用。
-完整的配置信息可以参考[`./docs/annotation/cityscapes_demo`](../docs/annotation/cityscapes_demo/)目录下的yaml和文件列表。
+**符合规范的文件列表是什么样的呢?**
-### 文件列表生成
-PaddleSeg提供了生成文件列表的使用脚本,可适用于自定义数据集或cityscapes数据集,并支持通过不同的Flags来开启特定功能。
-```
-python pdseg/tools/create_dataset_list.py ${FLAGS}
-```
-运行后将在数据集根目录下生成训练/验证/测试集的文件列表(文件主名与`--second_folder`一致,扩展名为`.txt`)。
-
-**Note:** 若训练/验证/测试集缺少标注图片,仍可自动生成不含分隔符和标注图片路径的文件列表。
-
-#### 命令行FLAGS列表
+请参考目录[`./docs/annotation/cityscapes_demo`](../docs/annotation/cityscapes_demo/)。
-|FLAG|用途|默认值|参数数目|
-|-|-|-|-|
-|--type|指定数据集类型,`cityscapes`或`自定义`|`自定义`|1|
-|--separator|文件列表分隔符|'|'|1|
-|--folder|图片和标签集的文件夹名|'images' 'annotations'|2|
-|--second_folder|训练/验证/测试集的文件夹名|'train' 'val' 'test'|若干|
-|--format|图片和标签集的数据格式|'jpg' 'png'|2|
-|--postfix|按文件主名(无扩展名)是否包含指定后缀对图片和标签集进行筛选|'' ''(2个空字符)|2|
+### 数据集目录结构整理
-#### 使用示例
-- **对于自定义数据集**
+如果用户想要生成数据集的文件列表,需要整理成如下的目录结构(类似于Cityscapes数据集):
-如果用户想要生成自己数据集的文件列表,需要整理成如下的目录结构:
```
./dataset/ # 数据集根目录
├── annotations # 标注目录
@@ -125,9 +115,32 @@ python pdseg/tools/create_dataset_list.py ${FLAGS}
└── ...
Note:以上目录名可任意
```
-必须指定自定义数据集目录,可以按需要设定FLAG。
-**Note:** 无需指定`--type`。
+### 文件列表生成
+PaddleSeg提供了生成文件列表的使用脚本,可适用于自定义数据集或cityscapes数据集,并支持通过不同的Flags来开启特定功能。
+```
+python pdseg/tools/create_dataset_list.py ${FLAGS}
+```
+运行后将在数据集根目录下生成训练/验证/测试集的文件列表(文件主名与`--second_folder`一致,扩展名为`.txt`)。
+
+**Note:** 生成文件列表要求:要么原图和标注图片数量一致,要么只有原图,没有标注图片。若数据集缺少标注图片,仍可自动生成不含分隔符和标注图片路径的文件列表。
+
+#### 命令行FLAGS列表
+
+|FLAG|用途|默认值|参数数目|
+|-|-|-|-|
+|--type|指定数据集类型,`cityscapes`或`自定义`|`自定义`|1|
+|--separator|文件列表分隔符|"|"|1|
+|--folder|图片和标签集的文件夹名|"images" "annotations"|2|
+|--second_folder|训练/验证/测试集的文件夹名|"train" "val" "test"|若干|
+|--format|图片和标签集的数据格式|"jpg" "png"|2|
+|--postfix|按文件主名(无扩展名)是否包含指定后缀对图片和标签集进行筛选|"" ""(2个空字符)|2|
+
+#### 使用示例
+- **对于自定义数据集**
+
+若您已经按上述说明整理好了数据集目录结构,可以运行下面的命令生成文件列表。
+
```
# 生成文件列表,其分隔符为空格,图片和标签集的数据格式都为png
python pdseg/tools/create_dataset_list.py <your/dataset/dir> --separator " " --format png png
@@ -137,22 +150,26 @@ python pdseg/tools/create_dataset_list.py --separator " " --f
python pdseg/tools/create_dataset_list.py <your/dataset/dir> \
--folder img gt --second_folder training validation
```
-
+**Note:** 必须指定自定义数据集目录,可以按需要设定FLAG。无需指定`--type`。
- **对于cityscapes数据集**
+若您使用的是cityscapes数据集,可以运行下面的命令生成文件列表。
+
+```
+# 生成cityscapes文件列表,其分隔符为逗号
+python pdseg/tools/create_dataset_list.py <your/dataset/dir> --type cityscapes --separator ","
+```
+**Note:**
+
必须指定cityscapes数据集目录,`--type`必须为`cityscapes`。
在cityscapes类型下,部分FLAG将被重新设定,无需手动指定,具体如下:
|FLAG|固定值|
|-|-|
-|--folder|'leftImg8bit' 'gtFine'|
-|--format|'png' 'png'|
-|--postfix|'_leftImg8bit' '_gtFine_labelTrainIds'|
+|--folder|"leftImg8bit" "gtFine"|
+|--format|"png" "png"|
+|--postfix|"_leftImg8bit" "_gtFine_labelTrainIds"|
其余FLAG可以按需要设定。
-```
-# 生成cityscapes文件列表,其分隔符为逗号
-python pdseg/tools/create_dataset_list.py --type cityscapes --separator ","
-```
diff --git a/docs/imgs/annotation/image-11.png b/docs/imgs/annotation/image-11.png
new file mode 100644
index 0000000000000000000000000000000000000000..2e3b6ff1f1ffd33fb57a35b547bcce31ca248e19
Binary files /dev/null and b/docs/imgs/annotation/image-11.png differ
diff --git a/docs/imgs/annotation/image-7.png b/docs/imgs/annotation/image-7.png
index b65d56e92b2b5c1633f5c3168eee2971b476e8f3..7c24ca50361e0f602bc5a603e6377af021dbb63d 100644
Binary files a/docs/imgs/annotation/image-7.png and b/docs/imgs/annotation/image-7.png differ
diff --git a/docs/imgs/annotation/jingling-5.png b/docs/imgs/annotation/jingling-5.png
index 59a15567a3e25df338a3577fe9a9035c5bd0c719..5106559099570140fe91a94e2cdffffe2fdbdaca 100644
Binary files a/docs/imgs/annotation/jingling-5.png and b/docs/imgs/annotation/jingling-5.png differ
diff --git a/docs/imgs/deeplabv3p.png b/docs/imgs/deeplabv3p.png
index c0f12db6474e28f68ea45aa498026ef5261bcbe9..ba754f3e8b75c49630a96d4cd9fcb4aa45d6e5bd 100644
Binary files a/docs/imgs/deeplabv3p.png and b/docs/imgs/deeplabv3p.png differ
diff --git a/docs/imgs/dice.png b/docs/imgs/dice.png
new file mode 100644
index 0000000000000000000000000000000000000000..56f443dfade0a02240dad61d6554a23c91213bb5
Binary files /dev/null and b/docs/imgs/dice.png differ
diff --git a/docs/imgs/dice1.png b/docs/imgs/dice1.png
deleted file mode 100644
index f8520802296cc264849fae4a8442792cf56cb20a..0000000000000000000000000000000000000000
Binary files a/docs/imgs/dice1.png and /dev/null differ
diff --git a/docs/imgs/dice2.png b/docs/imgs/dice2.png
new file mode 100644
index 0000000000000000000000000000000000000000..37c3da1f1906421c0d3928ab18212a4d1a0966a0
Binary files /dev/null and b/docs/imgs/dice2.png differ
diff --git a/docs/imgs/dice3.png b/docs/imgs/dice3.png
new file mode 100644
index 0000000000000000000000000000000000000000..50b422385ee1e6b0cf7652ac63571652ce1d52ef
Binary files /dev/null and b/docs/imgs/dice3.png differ
diff --git a/docs/imgs/hrnet.png b/docs/imgs/hrnet.png
new file mode 100644
index 0000000000000000000000000000000000000000..a4733a7b7c62534f8cfc8f8cfeb4fe049d6dfba8
Binary files /dev/null and b/docs/imgs/hrnet.png differ
diff --git a/docs/imgs/icnet.png b/docs/imgs/icnet.png
index 7d9659db01bfb7a887f94b36fdaad303284deab7..125889691edcc5857d8e1322704dda652412d33f 100644
Binary files a/docs/imgs/icnet.png and b/docs/imgs/icnet.png differ
diff --git a/docs/imgs/pspnet.png b/docs/imgs/pspnet.png
new file mode 100644
index 0000000000000000000000000000000000000000..2963aeadb89aef05bfb19163f89d413d620c6564
Binary files /dev/null and b/docs/imgs/pspnet.png differ
diff --git a/docs/imgs/pspnet2.png b/docs/imgs/pspnet2.png
new file mode 100644
index 0000000000000000000000000000000000000000..401263a9b5fddc4c6ca77ef2dc172c7cb565c00f
Binary files /dev/null and b/docs/imgs/pspnet2.png differ
diff --git a/docs/imgs/softmax_loss.png b/docs/imgs/softmax_loss.png
new file mode 100644
index 0000000000000000000000000000000000000000..3c5cbbce470fe48ca5f500c59995776c2fbd5ec5
Binary files /dev/null and b/docs/imgs/softmax_loss.png differ
diff --git a/docs/imgs/tensorboard_image.JPG b/docs/imgs/tensorboard_image.JPG
index 2d5d0ceb001cb7fc9f68622842710afd9d032463..140aa2a0ed6a9b1a2d0a98477685b9e6d434a113 100644
Binary files a/docs/imgs/tensorboard_image.JPG and b/docs/imgs/tensorboard_image.JPG differ
diff --git a/docs/imgs/tensorboard_scalar.JPG b/docs/imgs/tensorboard_scalar.JPG
index 2de89c32a3469764631352597f0e55f8a431ad4b..322c98dc8ba7e5ca96477f3dbe193a70a8cf4609 100644
Binary files a/docs/imgs/tensorboard_scalar.JPG and b/docs/imgs/tensorboard_scalar.JPG differ
diff --git a/docs/imgs/unet.png b/docs/imgs/unet.png
index 960f289321a9a6b894d3054ec4f257a36cb8969e..5a7b691ae54f9fe29dded913d8e6f6cacac494f7 100644
Binary files a/docs/imgs/unet.png and b/docs/imgs/unet.png differ
diff --git a/docs/imgs/usage_vis_demo.jpg b/docs/imgs/usage_vis_demo.jpg
index 50bedf2f547d11cb4aaefa0435022acc0392ba3c..40b35f13418e7c68e0bfaabf992d8411bd87bc77 100644
Binary files a/docs/imgs/usage_vis_demo.jpg and b/docs/imgs/usage_vis_demo.jpg differ
diff --git a/docs/imgs/usage_vis_demo2.jpg b/docs/imgs/usage_vis_demo2.jpg
deleted file mode 100644
index 9665e9e2f4d90d6db75411d43d0dc5a34d8b28e7..0000000000000000000000000000000000000000
Binary files a/docs/imgs/usage_vis_demo2.jpg and /dev/null differ
diff --git a/docs/imgs/usage_vis_demo3.jpg b/docs/imgs/usage_vis_demo3.jpg
deleted file mode 100644
index 318c06bcf7debf76b7bff504648df056802130df..0000000000000000000000000000000000000000
Binary files a/docs/imgs/usage_vis_demo3.jpg and /dev/null differ
diff --git a/docs/installation.md b/docs/installation.md
deleted file mode 100644
index 80cc341bb8764065dc7fd871e81fdb31225d636a..0000000000000000000000000000000000000000
--- a/docs/installation.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# PaddleSeg 安装说明
-
-## 1. 安装PaddlePaddle
-
-版本要求
-* PaddlePaddle >= 1.6.1
-* Python 2.7 or 3.5+
-
-更多详细安装信息如CUDA版本、cuDNN版本等兼容信息请查看[PaddlePaddle安装](https://www.paddlepaddle.org.cn/install/doc/index)
-
-### pip安装
-
-由于图像分割模型计算开销大,推荐在GPU版本的PaddlePaddle下使用PaddleSeg.
-
-```
-pip install paddlepaddle-gpu
-```
-
-### Conda安装
-
-PaddlePaddle最新版本1.5支持Conda安装,可以减少相关依赖安装成本,conda相关使用说明可以参考[Anaconda](https://www.anaconda.com/distribution/)
-
-```
-conda install -c paddle paddlepaddle-gpu cudatoolkit=9.0
-```
-
- * 如果有多卡训练需求,请安装 NVIDIA NCCL >= 2.4.7,并在Linux环境下运行
-
-更多安装方式详情可以查看 [PaddlePaddle安装说明](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/install/index_cn.html)
-
-
-## 2. 下载PaddleSeg代码
-
-```
-git clone https://github.com/PaddlePaddle/PaddleSeg
-```
-
-
-## 3. 安装PaddleSeg依赖
-
-```
-cd PaddleSeg
-pip install -r requirements.txt
-```
diff --git a/docs/loss_select.md b/docs/loss_select.md
index 454085c9c22a5c3308c77c93c961628b53157042..6749979821de5cd7387f3161e0a2bd25a9f02e4e 100644
--- a/docs/loss_select.md
+++ b/docs/loss_select.md
@@ -1,41 +1,66 @@
-# dice loss解决二分类中样本不均衡问题
+# 如何解决二分类中类别不均衡问题
+在二分类图像分割任务中,经常出现类别分布不均匀的情况,例如:工业产品的瑕疵检测、道路提取及病变区域提取等。
+
+目前PaddleSeg提供了三种loss函数,分别为softmax loss(softmax with cross entropy loss)、dice loss(dice coefficient loss)和bce loss(binary cross entropy loss)。我们可使用dice loss解决这个问题。
+
+注:dice loss和bce loss仅支持二分类。
+
+## Dice loss
+Dice loss的定义如下:
-对于二类图像分割任务中,往往存在类别分布不均的情况,如:瑕疵检测,道路提取及病变区域提取等等。
-在DeepGlobe比赛的Road Extraction中,训练数据道路占比为:%4.5。如下为其图片样例:
-
+dice loss = 1 - 2 * |Y ∩ P| / (|Y| + |P|)
-可以看出道路在整张图片中的比例很小。
-
-## 数据集下载
-我们从DeepGlobe比赛的Road Extraction的训练集中随机抽取了800张图片作为训练集,200张图片作为验证集,
-制作了一个小型的道路提取数据集[MiniDeepGlobeRoadExtraction](https://paddleseg.bj.bcebos.com/dataset/MiniDeepGlobeRoadExtraction.zip)
-## softmax loss与dice loss
-在图像分割中,softmax loss(sotfmax with cross entroy loss)同等的对待每一像素,因此当背景占据绝大部分的情况下,
-网络将偏向于背景的学习,使网络对目标的提取能力变差。`dice loss(dice coefficient loss)`通过计算预测与标注之间的重叠部分计算损失函数,避免了类别不均衡带来的影响,能够取得更好的效果。
-在实际应用中`dice loss`往往与`bce loss(binary cross entroy loss)`结合使用,提高模型训练的稳定性。
+其中 Y 表示ground truth,P 表示预测结果,| |表示矩阵元素之和,|Y ∩ P| 表示*Y*和*P*的共有元素数,
+实际通过求两者的逐像素乘积之和进行计算。例如:
+
+
+
+
+
+其中 1 表示前景,0 表示背景。
+
+**Note:** 在标注图片中,务必保证前景像素值为1,背景像素值为0.
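+
+按照上述定义,可以用如下NumPy代码示意dice loss的计算(仅为帮助理解的示意实现,实际训练请直接使用PaddleSeg内置的dice_loss):
+
+```python
+import numpy as np
+
+def dice_loss(pred, label, eps=1e-5):
+    # pred与label均为取值0/1的前景掩码,|Y∩P|通过逐像素乘积之和计算
+    inter = np.sum(pred * label)
+    union = np.sum(pred) + np.sum(label)
+    return 1.0 - (2.0 * inter + eps) / (union + eps)
+```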
-dice loss的定义如下:
+Dice系数请参见[维基百科](https://zh.wikipedia.org/wiki/Dice%E7%B3%BB%E6%95%B0)
-
+**为什么在类别不均衡问题上,dice loss效果比softmax loss更好?**
-其中  表示*Y*和*P*的共有元素数,
-实际计算通过求两者的乘积之和进行计算。如下所示:
+首先来看softmax loss的定义:
-
+softmax loss = - (1/N) * Σ_i Σ_c y_ic · log(p_ic)(i 遍历所有像素,c 遍历所有类别,N 为像素总数)
+
+其中 y 表示ground truth,p 表示网络输出。
+
+在图像分割中,`softmax loss`评估每一个像素点的类别预测,然后平均所有的像素点。这本质上就是对图片上的每个像素进行平等的学习。这就造成了一个问题:如果图像上的多种类别分布不均衡,训练将由占比最大的类别主导。以下文DeepGlobe道路提取的数据为例,网络将偏向于背景的学习,降低网络对前景目标的提取能力。
+而`dice loss(dice coefficient loss)`通过预测和标注的交集除以它们的总体像素进行计算,它将一个类别的所有像素作为一个整体作为考量,而且计算交集在总体中的占比,所以不受大量背景像素的影响,能够取得更好的效果。
+
+在实际应用中`dice loss`往往与`bce loss(binary cross entropy loss)`结合使用,提高模型训练的稳定性。
-[dice系数详解](https://zh.wikipedia.org/wiki/Dice%E7%B3%BB%E6%95%B0)
## PaddleSeg指定训练loss
PaddleSeg通过`cfg.SOLVER.LOSS`参数可以选择训练时的损失函数,
如`cfg.SOLVER.LOSS=['dice_loss','bce_loss']`将指定训练loss为`dice loss`与`bce loss`的组合
-## 实验比较
+## Dice loss解决类别不均衡问题的示例
+
+我们以道路提取任务为例应用dice loss.
+在DeepGlobe比赛的Road Extraction中,训练数据道路占比为:4.5%. 如下为其图片样例:
+
+
+
+可以看出道路在整张图片中的比例很小。
+
+### 数据集下载
+我们从DeepGlobe比赛的Road Extraction的训练集中随机抽取了800张图片作为训练集,200张图片作为验证集,
+制作了一个小型的道路提取数据集[MiniDeepGlobeRoadExtraction](https://paddleseg.bj.bcebos.com/dataset/MiniDeepGlobeRoadExtraction.zip)
+
+### 实验比较
在MiniDeepGlobeRoadExtraction数据集进行了实验比较。
@@ -73,5 +98,4 @@ softmax loss和dice loss + bce loss实验结果如下图所示。
-
diff --git a/docs/model_zoo.md b/docs/model_zoo.md
index 7e625db73a5ae185b8db00e8dd6f04e26d4e11e5..8cd89fa41d6b7fc88759cf1250d88ec067755a6c 100644
--- a/docs/model_zoo.md
+++ b/docs/model_zoo.md
@@ -1,6 +1,7 @@
# PaddleSeg 预训练模型
-PaddleSeg对所有内置的分割模型都提供了公开数据集下的预训练模型,通过加载预训练模型后训练可以在自定义数据集中得到更稳定地效果。
+PaddleSeg对所有内置的分割模型都提供了公开数据集下的预训练模型。对于自定义数据集的场景,使用预训练模型进行训练可以得到更稳定的效果。用户可以根据模型类型、自己的数据集和预训练数据集的相似程度,选择并下载预训练模型。
## ImageNet预训练模型
@@ -32,6 +33,11 @@ PaddleSeg对所有内置的分割模型都提供了公开数据集下的预训
| HRNet_W48 | ImageNet | [hrnet_w48_imagenet.tar](https://paddleseg.bj.bcebos.com/models/hrnet_w48_imagenet.tar) | 78.95%/94.42% |
| HRNet_W64 | ImageNet | [hrnet_w64_imagenet.tar](https://paddleseg.bj.bcebos.com/models/hrnet_w64_imagenet.tar) | 79.30%/94.61% |
+| 模型 | 数据集合 | 下载地址 | Accuracy Top1/5 Error |
+|---|---|---|---|
+| ResNet50(适配PSPNet) | ImageNet | [resnet50_v2_pspnet](https://paddleseg.bj.bcebos.com/resnet50_v2_pspnet.tgz)| -- |
+| ResNet101(适配PSPNet) | ImageNet | [resnet101_v2_pspnet](https://paddleseg.bj.bcebos.com/resnet101_v2_pspnet.tgz)| -- |
+
## COCO预训练模型
数据集为COCO实例分割数据集合转换成的语义分割数据集合
@@ -57,3 +63,6 @@ train数据集合为Cityscapes训练集合,测试为Cityscapes的验证集合
| PSPNet/bn | Cityscapes |[pspnet50_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet50_cityscapes.tgz) |16|false| 0.7013 |
| PSPNet/bn | Cityscapes |[pspnet101_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz) |16|false| 0.7734 |
| HRNet_W18/bn | Cityscapes |[hrnet_w18_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz) | 4 | false | 0.7936 |
+| Fast-SCNN/bn | Cityscapes |[fast_scnn_cityscapes.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | 32 | false | 0.6964 |
+
+测试环境为Python 3.7.3、V100、cuDNN 7.6.2。
diff --git a/docs/models.md b/docs/models.md
index 680dfe87356db9dd6be181e003598d3eb8967ffe..a452aa3639c3901d8f75d1aa4f5f1b7f393ce0b7 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,56 +1,74 @@
# PaddleSeg 分割模型介绍
-### U-Net
-U-Net 起源于医疗图像分割,整个网络是标准的encoder-decoder网络,特点是参数少,计算快,应用性强,对于一般场景适应度很高。
+- [U-Net](#U-Net)
+- [DeepLabv3+](#DeepLabv3)
+- [PSPNet](#PSPNet)
+- [ICNet](#ICNet)
+- [HRNet](#HRNet)
+
+## U-Net
+U-Net [1] 起源于医疗图像分割,整个网络是标准的encoder-decoder网络,特点是参数少,计算快,应用性强,对于一般场景适应度很高。U-Net最早于2015年提出,并在ISBI 2015 Cell Tracking Challenge取得了第一。经过发展,目前有多个变形和应用。
+
+原始U-Net的结构如下图所示,由于网络整体结构类似于大写的英文字母U,故得名U-net。左侧可视为一个编码器,右侧可视为一个解码器。编码器有四个子模块,每个子模块包含两个卷积层,每个子模块之后通过max pool进行下采样。由于卷积使用的是valid模式,故实际输出比输入图像小一些。具体来说,后一个子模块的分辨率=(前一个子模块的分辨率-4)/2。U-Net使用了Overlap-tile 策略用于补全输入图像的上下信息,使得任意大小的输入图像都可获得无缝分割。同样解码器也包含四个子模块,分辨率通过上采样操作依次上升,直到与输入图像的分辨率基本一致。该网络还使用了跳跃连接,以拼接的方式将解码器和编码器中相同分辨率的feature map进行特征融合,帮助解码器更好地恢复目标的细节。
+

-### DeepLabv3+
+## DeepLabv3+
-DeepLabv3+ 是DeepLab系列的最后一篇文章,其前作有 DeepLabv1,DeepLabv2, DeepLabv3,
-在最新作中,DeepLab的作者通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASSP层,
-其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率,在 PASCAL VOC 2012 dataset取得新的state-of-art performance,89.0mIOU。
+DeepLabv3+ [2] 是DeepLab系列的最后一篇文章,其前作有 DeepLabv1, DeepLabv2, DeepLabv3.
+在最新作中,作者通过encoder-decoder进行多尺度信息的融合,以优化分割效果,尤其是目标边缘的效果。
+并且其使用了Xception模型作为骨干网络,将深度可分离卷积(depthwise separable convolution)应用到ASPP(atrous spatial pyramid pooling)和decoder模块中,提高了语义分割的健壮性和运行速率,在 PASCAL VOC 2012 和 Cityscapes 数据集上取得新的state-of-the-art performance。

-在PaddleSeg当前实现中,支持两种分类Backbone网络的切换
+在PaddleSeg当前实现中,支持两种分类Backbone网络的切换:
-- MobileNetv2:
+- MobileNetv2
适用于移动设备的快速网络,如果对分割性能有较高的要求,请使用这一backbone网络。
-- Xception:
+- Xception
DeepLabv3+原始实现的backbone网络,兼顾了精度和性能,适用于服务端部署。
+## PSPNet
+
+Pyramid Scene Parsing Network (PSPNet) [3] 起源于场景解析(Scene Parsing)领域。如下图所示,普通FCN [4] 在复杂场景下会出现三种误分割现象:(1)关系不匹配。将船误分类成车,显然车一般不会出现在水面上。(2)类别混淆。摩天大厦和建筑物这两个类别相近,误将摩天大厦分类成建筑物。(3)类别不显著。枕头区域较小且纹理与床相近,误将枕头分类成床。
+
+
-### ICNet
+PSPNet的出发点是在算法中引入更多的上下文信息来解决上述问题。为了融合图像中不同区域的上下文信息,PSPNet通过特殊设计的全局均值池化操作(global average pooling)和特征融合构造金字塔池化模块(Pyramid Pooling Module)。PSPNet最终获得了2016年ImageNet场景解析挑战赛的冠军,并在PASCAL VOC 2012 和 Cityscapes 数据集上取得当时的最佳效果。整个网络结构如下:
-Image Cascade Network(ICNet)主要用于图像实时语义分割。相较于其它压缩计算的方法,ICNet即考虑了速度,也考虑了准确性。 ICNet的主要思想是将输入图像变换为不同的分辨率,然后用不同计算复杂度的子网络计算不同分辨率的输入,然后将结果合并。ICNet由三个子网络组成,计算复杂度高的网络处理低分辨率输入,计算复杂度低的网络处理分辨率高的网络,通过这种方式在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡。
+
+
+
+## ICNet
+
+Image Cascade Network(ICNet) [5] 是一个基于PSPNet的语义分割网络,设计目的是减少PSPNet推断时期的耗时。ICNet主要用于图像实时语义分割。ICNet由三个不同分辨率的子网络组成,将输入图像变换为不同的分辨率,随后使用计算复杂度高的网络处理低分辨率输入,计算复杂度低的网络处理高分辨率输入,通过这种方式在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡,并在PSPNet的基础上引入级联特征融合单元(cascade feature fusion unit),实现快速且高质量的分割模型。
整个网络结构如下:

-## 参考
+## HRNet
-- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
+High-Resolution Network (HRNet) [6] 在整个训练过程中始终维持高分辨率表示。
+HRNet具有两个特点:(1)从高分辨率到低分辨率并行连接各子网络,(2)反复交换跨分辨率子网络信息。这两个特点使HRNet网络能够学习到更丰富的语义信息和细节信息。
+HRNet在人体姿态估计、语义分割和目标检测领域都取得了显著的性能提升。
-- [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
-
-- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
+整个网络结构如下:
-# PaddleSeg特殊网络结构介绍
+
-### Group Norm
+## 参考文献
-
-关于Group Norm的介绍可以参考论文:https://arxiv.org/abs/1803.08494
+[1] [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
-GN 把通道分为组,并计算每一组之内的均值和方差,以进行归一化。GN 的计算与批量大小无关,其精度也在各种批量大小下保持稳定。适应于网络参数很重的模型,比如deeplabv3+这种,可以在一个小batch下取得一个较好的训练效果。
+[2] [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
+[3] [Pyramid Scene Parsing Network](https://arxiv.org/abs/1612.01105)
-### Synchronized Batch Norm
+[4] [Fully Convolutional Networks for Semantic Segmentation](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)
-Synchronized Batch Norm跨GPU批归一化策略最早在[MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240)
-论文中提出,在[Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf)论文中以Yolov3验证了这一策略的有效性,[PaddleCV/yolov3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/yolov3)实现了这一系列策略并比Darknet框架版本在COCO17数据上mAP高5.9.
+[5] [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
-PaddleSeg基于PaddlePaddle框架的sync_batch_norm策略,可以支持通过多卡实现大batch size的分割模型训练,可以得到更高的mIoU精度。
+[6] [Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919)
diff --git a/docs/multiple_gpus_train_and_mixed_precision_train.md b/docs/multiple_gpus_train_and_mixed_precision_train.md
index 7826d88171bec71cba7ae2db9327ce3dfd47efd9..206a9409d0326ee6d4cd7c07569e7698f7d9c469 100644
--- a/docs/multiple_gpus_train_and_mixed_precision_train.md
+++ b/docs/multiple_gpus_train_and_mixed_precision_train.md
@@ -4,7 +4,7 @@
* PaddlePaddle >= 1.6.1
* NVIDIA NCCL >= 2.4.7
-环境配置,数据,预训练模型准备等工作请参考[安装说明](./installation.md),[PaddleSeg使用说明](./usage.md)
+环境配置,数据,预训练模型准备等工作请参考[PaddleSeg使用说明](./usage.md)
### 多进程训练示例
diff --git a/docs/usage.md b/docs/usage.md
index e38d16e047b4b97a71278b1ba17682d20c4586ee..6da85a2de7b8be220e955a9e20a351c2d306b489 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -1,98 +1,74 @@
-# 训练/评估/可视化
+# PaddleSeg快速入门
-PaddleSeg提供了 **训练**/**评估**/**可视化** 等三个功能的使用脚本。三个脚本都支持通过不同的Flags来开启特定功能,也支持通过Options来修改默认的[训练配置](./config.md)。三者的使用方式非常接近,如下:
+本教程通过一个简单的示例,说明如何基于PaddleSeg启动训练(训练可视化)、评估和可视化。我们选择基于COCO数据集预训练的unet模型作为预训练模型,以一个眼底医疗分割数据集为例。
-```shell
-# 训练
-python pdseg/train.py ${FLAGS} ${OPTIONS}
-# 评估
-python pdseg/eval.py ${FLAGS} ${OPTIONS}
-# 可视化
-python pdseg/vis.py ${FLAGS} ${OPTIONS}
-```
-
-**Note:**
-
-* FLAGS必须位于OPTIONS之前,否会将会遇到报错,例如如下的例子:
-
-```shell
-# FLAGS "--cfg configs/cityscapes.yaml" 必须在 OPTIONS "BATCH_SIZE 1" 之前
-python pdseg/train.py BATCH_SIZE 1 --cfg configs/cityscapes.yaml
-```
-
-## 命令行FLAGS列表
-
-|FLAG|支持脚本|用途|默认值|备注|
-|-|-|-|-|-|
-|--cfg|ALL|配置文件路径|None||
-|--use_gpu|ALL|是否使用GPU进行训练|False||
-|--use_mpio|train/eval|是否使用多进程进行IO处理|False|打开该开关会占用一定量的CPU内存,但是可以提高训练速度。 **NOTE:** windows平台下不支持该功能, 建议使用自定义数据初次训练时不打开,打开会导致数据读取异常不可见。 |
-|--use_tb|train|是否使用TensorBoard记录训练数据|False||
-|--log_steps|train|训练日志的打印周期(单位为step)|10||
-|--debug|train|是否打印debug信息|False|IOU等指标涉及到混淆矩阵的计算,会降低训练速度|
-|--tb_log_dir|train|TensorBoard的日志路径|None||
-|--do_eval|train|是否在保存模型时进行效果评估|False||
-|--vis_dir|vis|保存可视化图片的路径|"visual"||
-|--also_save_raw_results|vis|是否保存原始的预测图片|False||
-
-## OPTIONS
-
-详见[训练配置](./config.md)
+- [1.准备工作](#1准备工作)
+- [2.下载待训练数据](#2下载待训练数据)
+- [3.下载预训练模型](#3下载预训练模型)
+- [4.模型训练](#4模型训练)
+- [5.训练过程可视化](#5训练过程可视化)
+- [6.模型评估](#6模型评估)
+- [7.模型可视化](#7模型可视化)
+- [在线体验](#在线体验)
-## 使用示例
-下面通过一个简单的示例,说明如何基于PaddleSeg提供的预训练模型启动训练。我们选择基于COCO数据集预训练的unet模型作为预训练模型,在一个Oxford-IIIT Pet数据集上进行训练。
-**Note:** 为了快速体验,我们使用Oxford-IIIT Pet做了一个小型数据集,后续数据都使用该小型数据集。
-### 准备工作
+## 1.准备工作
在开始教程前,请先确认准备工作已经完成:
1. 正确安装了PaddlePaddle
2. PaddleSeg相关依赖已经安装
-如果有不确认的地方,请参考[安装说明](./installation.md)
+如果有不确认的地方,请参考[首页安装说明](../README.md#安装)
+
+## 2.下载待训练数据
+
+
+
+我们提前准备好了一份眼底医疗分割数据集--视盘分割(optic disc segmentation),包含267张训练图片、76张验证图片、38张测试图片。通过以下命令进行下载:
-### 下载预训练模型
```shell
-# 下载预训练模型并进行解压
-python pretrained_model/download_model.py unet_bn_coco
+# 下载待训练数据集
+python dataset/download_optic.py
```
-### 下载Oxford-IIIT Pet数据集
-我们使用了Oxford-IIIT中的猫和狗两个类别数据制作了一个小数据集mini_pet,用于快速体验。
-更多关于数据集的介绍情参考[Oxford-IIIT Pet](https://www.robots.ox.ac.uk/~vgg/data/pets/)
+## 3.下载预训练模型
```shell
# 下载预训练模型并进行解压
-python dataset/download_pet.py
+python pretrained_model/download_model.py unet_bn_coco
```
-### 模型训练
+## 4.模型训练
-为了方便体验,我们在configs目录下放置了mini_pet所对应的配置文件`unet_pet.yaml`,可以通过`--cfg`指向该文件来设置训练配置。
+为了方便体验,我们在configs目录下放置了配置文件`unet_optic.yaml`,可以通过`--cfg`指向该文件来设置训练配置。
-我们选择GPU 0号卡进行训练,这可以通过环境变量`CUDA_VISIBLE_DEVICES`来指定。
+可以通过环境变量`CUDA_VISIBLE_DEVICES`来指定GPU卡号。
```
+# 指定GPU卡号(以0号卡为例)
export CUDA_VISIBLE_DEVICES=0
-python pdseg/train.py --use_gpu \
+# 训练
+python pdseg/train.py --cfg configs/unet_optic.yaml \
+ --use_gpu \
--do_eval \
--use_tb \
--tb_log_dir train_log \
- --cfg configs/unet_pet.yaml \
BATCH_SIZE 4 \
- TRAIN.PRETRAINED_MODEL_DIR pretrained_model/unet_bn_coco \
- SOLVER.LR 5e-5
+ SOLVER.LR 0.001
+
+```
+若需要使用多块GPU,以0、1、2号卡为例,可输入
+```
+export CUDA_VISIBLE_DEVICES=0,1,2
```
**NOTE:**
-* 上述示例中,一共存在三套配置方案: PaddleSeg默认配置/unet_pet.yaml/OPTIONS,三者的优先级顺序为 OPTIONS > yaml > 默认配置。这个原则对于train.py/eval.py/vis.py都适用
-
-* 如果发现因为内存不足而Crash。请适当调低BATCH_SIZE。如果本机GPU内存充足,则可以调高BATCH_SIZE的大小以获得更快的训练速度,BATCH_SIZE增大时,可以适当调高学习率。
+* 如果发现因为内存不足而Crash,请适当调低`BATCH_SIZE`。如果本机GPU内存充足,则可以调高`BATCH_SIZE`的大小以获得更快的训练速度;`BATCH_SIZE`增大时,可以适当调高学习率`SOLVER.LR`。
* 如果在Linux系统下训练,可以使用`--use_mpio`使用多进程I/O,通过提升数据增强的处理速度进而大幅度提升GPU利用率。
-### 训练过程可视化
+## 5.训练过程可视化
当打开do_eval和use_tb两个开关后,我们可以通过TensorBoard查看边训练边评估的效果。
@@ -101,40 +77,42 @@ tensorboard --logdir train_log --host {$HOST_IP} --port {$PORT}
```
NOTE:
-1. 上述示例中,$HOST\_IP为机器IP地址,请替换为实际IP,$PORT请替换为可访问的端口
-2. 数据量较大时,前端加载速度会比较慢,请耐心等待
+1. 上述示例中,$HOST\_IP为机器IP地址,请替换为实际IP,$PORT请替换为可访问的端口。
+2. 数据量较大时,前端加载速度会比较慢,请耐心等待。
-启动TensorBoard命令后,我们可以在浏览器中查看对应的训练数据
-在`SCALAR`这个tab中,查看训练loss、iou、acc的变化趋势
+启动TensorBoard命令后,我们可以在浏览器中查看对应的训练数据。
+在`SCALAR`这个tab中,查看训练loss、iou、acc的变化趋势。

-在`IMAGE`这个tab中,查看样本的预测情况
+在`IMAGE`这个tab中,查看样本图片。

-### 模型评估
-训练完成后,我们可以通过eval.py来评估模型效果。由于我们设置的训练EPOCH数量为100,保存间隔为10,因此一共会产生10个定期保存的模型,加上最终保存的final模型,一共有11个模型。我们选择最后保存的模型进行效果的评估:
+## 6.模型评估
+训练完成后,我们可以通过eval.py来评估模型效果。由于我们设置的训练EPOCH数量为10,保存间隔为5,因此一共会产生2个定期保存的模型,加上最终保存的final模型,一共有3个模型。我们选择最后保存的模型进行效果的评估:
```shell
python pdseg/eval.py --use_gpu \
- --cfg configs/unet_pet.yaml \
- TEST.TEST_MODEL saved_model/unet_pet/final
+ --cfg configs/unet_optic.yaml \
+ TEST.TEST_MODEL saved_model/unet_optic/final
```
-可以看到,在经过训练后,模型在验证集上的mIoU指标达到了0.70+(由于随机种子等因素的影响,效果会有小范围波动,属于正常情况)。
+可以看到,在经过训练后,模型在验证集上的mIoU指标达到了0.85+(由于随机种子等因素的影响,效果会有小范围波动,属于正常情况)。
-### 模型可视化
-通过vis.py来评估模型效果,我们选择最后保存的模型进行效果的评估:
+## 7.模型可视化
+通过vis.py进行测试和可视化,以选择最后保存的模型进行测试为例:
```shell
python pdseg/vis.py --use_gpu \
- --cfg configs/unet_pet.yaml \
- TEST.TEST_MODEL saved_model/unet_pet/final
+ --cfg configs/unet_optic.yaml \
+ TEST.TEST_MODEL saved_model/unet_optic/final
```
-执行上述脚本后,会在主目录下产生一个visual/visual_results文件夹,里面存放着测试集图片的预测结果,我们选择其中几张图片进行查看,可以看到,在测试集中的图片上的预测效果已经很不错:
+执行上述脚本后,会在主目录下产生一个visual文件夹,里面存放着测试集图片的预测结果,我们选择其中1张图片进行查看:

-
-
`NOTE`
-1. 可视化的图片会默认保存在visual/visual_results目录下,可以通过`--vis_dir`来指定输出目录
-2. 训练过程中会使用DATASET.VIS_FILE_LIST中的图片进行可视化显示,而vis.py则会使用DATASET.TEST_FILE_LIST
+1. 可视化的图片会默认保存在visual目录下,可以通过`--vis_dir`来指定输出目录。
+2. 训练过程中会使用`DATASET.VIS_FILE_LIST`中的图片进行可视化显示,而vis.py则会使用`DATASET.TEST_FILE_LIST`.
+
+## 在线体验
+
+PaddleSeg在AI Studio平台上提供了在线体验的快速入门教程,欢迎[点击体验](https://aistudio.baidu.com/aistudio/projectdetail/100798)
diff --git a/pdseg/__init__.py b/pdseg/__init__.py
index 7f051e1e16ed29046c6ea46e341d62e4280f412d..e1cb8ed082023155b95e6b6778b797a571b20ca8 100644
--- a/pdseg/__init__.py
+++ b/pdseg/__init__.py
@@ -14,3 +14,4 @@
# limitations under the License.
import models
import utils
+from . import tools
\ No newline at end of file
diff --git a/pdseg/data_aug.py b/pdseg/data_aug.py
index 15186150a3734a3a0c026386a04206ac036c7858..ae976bf7e4bb6751ba6ec4186a137cbf5644ce84 100644
--- a/pdseg/data_aug.py
+++ b/pdseg/data_aug.py
@@ -327,7 +327,7 @@ def random_jitter(cv_img, saturation_range, brightness_range, contrast_range):
brightness_ratio = np.random.uniform(-brightness_range, brightness_range)
contrast_ratio = np.random.uniform(-contrast_range, contrast_range)
- order = [1, 2, 3]
+ order = [0, 1, 2]
np.random.shuffle(order)
for i in range(3):
@@ -368,7 +368,7 @@ def hsv_color_jitter(crop_img,
def rand_crop(crop_img, crop_seg, mode=ModelPhase.TRAIN):
"""
- 随机裁剪图片和标签图, 若crop尺寸大于原始尺寸,分别使用均值和ignore值填充再进行crop,
+ 随机裁剪图片和标签图, 若crop尺寸大于原始尺寸,分别使用DATASET.PADDING_VALUE值和DATASET.IGNORE_INDEX值填充再进行crop,
crop尺寸与原始尺寸一致,返回原图,crop尺寸小于原始尺寸直接crop
Args:
diff --git a/pdseg/loss.py b/pdseg/loss.py
index 36ba43b27fca957a31f9ba68160f66792686c619..14f1b3794b6c8a15f4da5cf2a838ab7339eeffc4 100644
--- a/pdseg/loss.py
+++ b/pdseg/loss.py
@@ -20,7 +20,7 @@ import importlib
from utils.config import cfg
-def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
+def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
@@ -29,12 +29,40 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
-
- loss, probs = fluid.layers.softmax_with_cross_entropy(
- logit,
- label,
- ignore_index=cfg.DATASET.IGNORE_INDEX,
- return_softmax=True)
+ if weight is None:
+ loss, probs = fluid.layers.softmax_with_cross_entropy(
+ logit,
+ label,
+ ignore_index=cfg.DATASET.IGNORE_INDEX,
+ return_softmax=True)
+ else:
+ label_one_hot = fluid.layers.one_hot(input=label, depth=num_classes)
+ if isinstance(weight, list):
+ assert len(weight) == num_classes, "weight length must equal num of classes"
+ weight = fluid.layers.assign(np.array([weight], dtype='float32'))
+ elif isinstance(weight, str):
+ assert weight.lower() == 'dynamic', 'if weight is string, must be dynamic!'
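+            # dynamic模式:以 total_num / (cls_pixel_num + 1) 作为各类别权重,再归一化使权重之和等于类别数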
+ tmp = []
+ total_num = fluid.layers.cast(fluid.layers.shape(label)[0], 'float32')
+ for i in range(num_classes):
+ cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
+ ratio = total_num / (cls_pixel_num + 1)
+ tmp.append(ratio)
+ weight = fluid.layers.concat(tmp)
+ weight = weight / fluid.layers.reduce_sum(weight) * num_classes
+ elif isinstance(weight, fluid.layers.Variable):
+ pass
+ else:
+ raise ValueError('Expect weight is a list, string or Variable, but receive {}'.format(type(weight)))
+ weight = fluid.layers.reshape(weight, [1, num_classes])
+ weighted_label_one_hot = fluid.layers.elementwise_mul(label_one_hot, weight)
+ probs = fluid.layers.softmax(logit)
+ loss = fluid.layers.cross_entropy(
+ probs,
+ weighted_label_one_hot,
+ soft_label=True,
+ ignore_index=cfg.DATASET.IGNORE_INDEX)
+ weighted_label_one_hot.stop_gradient = True
loss = loss * ignore_mask
avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
@@ -43,6 +71,7 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
ignore_mask.stop_gradient = True
return avg_loss
+
# to change, how to appicate ignore index and ignore mask
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
@@ -65,6 +94,7 @@ def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
+
def bce_loss(logit, label, ignore_mask=None):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classfication")
@@ -80,20 +110,22 @@ def bce_loss(logit, label, ignore_mask=None):
return loss
-def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2):
+def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2, weight=None):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
- logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
- logit_mask = (logit_label.astype('int32') !=
+ if label.shape[2] != logit.shape[2] or label.shape[3] != logit.shape[3]:
+ label = fluid.layers.resize_nearest(label, logit.shape[2:])
+ logit_mask = (label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
- loss = softmax_with_loss(logit, logit_label, logit_mask,
+ loss = softmax_with_loss(logit, label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
- avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes)
+ avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes, weight=weight)
return avg_loss
+
def multi_dice_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
@@ -107,6 +139,7 @@ def multi_dice_loss(logits, label, ignore_mask=None):
avg_loss = dice_loss(logits, label, ignore_mask)
return avg_loss
+
def multi_bce_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
diff --git a/pdseg/models/__init__.py b/pdseg/models/__init__.py
index f2a9093490fc284154c8e09dc5c58e638c567d26..f1465913991c5aaffefff26c1f5a5d668edd1596 100644
--- a/pdseg/models/__init__.py
+++ b/pdseg/models/__init__.py
@@ -14,5 +14,3 @@
# limitations under the License.
import models.modeling
-import models.libs
-import models.backbone
diff --git a/pdseg/models/backbone/vgg.py b/pdseg/models/backbone/vgg.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e9df0a66cd85b291aad8846eed30c9bb7b4e947
--- /dev/null
+++ b/pdseg/models/backbone/vgg.py
@@ -0,0 +1,81 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.fluid as fluid
+from paddle.fluid import ParamAttr
+
+__all__ = ["VGGNet"]
+
+
+def check_points(count, points):
+ if points is None:
+ return False
+ else:
+        if isinstance(points, list):
+            return count in points
+        else:
+            return count == points
+
+
+class VGGNet():
+ def __init__(self, layers=16):
+ self.layers = layers
+
+ def net(self, input, class_dim=1000, end_points=None, decode_points=None):
+ short_cuts = dict()
+ layers_count = 0
+ layers = self.layers
+ vgg_spec = {
+ 11: ([1, 1, 2, 2, 2]),
+ 13: ([2, 2, 2, 2, 2]),
+ 16: ([2, 2, 3, 3, 3]),
+ 19: ([2, 2, 4, 4, 4])
+ }
+ assert layers in vgg_spec.keys(), \
+ "supported layers are {} but input layer is {}".format(vgg_spec.keys(), layers)
+
+ nums = vgg_spec[layers]
+ channels = [64, 128, 256, 512, 512]
+ conv = input
+ for i in range(len(nums)):
+ conv = self.conv_block(conv, channels[i], nums[i], name="conv" + str(i + 1) + "_")
+ layers_count += nums[i]
+ if check_points(layers_count, decode_points):
+ short_cuts[layers_count] = conv
+ if check_points(layers_count, end_points):
+ return conv, short_cuts
+
+ return conv
+
+ def conv_block(self, input, num_filter, groups, name=None):
+ conv = input
+ for i in range(groups):
+ conv = fluid.layers.conv2d(
+ input=conv,
+ num_filters=num_filter,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ act='relu',
+ param_attr=fluid.param_attr.ParamAttr(
+ name=name + str(i + 1) + "_weights"),
+ bias_attr=False)
+ return fluid.layers.pool2d(
+ input=conv, pool_size=2, pool_type='max', pool_stride=2)
diff --git a/pdseg/models/libs/model_libs.py b/pdseg/models/libs/model_libs.py
index 19afe54224f259cbd98c189d6bc7196138ed8863..84494a9dd892105c799119c7a467b584c23f4241 100644
--- a/pdseg/models/libs/model_libs.py
+++ b/pdseg/models/libs/model_libs.py
@@ -164,3 +164,37 @@ def separate_conv(input, channel, stride, filter, dilation=1, act=None):
input = bn(input)
if act: input = act(input)
return input
+
+
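+# conv2d followed by batch_norm; returns relu6(bn(conv(x))) when if_act is True,
+# otherwise just bn(conv(x)).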
+def conv_bn_layer(input,
+ filter_size,
+ num_filters,
+ stride,
+ padding,
+ channels=None,
+ num_groups=1,
+ if_act=True,
+ name=None,
+ use_cudnn=True):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ act=None,
+ use_cudnn=use_cudnn,
+ param_attr=fluid.ParamAttr(name=name + '_weights'),
+ bias_attr=False)
+ bn_name = name + '_bn'
+ bn = fluid.layers.batch_norm(
+ input=conv,
+ param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
+ bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance')
+ if if_act:
+ return fluid.layers.relu6(bn)
+ else:
+ return bn
\ No newline at end of file
diff --git a/pdseg/models/model_builder.py b/pdseg/models/model_builder.py
index 495652464f8cd14fef650bf5bdc77c14ebdbb4e7..668d69e44aeb91cc7705a79f092730ae6a1fdb09 100644
--- a/pdseg/models/model_builder.py
+++ b/pdseg/models/model_builder.py
@@ -24,7 +24,7 @@ from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
-from models.modeling import deeplab, unet, icnet, pspnet, hrnet
+from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
class ModelPhase(object):
@@ -81,9 +81,11 @@ def seg_model(image, class_num):
logits = pspnet.pspnet(image, class_num)
elif model_name == 'hrnet':
logits = hrnet.hrnet(image, class_num)
+ elif model_name == 'fast_scnn':
+ logits = fast_scnn.fast_scnn(image, class_num)
else:
raise Exception(
- "unknow model name, only support unet, deeplabv3p, icnet, pspnet, hrnet"
+ "unknow model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
)
return logits
@@ -223,8 +225,9 @@ def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN):
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
+ weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
avg_loss_list.append(
- multi_softmax_with_loss(logits, label, mask, class_num))
+ multi_softmax_with_loss(logits, label, mask, class_num, weight))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
diff --git a/pdseg/models/modeling/deeplab.py b/pdseg/models/modeling/deeplab.py
index e7ed9604b2227bb498c2eb0b863804fbe0159333..186e2406d90d291de43133550875072d790a805f 100644
--- a/pdseg/models/modeling/deeplab.py
+++ b/pdseg/models/modeling/deeplab.py
@@ -27,6 +27,7 @@ from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
+
def encoder(input):
# 编码器配置,采用ASPP架构,pooling + 1x1_conv + 三个不同尺度的空洞卷积并行, concat后1x1conv
# ASPP_WITH_SEP_CONV:默认为真,使用depthwise可分离卷积,否则使用普通卷积
@@ -47,8 +48,7 @@ def encoder(input):
with scope('encoder'):
channel = 256
with scope("image_pool"):
- image_avg = fluid.layers.reduce_mean(
- input, [2, 3], keep_dim=True)
+ image_avg = fluid.layers.reduce_mean(input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
@@ -250,14 +250,15 @@ def deeplabv3p(img, num_classes):
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
- logit = conv(
- data,
- num_classes,
- 1,
- stride=1,
- padding=0,
- bias_attr=True,
- param_attr=param_attr)
+ with fluid.name_scope('last_conv'):
+ logit = conv(
+ data,
+ num_classes,
+ 1,
+ stride=1,
+ padding=0,
+ bias_attr=True,
+ param_attr=param_attr)
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
return logit
diff --git a/pdseg/models/modeling/fast_scnn.py b/pdseg/models/modeling/fast_scnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1ecdffea6625992e0c7e9e635e67ee79b7b4522
--- /dev/null
+++ b/pdseg/models/modeling/fast_scnn.py
@@ -0,0 +1,263 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle.fluid as fluid
+from models.libs.model_libs import scope
+from models.libs.model_libs import bn, bn_relu, relu, conv_bn_layer
+from models.libs.model_libs import conv, avg_pool
+from models.libs.model_libs import separate_conv
+from utils.config import cfg
+
+
+def learning_to_downsample(x, dw_channels1=32, dw_channels2=48, out_channels=64):
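+    # Fast-SCNN "learning to downsample" module: one standard conv followed by two
+    # depthwise separable convs, each with stride 2 (1/8 of the input resolution overall).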
+ x = relu(bn(conv(x, dw_channels1, 3, 2)))
+ with scope('dsconv1'):
+ x = separate_conv(x, dw_channels2, stride=2, filter=3, act=fluid.layers.relu)
+ with scope('dsconv2'):
+ x = separate_conv(x, out_channels, stride=2, filter=3, act=fluid.layers.relu)
+ return x
+
+
+def shortcut(input, data_residual):
+ return fluid.layers.elementwise_add(input, data_residual)
+
+
+def dropout2d(input, prob, is_train=False):
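+    # Channel-wise (spatial) dropout: each feature map is kept or zeroed as a whole,
+    # and kept maps are rescaled by 1/keep_prob; acts as identity when not training.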
+ if not is_train:
+ return input
+ channels = input.shape[1]
+ keep_prob = 1.0 - prob
+ random_tensor = keep_prob + fluid.layers.uniform_random_batch_size_like(input, [-1, channels, 1, 1], min=0., max=1.)
+ binary_tensor = fluid.layers.floor(random_tensor)
+ output = input / keep_prob * binary_tensor
+ return output
+
+
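+# MobileNetV2-style inverted residual block: 1x1 expansion -> depthwise conv -> 1x1 linear
+# projection, with an identity shortcut added when ifshortcut is True.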
+def inverted_residual_unit(input,
+ num_in_filter,
+ num_filters,
+ ifshortcut,
+ stride,
+ filter_size,
+ padding,
+ expansion_factor,
+ name=None):
+ num_expfilter = int(round(num_in_filter * expansion_factor))
+
+ channel_expand = conv_bn_layer(
+ input=input,
+ num_filters=num_expfilter,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=True,
+ name=name + '_expand')
+
+ bottleneck_conv = conv_bn_layer(
+ input=channel_expand,
+ num_filters=num_expfilter,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ num_groups=num_expfilter,
+ if_act=True,
+ name=name + '_dwise',
+ use_cudnn=False)
+
+ depthwise_output = bottleneck_conv
+
+ linear_out = conv_bn_layer(
+ input=bottleneck_conv,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ if_act=False,
+ name=name + '_linear')
+
+ if ifshortcut:
+ out = shortcut(input=input, data_residual=linear_out)
+ return out, depthwise_output
+ else:
+ return linear_out, depthwise_output
+
+
+def inverted_blocks(input, in_c, t, c, n, s, name=None):
+ first_block, depthwise_output = inverted_residual_unit(
+ input=input,
+ num_in_filter=in_c,
+ num_filters=c,
+ ifshortcut=False,
+ stride=s,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + '_1')
+
+ last_residual_block = first_block
+ last_c = c
+
+ for i in range(1, n):
+ last_residual_block, depthwise_output = inverted_residual_unit(
+ input=last_residual_block,
+ num_in_filter=last_c,
+ num_filters=c,
+ ifshortcut=True,
+ stride=1,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + '_' + str(i + 1))
+ return last_residual_block, depthwise_output
+
+
+def psp_module(input, out_features):
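+    # PSPNet-style pyramid pooling: adaptive-pool the input to 1x1/2x2/3x3/6x6 grids,
+    # apply 1x1 conv + BN + ReLU, upsample back to the input size, and concat with the input.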
+
+ cat_layers = []
+ sizes = (1, 2, 3, 6)
+ for size in sizes:
+ psp_name = "psp" + str(size)
+ with scope(psp_name):
+ pool = fluid.layers.adaptive_pool2d(input,
+ pool_size=[size, size],
+ pool_type='avg',
+ name=psp_name + '_adapool')
+ data = conv(pool, out_features,
+ filter_size=1,
+ bias_attr=False,
+ name=psp_name + '_conv')
+ data_bn = bn(data, act='relu')
+ interp = fluid.layers.resize_bilinear(data_bn,
+ out_shape=input.shape[2:],
+ name=psp_name + '_interp', align_mode=0)
+ cat_layers.append(interp)
+ cat_layers = [input] + cat_layers
+ out = fluid.layers.concat(cat_layers, axis=1, name='psp_cat')
+
+ return out
+
+
+class FeatureFusionModule:
+ """Feature fusion module"""
+
+ def __init__(self, higher_in_channels, lower_in_channels, out_channels, scale_factor=4):
+ self.higher_in_channels = higher_in_channels
+ self.lower_in_channels = lower_in_channels
+ self.out_channels = out_channels
+ self.scale_factor = scale_factor
+
+ def net(self, higher_res_feature, lower_res_feature):
+ h, w = higher_res_feature.shape[2:]
+ lower_res_feature = fluid.layers.resize_bilinear(lower_res_feature, [h, w], align_mode=0)
+
+ with scope('dwconv'):
+            lower_res_feature = relu(bn(conv(lower_res_feature, self.out_channels, 1)))
+ with scope('conv_lower_res'):
+ lower_res_feature = bn(conv(lower_res_feature, self.out_channels, 1, bias_attr=True))
+ with scope('conv_higher_res'):
+ higher_res_feature = bn(conv(higher_res_feature, self.out_channels, 1, bias_attr=True))
+ out = higher_res_feature + lower_res_feature
+
+ return relu(out)
+
+
+class GlobalFeatureExtractor():
+ """Global feature extractor module"""
+
+ def __init__(self, in_channels=64, block_channels=(64, 96, 128), out_channels=128,
+ t=6, num_blocks=(3, 3, 3)):
+ self.in_channels = in_channels
+ self.block_channels = block_channels
+ self.out_channels = out_channels
+ self.t = t
+ self.num_blocks = num_blocks
+
+ def net(self, x):
+ x, _ = inverted_blocks(x, self.in_channels, self.t, self.block_channels[0],
+ self.num_blocks[0], 2, 'inverted_block_1')
+ x, _ = inverted_blocks(x, self.block_channels[0], self.t, self.block_channels[1],
+ self.num_blocks[1], 2, 'inverted_block_2')
+ x, _ = inverted_blocks(x, self.block_channels[1], self.t, self.block_channels[2],
+ self.num_blocks[2], 1, 'inverted_block_3')
+ x = psp_module(x, self.block_channels[2] // 4)
+ with scope('out'):
+ x = relu(bn(conv(x, self.out_channels, 1)))
+ return x
+
+
+class Classifier:
+ """Classifier"""
+
+ def __init__(self, dw_channels, num_classes, stride=1):
+ self.dw_channels = dw_channels
+ self.num_classes = num_classes
+ self.stride = stride
+
+ def net(self, x):
+ with scope('dsconv1'):
+ x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
+ with scope('dsconv2'):
+ x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
+        x = dropout2d(x, 0.1, is_train=(cfg.PHASE == 'train'))
+ x = conv(x, self.num_classes, 1, bias_attr=True)
+ return x
+
+
+def aux_layer(x, num_classes):
+ x = relu(bn(conv(x, 32, 3, padding=1)))
+ x = dropout2d(x, 0.1, is_train=(cfg.PHASE == 'train'))
+ with scope('logit'):
+ x = conv(x, num_classes, 1, bias_attr=True)
+ return x
+
+
+def fast_scnn(img, num_classes):
+ size = img.shape[2:]
+ classifier = Classifier(128, num_classes)
+
+ global_feature_extractor = GlobalFeatureExtractor(64, [64, 96, 128], 128, 6, [3, 3, 3])
+ feature_fusion = FeatureFusionModule(64, 128, 128)
+
+ with scope('learning_to_downsample'):
+ higher_res_features = learning_to_downsample(img, 32, 48, 64)
+ with scope('global_feature_extractor'):
+ lower_res_feature = global_feature_extractor.net(higher_res_features)
+ with scope('feature_fusion'):
+ x = feature_fusion.net(higher_res_features, lower_res_feature)
+ with scope('classifier'):
+ logit = classifier.net(x)
+ logit = fluid.layers.resize_bilinear(logit, size, align_mode=0)
+
+ if len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 3:
+ with scope('aux_layer_higher'):
+ higher_logit = aux_layer(higher_res_features, num_classes)
+ higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
+ with scope('aux_layer_lower'):
+ lower_logit = aux_layer(lower_res_feature, num_classes)
+ lower_logit = fluid.layers.resize_bilinear(lower_logit, size, align_mode=0)
+ return logit, higher_logit, lower_logit
+ elif len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 2:
+ with scope('aux_layer_higher'):
+ higher_logit = aux_layer(higher_res_features, num_classes)
+ higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
+ return logit, higher_logit
+
+ return logit
\ No newline at end of file
diff --git a/pdseg/reader.py b/pdseg/reader.py
index d3c3659e5064cd8a11e463267a4b046ffdf105ca..7f1fd6fbbe25f1199c9247aa9e42ae7cb682c03d 100644
--- a/pdseg/reader.py
+++ b/pdseg/reader.py
@@ -98,8 +98,8 @@ class SegDataset(object):
# Re-shuffle file list
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
- num_lines = len(self.all_lines) // self.num_trainers
- self.lines = self.all_lines[num_lines * self.trainer_id: num_lines * (self.trainer_id + 1)]
+ num_lines = len(self.all_lines) // cfg.NUM_TRAINERS
+ self.lines = self.all_lines[num_lines * cfg.TRAINER_ID: num_lines * (cfg.TRAINER_ID + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
diff --git a/pdseg/tools/create_dataset_list.py b/pdseg/tools/create_dataset_list.py
index aca6d95d20bc645c1843399c99f5e56d4560f7f8..6c7d7c943c9baf916533621d353d5f2700388a01 100644
--- a/pdseg/tools/create_dataset_list.py
+++ b/pdseg/tools/create_dataset_list.py
@@ -116,18 +116,19 @@ def generate_list(args):
label_files = get_files(1, dataset_split, args)
if not image_files:
img_dir = os.path.join(dataset_root, args.folder[0], dataset_split)
- print("No files in {}".format(img_dir))
+ warnings.warn("No images in {} !!!".format(img_dir))
num_images = len(image_files)
if not label_files:
label_dir = os.path.join(dataset_root, args.folder[1], dataset_split)
- print("No files in {}".format(label_dir))
+ warnings.warn("No labels in {} !!!".format(label_dir))
num_label = len(label_files)
- if num_images < num_label:
- warnings.warn("number of images = {} < number of labels = {}."
- .format(num_images, num_label))
- continue
+ if num_images != num_label and num_label > 0:
+ raise Exception("Number of images = {} number of labels = {} \n"
+ "Either number of images is equal to number of labels, "
+ "or number of labels is equal to 0.\n"
+ "Please check your dataset!".format(num_images, num_label))
file_list = os.path.join(dataset_root, dataset_split + '.txt')
with open(file_list, "w") as f:
diff --git a/pdseg/tools/gray2pseudo_color.py b/pdseg/tools/gray2pseudo_color.py
index b385049172c4b134aca849682cbf76193c569f62..3627db0b216175b04a50d9012999d441f4df69fb 100644
--- a/pdseg/tools/gray2pseudo_color.py
+++ b/pdseg/tools/gray2pseudo_color.py
@@ -2,13 +2,11 @@
from __future__ import print_function
import argparse
-import glob
import os
import os.path as osp
import sys
import numpy as np
from PIL import Image
-from pdseg.vis import get_color_map_list
def parse_args():
@@ -26,6 +24,28 @@ def parse_args():
return parser.parse_args()
+def get_color_map_list(num_classes):
+ """ Returns the color map for visualizing the segmentation mask,
+ which can support arbitrary number of classes.
+ Args:
+ num_classes: Number of classes
+ Returns:
+ The color map
+ """
+ color_map = num_classes * [0, 0, 0]
+ for i in range(0, num_classes):
+ j = 0
+ lab = i
+ while lab:
+ color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+ color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+ color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+ j += 1
+ lab >>= 3
+
+ return color_map
+
+
def gray2pseudo_color(args):
"""将灰度标注图片转换为伪彩色图片"""
input = args.dir_or_file
@@ -36,18 +56,28 @@ def gray2pseudo_color(args):
color_map = get_color_map_list(256)
if os.path.isdir(input):
- for grt_path in glob.glob(osp.join(input, '*.png')):
- print('Converting original label:', grt_path)
- basename = osp.basename(grt_path)
+ for fpath, dirs, fs in os.walk(input):
+ for f in fs:
+ try:
+ grt_path = osp.join(fpath, f)
+ _output_dir = fpath.replace(input, '')
+ _output_dir = _output_dir.lstrip(os.path.sep)
- im = Image.open(grt_path)
- lbl = np.asarray(im)
+ im = Image.open(grt_path)
+ lbl = np.asarray(im)
- lbl_pil = Image.fromarray(lbl.astype(np.uint8), mode='P')
- lbl_pil.putpalette(color_map)
+ lbl_pil = Image.fromarray(lbl.astype(np.uint8), mode='P')
+ lbl_pil.putpalette(color_map)
- new_file = osp.join(output_dir, basename)
- lbl_pil.save(new_file)
+ real_dir = osp.join(output_dir, _output_dir)
+ if not osp.exists(real_dir):
+ os.makedirs(real_dir)
+ new_grt_path = osp.join(real_dir, f)
+
+ lbl_pil.save(new_grt_path)
+ print('New label path:', new_grt_path)
+                except Exception:
+                    # skip files that cannot be read or converted
+                    continue
elif os.path.isfile(input):
if args.dataset_dir is None or args.file_separator is None:
print('No dataset_dir or file_separator input!')
@@ -58,17 +88,20 @@ def gray2pseudo_color(args):
grt_name = parts[1]
grt_path = os.path.join(args.dataset_dir, grt_name)
- print('Converting original label:', grt_path)
- basename = osp.basename(grt_path)
-
im = Image.open(grt_path)
lbl = np.asarray(im)
lbl_pil = Image.fromarray(lbl.astype(np.uint8), mode='P')
lbl_pil.putpalette(color_map)
- new_file = osp.join(output_dir, basename)
- lbl_pil.save(new_file)
+ grt_dir, _ = osp.split(grt_name)
+ new_dir = osp.join(output_dir, grt_dir)
+ if not osp.exists(new_dir):
+ os.makedirs(new_dir)
+ new_grt_path = osp.join(output_dir, grt_name)
+
+ lbl_pil.save(new_grt_path)
+ print('New label path:', new_grt_path)
else:
print('It\'s neither a dir nor a file')
diff --git a/pdseg/tools/jingling2seg.py b/pdseg/tools/jingling2seg.py
index 9c1d663685cb357017387c54ed25115e6117408e..28bce3b0436242f5174087c0852dde99a7878684 100644
--- a/pdseg/tools/jingling2seg.py
+++ b/pdseg/tools/jingling2seg.py
@@ -12,7 +12,7 @@ import numpy as np
import PIL.Image
import labelme
-from pdseg.vis import get_color_map_list
+from gray2pseudo_color import get_color_map_list
def parse_args():
diff --git a/pdseg/tools/labelme2seg.py b/pdseg/tools/labelme2seg.py
index be1c99ee32c249cda29fea3d628b707415bf8b23..6ae3ad3a50a6df750ce321d94b7235ef57dcf80b 100755
--- a/pdseg/tools/labelme2seg.py
+++ b/pdseg/tools/labelme2seg.py
@@ -12,7 +12,7 @@ import numpy as np
import PIL.Image
import labelme
-from pdseg.vis import get_color_map_list
+from gray2pseudo_color import get_color_map_list
def parse_args():
diff --git a/pdseg/train.py b/pdseg/train.py
index 4f6a90e003c0b2997daceab684b7199f52c9aafc..8254f1655c97c09204d2e4a64e2404907270fcfc 100644
--- a/pdseg/train.py
+++ b/pdseg/train.py
@@ -24,12 +24,14 @@ os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import argparse
import pprint
+import random
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
+from paddle.fluid import profiler
from utils.config import cfg
from utils.timer import Timer, calculate_eta
@@ -95,6 +97,24 @@ def parse_args():
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
+ parser.add_argument(
+ '--enable_ce',
+ dest='enable_ce',
+        help='If set True, enable continuous evaluation job. '
+ 'This flag is only used for internal test.',
+ action='store_true')
+
+    # NOTE: This is for benchmark
+ parser.add_argument(
+ '--is_profiler',
+ help='the profiler switch.(used for benchmark)',
+ default=0,
+ type=int)
+ parser.add_argument(
+ '--profiler_path',
+ help='the profiler output file path.(used for benchmark)',
+ default='./seg.profiler',
+ type=str)
return parser.parse_args()
@@ -194,6 +214,9 @@ def print_info(*msg):
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
+ if args.enable_ce:
+ startup_prog.random_seed = 1000
+ train_prog.random_seed = 1000
drop_last = True
dataset = SegDataset(
@@ -431,6 +454,13 @@ def train(cfg):
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
+
+ # NOTE : used for benchmark, profiler tools
+ if args.is_profiler and epoch == 1 and global_step == args.log_steps:
+ profiler.start_profiler("All")
+ elif args.is_profiler and epoch == 1 and global_step == args.log_steps + 5:
+ profiler.stop_profiler("total", args.profiler_path)
+ return
except fluid.core.EOFException:
py_reader.reset()
@@ -483,6 +513,9 @@ def main(args):
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
+ if args.enable_ce:
+ random.seed(0)
+ np.random.seed(0)
cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
diff --git a/pdseg/utils/config.py b/pdseg/utils/config.py
index 5d66c2f076ca964fcdf23d1cfd427e61acf68876..c3d84216752838a388fd2cda1946949d77960fb9 100644
--- a/pdseg/utils/config.py
+++ b/pdseg/utils/config.py
@@ -72,17 +72,11 @@ cfg.DATASET.IGNORE_INDEX = 255
cfg.DATASET.PADDING_VALUE = [127.5, 127.5, 127.5]
########################### 数据增强配置 ######################################
-# 图像镜像左右翻转
-cfg.AUG.MIRROR = True
-# 图像上下翻转开关,True/False
-cfg.AUG.FLIP = False
-# 图像启动上下翻转的概率,0-1
-cfg.AUG.FLIP_RATIO = 0.5
-# 图像resize的固定尺寸(宽,高),非负
-cfg.AUG.FIX_RESIZE_SIZE = tuple()
# 图像resize的方式有三种:
# unpadding(固定尺寸),stepscaling(按比例resize),rangescaling(长边对齐)
-cfg.AUG.AUG_METHOD = 'rangescaling'
+cfg.AUG.AUG_METHOD = 'unpadding'
+# 图像resize的固定尺寸(宽,高),非负
+cfg.AUG.FIX_RESIZE_SIZE = (512, 512)
# 图像resize方式为stepscaling,resize最小尺度,非负
cfg.AUG.MIN_SCALE_FACTOR = 0.5
# 图像resize方式为stepscaling,resize最大尺度,不小于MIN_SCALE_FACTOR
@@ -98,6 +92,13 @@ cfg.AUG.MAX_RESIZE_VALUE = 600
# 在MIN_RESIZE_VALUE到MAX_RESIZE_VALUE范围内
cfg.AUG.INF_RESIZE_VALUE = 500
+# 图像镜像左右翻转
+cfg.AUG.MIRROR = True
+# 图像上下翻转开关,True/False
+cfg.AUG.FLIP = False
+# 图像启动上下翻转的概率,0-1
+cfg.AUG.FLIP_RATIO = 0.5
+
# RichCrop数据增广开关,用于提升模型鲁棒性
cfg.AUG.RICH_CROP.ENABLE = False
# 图像旋转最大角度,0-90
@@ -158,13 +159,16 @@ cfg.SOLVER.LOSS = ["softmax_loss"]
cfg.SOLVER.LR_WARMUP = False
# warmup的迭代次数
cfg.SOLVER.LR_WARMUP_STEPS = 2000
-
+# Cross entropy class weight. Defaults to None. If set to 'dynamic', the class weights are
+# adjusted dynamically according to the number of pixels of each class in every batch.
+# A static weight can also be given as a list, e.g. for 3 classes: [0.1, 2.0, 0.9]
+cfg.SOLVER.CROSS_ENTROPY_WEIGHT = None
########################## 测试配置 ###########################################
# 测试模型路径
cfg.TEST.TEST_MODEL = ''
########################## 模型通用配置 #######################################
-# 模型名称, 支持deeplab, unet, icnet三种
+# Model name. Supported models: deeplabv3p, unet, icnet, pspnet, hrnet, fast_scnn
cfg.MODEL.MODEL_NAME = ''
# BatchNorm类型: bn、gn(group_norm)
cfg.MODEL.DEFAULT_NORM_TYPE = 'bn'
@@ -232,3 +236,19 @@ cfg.FREEZE.MODEL_FILENAME = '__model__'
cfg.FREEZE.PARAMS_FILENAME = '__params__'
# 预测模型参数保存的路径
cfg.FREEZE.SAVE_DIR = 'freeze_model'
+
+########################## paddle-slim ######################################
+cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER = False
+cfg.SLIM.KNOWLEDGE_DISTILL = False
+cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR = ""
+
+cfg.SLIM.NAS_PORT = 23333
+cfg.SLIM.NAS_ADDRESS = ""
+cfg.SLIM.NAS_SEARCH_STEPS = 100
+cfg.SLIM.NAS_START_EVAL_EPOCH = 0
+cfg.SLIM.NAS_IS_SERVER = True
+cfg.SLIM.NAS_SPACE_NAME = ""
+
+cfg.SLIM.PRUNE_PARAMS = ''
+cfg.SLIM.PRUNE_RATIOS = []
+
diff --git a/pdseg/vis.py b/pdseg/vis.py
index 9fc349a3876f2667f8cc86bc1b9556594acfa638..d94221c0be1a0b4abe241e75966215863d8fd35d 100644
--- a/pdseg/vis.py
+++ b/pdseg/vis.py
@@ -34,6 +34,7 @@ from utils.config import cfg
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
+from tools.gray2pseudo_color import get_color_map_list
def parse_args():
@@ -73,28 +74,6 @@ def makedirs(directory):
os.makedirs(directory)
-def get_color_map_list(num_classes):
- """ Returns the color map for visualizing the segmentation mask,
- which can support arbitrary number of classes.
- Args:
- num_classes: Number of classes
- Returns:
- The color map
- """
- color_map = num_classes * [0, 0, 0]
- for i in range(0, num_classes):
- j = 0
- lab = i
- while lab:
- color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
- color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
- color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
- j += 1
- lab >>= 3
-
- return color_map
-
-
def to_png_fn(fn):
"""
Append png as filename postfix
@@ -108,7 +87,7 @@ def to_png_fn(fn):
def visualize(cfg,
vis_file_list=None,
use_gpu=False,
- vis_dir="visual_predict",
+ vis_dir="visual",
ckpt_dir=None,
log_writer=None,
local_test=False,
@@ -138,7 +117,7 @@ def visualize(cfg,
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
- save_dir = os.path.join('visual', vis_dir)
+ save_dir = vis_dir
makedirs(save_dir)
fetch_list = [pred.name]
diff --git a/pretrained_model/download_model.py b/pretrained_model/download_model.py
index 12b01472457bd25e22005141b21bb9d3014bf4fe..28b5ae421425a42e959fa6cf792c3e536e53c964 100644
--- a/pretrained_model/download_model.py
+++ b/pretrained_model/download_model.py
@@ -81,6 +81,8 @@ model_urls = {
"https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz",
"hrnet_w18_bn_cityscapes":
"https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz",
+ "fast_scnn_cityscapes":
+ "https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar",
}
if __name__ == "__main__":
diff --git a/slim/distillation/README.md b/slim/distillation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2bd772a1001e11efa89324315fa32d44032ade05
--- /dev/null
+++ b/slim/distillation/README.md
@@ -0,0 +1,100 @@
+> Before running this example, please install PaddleSlim and Paddle 1.6 or later.
+
+# PaddleSeg Distillation Tutorial
+
+Before reading this tutorial, please make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a basic understanding of PaddleSeg.
+
+This document describes how to distill the models in this segmentation library with [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim).
+
+Unless otherwise stated, all commands in this tutorial are executed from the `PaddleSeg/` directory.
+
+## Overview
+
+This example uses the [distillation strategy](https://paddlepaddle.github.io/PaddleSlim/algo/algo/#3) provided by PaddleSlim to train a distilled model from this segmentation library.
+Before reading this example, we recommend that you first go through the following material:
+
+- [PaddleSlim distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
+
+## Installing PaddleSlim
+Follow the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/) to install PaddleSlim.
+
+## Distillation strategy
+
+For how to use the distillation API, please refer to the PaddleSlim distillation API documentation.
+
+Here we take distilling a Deeplabv3-mobilenet student with a Deeplabv3-xception teacher as an example. First, to get an overall picture of the `student model` and the `teacher model` and to pin down which tensors to distill, we inspect the names and shapes of the variables (Variables) of both networks with the following code:
+
+```python
+# Inspect the student model's Variables
+student_vars = []
+for v in fluid.default_main_program().list_vars():
+ try:
+ student_vars.append((v.name, v.shape))
+ except:
+ pass
+print("="*50+"student_model_vars"+"="*50)
+print(student_vars)
+# Inspect the teacher model's Variables
+teacher_vars = []
+for v in teacher_program.list_vars():
+ try:
+ teacher_vars.append((v.name, v.shape))
+ except:
+ pass
+print("="*50+"teacher_model_vars"+"="*50)
+print(teacher_vars)
+```
+
+Comparing the two lists shows that the feature maps fed into the `loss` of the `student model` and the `teacher model` are, respectively:
+
+```bash
+# student model
+bilinear_interp_0.tmp_0
+# teacher model
+bilinear_interp_2.tmp_0
+```
+
+
+Their shapes match pairwise and both sit at the output end of their respective networks, so we add a distillation loss between the corresponding feature maps with `l2_loss`. Note that the teacher's Variables are automatically given a `name_prefix` during the merge step, so the `"teacher_"` prefix also has to be added here; for the merge process see the [distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/#merge).
+
+```python
+distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0')
+```
+
+Following the same procedure, other losses can be chosen for the distillation strategy; PaddleSlim supports `FSP_loss`, `L2_loss`, `softmax_with_cross_entropy_loss`, as well as any custom loss.
+
+## Training
+
+The distillation script `train_distill.py` is written based on [PaddleSeg/pdseg/train.py](../../pdseg/train.py).
+It defines a teacher_model and a student_model, and uses the output of the teacher_model to guide the training of the student_model.
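+
+As a minimal sketch of that wiring (`seg_loss` and `optimizer` are stand-in names for the student's softmax loss and the PaddleSeg Solver instance created inside `train_distill.py`), the distillation term is weighted and simply added to the student's own loss before optimization:
+
+```python
+# L2 distance between teacher and student output logits, weighted by 0.1
+distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0',
+                       'bilinear_interp_0.tmp_0') * 0.1
+all_loss = seg_loss + distill_loss   # combined objective for the student
+lr = optimizer.optimise(all_loss)    # optimize the combined loss
+```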
+
+### Example run
+
+Download the teacher's pretrained model ([deeplabv3p_xception65_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/xception65_bn_cityscapes.tgz)) and the student's pretrained model ([mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz)).
+Then set the pretrained model path in the student config file (./slim/distillation/cityscape.yaml):
+```
+TRAIN:
+ PRETRAINED_MODEL_DIR: your_student_pretrained_model_dir
+```
+And set the pretrained model path in the teacher config file (./slim/distillation/cityscape_teacher.yaml):
+```
+SLIM:
+ KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: your_teacher_pretrained_model_dir
+```
+
+Start training with the following command; an evaluation is run every `cfg.TRAIN.SNAPSHOT_EPOCH` epochs.
+```shell
+CUDA_VISIBLE_DEVICES=0,1
+python -m paddle.distributed.launch ./slim/distillation/train_distill.py \
+--log_steps 10 --cfg ./slim/distillation/cityscape.yaml \
+--teacher_cfg ./slim/distillation/cityscape_teacher.yaml \
+--use_gpu \
+--use_mpio \
+--do_eval
+```
+
+Note: to change parameters in the config files, please edit the corresponding file directly; overriding them from the command line is not supported yet.
+
+## Evaluation and prediction
+
+For evaluation and prediction after training, please refer to the [Quick Start](../../README.md#快速入门) and [Basic Features](../../README.md#基础功能) sections of the PaddleSeg documentation.
diff --git a/slim/distillation/cityscape.yaml b/slim/distillation/cityscape.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..703a6a2483fcf68f9ea801369ff0675c41ad286c
--- /dev/null
+++ b/slim/distillation/cityscape.yaml
@@ -0,0 +1,59 @@
+EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
+TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
+AUG:
+ AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
+ FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
+ INF_RESIZE_VALUE: 500 # for rangescaling
+ MAX_RESIZE_VALUE: 600 # for rangescaling
+ MIN_RESIZE_VALUE: 400 # for rangescaling
+ MAX_SCALE_FACTOR: 2.0 # for stepscaling
+ MIN_SCALE_FACTOR: 0.5 # for stepscaling
+ SCALE_STEP_SIZE: 0.25 # for stepscaling
+ MIRROR: True
+ FLIP: True
+ FLIP_RATIO: 0.2
+ RICH_CROP:
+ ENABLE: False
+ ASPECT_RATIO: 0.33
+ BLUR: True
+ BLUR_RATIO: 0.1
+ MAX_ROTATION: 15
+ MIN_AREA_RATIO: 0.5
+ BRIGHTNESS_JITTER_RATIO: 0.5
+ CONTRAST_JITTER_RATIO: 0.5
+ SATURATION_JITTER_RATIO: 0.5
+BATCH_SIZE: 16
+MEAN: [0.5, 0.5, 0.5]
+STD: [0.5, 0.5, 0.5]
+DATASET:
+ DATA_DIR: "./dataset/cityscapes/"
+ IMAGE_TYPE: "rgb" # choice rgb or rgba
+ NUM_CLASSES: 19
+ TEST_FILE_LIST: "dataset/cityscapes/val.list"
+ TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
+ VAL_FILE_LIST: "dataset/cityscapes/val.list"
+ IGNORE_INDEX: 255
+FREEZE:
+ MODEL_FILENAME: "model"
+ PARAMS_FILENAME: "params"
+MODEL:
+ DEFAULT_NORM_TYPE: "bn"
+ MODEL_NAME: "deeplabv3p"
+ DEEPLAB:
+ BACKBONE: "mobilenet"
+ ASPP_WITH_SEP_CONV: True
+ DECODER_USE_SEP_CONV: True
+ ENCODER_WITH_ASPP: False
+ ENABLE_DECODER: False
+TEST:
+ TEST_MODEL: "snapshots/cityscape_v5/final/"
+TRAIN:
+ MODEL_SAVE_DIR: "snapshots/cityscape_mbv2_kd_e100_1/"
+ PRETRAINED_MODEL_DIR: u"pretrained_model/mobilenet_cityscapes"
+ SNAPSHOT_EPOCH: 5
+ SYNC_BATCH_NORM: True
+SOLVER:
+ LR: 0.001
+ LR_POLICY: "poly"
+ OPTIMIZER: "sgd"
+ NUM_EPOCHS: 100
diff --git a/slim/distillation/cityscape_teacher.yaml b/slim/distillation/cityscape_teacher.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..ff7df807bbb782e4d5862f8963104f07fa147bb1
--- /dev/null
+++ b/slim/distillation/cityscape_teacher.yaml
@@ -0,0 +1,65 @@
+EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
+TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
+AUG:
+ AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
+ FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
+ INF_RESIZE_VALUE: 500 # for rangescaling
+ MAX_RESIZE_VALUE: 600 # for rangescaling
+ MIN_RESIZE_VALUE: 400 # for rangescaling
+ MAX_SCALE_FACTOR: 2.0 # for stepscaling
+ MIN_SCALE_FACTOR: 0.5 # for stepscaling
+ SCALE_STEP_SIZE: 0.25 # for stepscaling
+ MIRROR: True
+ FLIP: True
+ FLIP_RATIO: 0.2
+ RICH_CROP:
+ ENABLE: False
+ ASPECT_RATIO: 0.33
+ BLUR: True
+ BLUR_RATIO: 0.1
+ MAX_ROTATION: 15
+ MIN_AREA_RATIO: 0.5
+ BRIGHTNESS_JITTER_RATIO: 0.5
+ CONTRAST_JITTER_RATIO: 0.5
+ SATURATION_JITTER_RATIO: 0.5
+BATCH_SIZE: 16
+MEAN: [0.5, 0.5, 0.5]
+STD: [0.5, 0.5, 0.5]
+DATASET:
+ DATA_DIR: "./dataset/cityscapes/"
+ IMAGE_TYPE: "rgb" # choice rgb or rgba
+ NUM_CLASSES: 19
+ TEST_FILE_LIST: "dataset/cityscapes/val.list"
+ TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
+ VAL_FILE_LIST: "dataset/cityscapes/val.list"
+ IGNORE_INDEX: 255
+FREEZE:
+ MODEL_FILENAME: "model"
+ PARAMS_FILENAME: "params"
+MODEL:
+ DEFAULT_NORM_TYPE: "bn"
+ MODEL_NAME: "deeplabv3p"
+ DEEPLAB:
+ BACKBONE: "xception_65"
+ ASPP_WITH_SEP_CONV: True
+ DECODER_USE_SEP_CONV: True
+ ENCODER_WITH_ASPP: True
+ ENABLE_DECODER: True
+TEST:
+ TEST_MODEL: "snapshots/cityscape_v5/final/"
+TRAIN:
+ MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
+ PRETRAINED_MODEL_DIR: u"pretrain/deeplabv3plus_gn_init"
+ SNAPSHOT_EPOCH: 5
+ SYNC_BATCH_NORM: True
+SOLVER:
+ LR: 0.001
+ LR_POLICY: "poly"
+ OPTIMIZER: "sgd"
+ NUM_EPOCHS: 100
+
+SLIM:
+ KNOWLEDGE_DISTILL_IS_TEACHER: True
+ KNOWLEDGE_DISTILL: True
+ KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: "pretrained_model/xception65_bn_cityscapes"
+
diff --git a/slim/distillation/model_builder.py b/slim/distillation/model_builder.py
new file mode 100644
index 0000000000000000000000000000000000000000..f903b8dd2b635fa10070dcc3da488be66746d539
--- /dev/null
+++ b/slim/distillation/model_builder.py
@@ -0,0 +1,342 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import struct
+
+import paddle.fluid as fluid
+import numpy as np
+from paddle.fluid.proto.framework_pb2 import VarType
+
+import solver
+from utils.config import cfg
+from loss import multi_softmax_with_loss
+from loss import multi_dice_loss
+from loss import multi_bce_loss
+from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
+
+
+class ModelPhase(object):
+ """
+ Standard name for model phase in PaddleSeg
+
+ The following standard keys are defined:
+ * `TRAIN`: training mode.
+ * `EVAL`: testing/evaluation mode.
+ * `PREDICT`: prediction/inference mode.
+ * `VISUAL` : visualization mode
+ """
+
+ TRAIN = 'train'
+ EVAL = 'eval'
+ PREDICT = 'predict'
+ VISUAL = 'visual'
+
+ @staticmethod
+ def is_train(phase):
+ return phase == ModelPhase.TRAIN
+
+ @staticmethod
+ def is_predict(phase):
+ return phase == ModelPhase.PREDICT
+
+ @staticmethod
+ def is_eval(phase):
+ return phase == ModelPhase.EVAL
+
+ @staticmethod
+ def is_visual(phase):
+ return phase == ModelPhase.VISUAL
+
+ @staticmethod
+ def is_valid_phase(phase):
+ """ Check valid phase """
+ if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
+ or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
+ return True
+
+ return False
+
+
+def seg_model(image, class_num):
+ model_name = cfg.MODEL.MODEL_NAME
+ if model_name == 'unet':
+ logits = unet.unet(image, class_num)
+ elif model_name == 'deeplabv3p':
+ logits = deeplab.deeplabv3p(image, class_num)
+ elif model_name == 'icnet':
+ logits = icnet.icnet(image, class_num)
+ elif model_name == 'pspnet':
+ logits = pspnet.pspnet(image, class_num)
+ elif model_name == 'hrnet':
+ logits = hrnet.hrnet(image, class_num)
+ elif model_name == 'fast_scnn':
+ logits = fast_scnn.fast_scnn(image, class_num)
+ else:
+ raise Exception(
+ "unknow model name, only support unet, deeplabv3p, icnet, pspnet, hrnet"
+ )
+ return logits
+
+
+def softmax(logit):
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ logit = fluid.layers.softmax(logit)
+ logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+ return logit
+
+
+def sigmoid_to_softmax(logit):
+ """
+ one channel to two channel
+ """
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ logit = fluid.layers.sigmoid(logit)
+ logit_back = 1 - logit
+ logit = fluid.layers.concat([logit_back, logit], axis=-1)
+ logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+ return logit
+
+
+def export_preprocess(image):
+ """导出模型的预处理流程"""
+
+ image = fluid.layers.transpose(image, [0, 3, 1, 2])
+ origin_shape = fluid.layers.shape(image)[-2:]
+
+    # Resize according to the configured AUG_METHOD
+ if cfg.AUG.AUG_METHOD == 'unpadding':
+ h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
+ w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
+ image = fluid.layers.resize_bilinear(
+ image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
+ elif cfg.AUG.AUG_METHOD == 'rangescaling':
+ size = cfg.AUG.INF_RESIZE_VALUE
+ value = fluid.layers.reduce_max(origin_shape)
+ scale = float(size) / value.astype('float32')
+ image = fluid.layers.resize_bilinear(
+ image, scale=scale, align_corners=False, align_mode=0)
+
+    # Record the image shape after resizing
+ valid_shape = fluid.layers.shape(image)[-2:]
+
+    # Pad to EVAL_CROP_SIZE
+ width = cfg.EVAL_CROP_SIZE[0]
+ height = cfg.EVAL_CROP_SIZE[1]
+ pad_target = fluid.layers.assign(
+ np.array([height, width]).astype('float32'))
+ up = fluid.layers.assign(np.array([0]).astype('float32'))
+ down = pad_target[0] - valid_shape[0]
+ left = up
+ right = pad_target[1] - valid_shape[1]
+ paddings = fluid.layers.concat([up, down, left, right])
+ paddings = fluid.layers.cast(paddings, 'int32')
+ image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
+
+ # normalize
+ mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
+ mean = fluid.layers.assign(mean.astype('float32'))
+ std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
+ std = fluid.layers.assign(std.astype('float32'))
+ image = (image / 255 - mean) / std
+    # Reshape so that later layers can obtain the feature map shape via image.shape
+ image = fluid.layers.reshape(
+ image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
+ return image, valid_shape, origin_shape
+
+
+def build_model(main_prog=None, start_prog=None, phase=ModelPhase.TRAIN, **kwargs):
+
+ if not ModelPhase.is_valid_phase(phase):
+ raise ValueError("ModelPhase {} is not valid!".format(phase))
+ if ModelPhase.is_train(phase):
+ width = cfg.TRAIN_CROP_SIZE[0]
+ height = cfg.TRAIN_CROP_SIZE[1]
+ else:
+ width = cfg.EVAL_CROP_SIZE[0]
+ height = cfg.EVAL_CROP_SIZE[1]
+
+ image_shape = [cfg.DATASET.DATA_DIM, height, width]
+ grt_shape = [1, height, width]
+ class_num = cfg.DATASET.NUM_CLASSES
+
+ #with fluid.program_guard(main_prog, start_prog):
+ # with fluid.unique_name.guard():
+    # When exporting the model, image normalization is added as a preprocessing step to
+    # simplify deployment; at inference time the input image only needs an extra batch_size dimension.
+ if cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
+ image = main_prog.global_block()._clone_variable(kwargs['image'],
+ force_persistable=False)
+ label = main_prog.global_block()._clone_variable(kwargs['label'],
+ force_persistable=False)
+ mask = main_prog.global_block()._clone_variable(kwargs['mask'],
+ force_persistable=False)
+ else:
+ if ModelPhase.is_predict(phase):
+ origin_image = fluid.layers.data(
+ name='image',
+ shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
+ dtype='float32',
+ append_batch_size=False)
+ image, valid_shape, origin_shape = export_preprocess(
+ origin_image)
+
+ else:
+ image = fluid.layers.data(
+ name='image', shape=image_shape, dtype='float32')
+ label = fluid.layers.data(
+ name='label', shape=grt_shape, dtype='int32')
+ mask = fluid.layers.data(
+ name='mask', shape=grt_shape, dtype='int32')
+
+
+ # use PyReader when doing traning and evaluation
+ if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+ py_reader = None
+ if not cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
+ py_reader = fluid.io.PyReader(
+ feed_list=[image, label, mask],
+ capacity=cfg.DATALOADER.BUF_SIZE,
+ iterable=False,
+ use_double_buffer=True)
+
+ loss_type = cfg.SOLVER.LOSS
+ if not isinstance(loss_type, list):
+ loss_type = list(loss_type)
+
+    # dice_loss and bce_loss are only applicable to binary segmentation
+ if class_num > 2 and (("dice_loss" in loss_type) or
+ ("bce_loss" in loss_type)):
+ raise Exception(
+ "dice loss and bce loss is only applicable to binary classfication"
+ )
+
+    # For binary segmentation with dice_loss or bce_loss, the final logit has a single output channel
+ if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
+ class_num = 1
+ if "softmax_loss" in loss_type:
+ raise Exception(
+ "softmax loss can not combine with dice loss or bce loss"
+ )
+ logits = seg_model(image, class_num)
+
+    # Compute the losses selected in SOLVER.LOSS
+ if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+ loss_valid = False
+ avg_loss_list = []
+ valid_loss = []
+ if "softmax_loss" in loss_type:
+ weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
+ avg_loss_list.append(
+ multi_softmax_with_loss(logits, label, mask, class_num, weight))
+ loss_valid = True
+ valid_loss.append("softmax_loss")
+ if "dice_loss" in loss_type:
+ avg_loss_list.append(multi_dice_loss(logits, label, mask))
+ loss_valid = True
+ valid_loss.append("dice_loss")
+ if "bce_loss" in loss_type:
+ avg_loss_list.append(multi_bce_loss(logits, label, mask))
+ loss_valid = True
+ valid_loss.append("bce_loss")
+ if not loss_valid:
+ raise Exception(
+ "SOLVER.LOSS: {} is set wrong. it should "
+ "include one of (softmax_loss, bce_loss, dice_loss) at least"
+ " example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
+ .format(cfg.SOLVER.LOSS))
+
+ invalid_loss = [x for x in loss_type if x not in valid_loss]
+ if len(invalid_loss) > 0:
+ print(
+ "Warning: the loss {} you set is invalid. it will not be included in loss computed."
+ .format(invalid_loss))
+
+ avg_loss = 0
+ for i in range(0, len(avg_loss_list)):
+ avg_loss += avg_loss_list[i]
+
+ #get pred result in original size
+ if isinstance(logits, tuple):
+ logit = logits[0]
+ else:
+ logit = logits
+
+ if logit.shape[2:] != label.shape[2:]:
+ logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
+
+ # return image input and logit output for inference graph prune
+ if ModelPhase.is_predict(phase):
+        # In binary segmentation, dice_loss/bce_loss produce a single-channel logit; convert it to two channels
+ if class_num == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = softmax(logit)
+
+        # Slice out the valid region
+ logit = fluid.layers.slice(
+ logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
+
+ logit = fluid.layers.resize_bilinear(
+ logit,
+ out_shape=origin_shape,
+ align_corners=False,
+ align_mode=0)
+ logit = fluid.layers.argmax(logit, axis=1)
+ return origin_image, logit
+
+ if class_num == 1:
+ out = sigmoid_to_softmax(logit)
+ out = fluid.layers.transpose(out, [0, 2, 3, 1])
+ else:
+ out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+ pred = fluid.layers.argmax(out, axis=3)
+ pred = fluid.layers.unsqueeze(pred, axes=[3])
+ if ModelPhase.is_visual(phase):
+ if class_num == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = softmax(logit)
+ return pred, logit
+
+ if ModelPhase.is_eval(phase):
+ return py_reader, avg_loss, pred, label, mask
+
+ if ModelPhase.is_train(phase):
+ decayed_lr = None
+ if not cfg.SLIM.KNOWLEDGE_DISTILL:
+ optimizer = solver.Solver(main_prog, start_prog)
+ decayed_lr = optimizer.optimise(avg_loss)
+ # optimizer = solver.Solver(main_prog, start_prog)
+ # decayed_lr = optimizer.optimise(avg_loss)
+ return py_reader, avg_loss, decayed_lr, pred, label, mask, image
+
+
+def to_int(string, dest="I"):
+ return struct.unpack(dest, string)[0]
+
+
+def parse_shape_from_file(filename):
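+    # Parse the header of a saved LoDTensor parameter file to recover its shape
+    # without loading the tensor data itself.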
+ with open(filename, "rb") as file:
+ version = file.read(4)
+ lod_level = to_int(file.read(8), dest="Q")
+ for i in range(lod_level):
+ _size = to_int(file.read(8), dest="Q")
+ _ = file.read(_size)
+ version = file.read(4)
+ tensor_desc_size = to_int(file.read(4))
+ tensor_desc = VarType.TensorDesc()
+ tensor_desc.ParseFromString(file.read(tensor_desc_size))
+ return tuple(tensor_desc.dims)
diff --git a/slim/distillation/train_distill.py b/slim/distillation/train_distill.py
new file mode 100644
index 0000000000000000000000000000000000000000..c1e23253ffcde9eea034bd7f67906ca9e534d2e2
--- /dev/null
+++ b/slim/distillation/train_distill.py
@@ -0,0 +1,584 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import random
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from model_builder import build_model
+from model_builder import ModelPhase
+from model_builder import parse_shape_from_file
+from eval import evaluate
+from vis import visualize
+from utils import dist_utils
+
+import solver
+from paddleslim.dist.single_distiller import merge, l2_loss
+
+def parse_args():
+ parser = argparse.ArgumentParser(description='PaddleSeg training')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--teacher_cfg',
+ dest='teacher_cfg_file',
+        help='Config file of the teacher model for distillation',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess I/O or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--log_steps',
+ dest='log_steps',
+ help='Display logging information at every log_steps',
+ default=10,
+ type=int)
+ parser.add_argument(
+ '--debug',
+ dest='debug',
+ help='debug mode, display detail information of training',
+ action='store_true')
+ parser.add_argument(
+ '--use_tb',
+ dest='use_tb',
+ help='whether to record the data during training to Tensorboard',
+ action='store_true')
+ parser.add_argument(
+ '--tb_log_dir',
+ dest='tb_log_dir',
+ help='Tensorboard logging directory',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--do_eval',
+ dest='do_eval',
+ help='Evaluation models result on every new checkpoint',
+ action='store_true')
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ parser.add_argument(
+ '--enable_ce',
+ dest='enable_ce',
+        help='If set True, enable continuous evaluation job. '
+ 'This flag is only used for internal test.',
+ action='store_true')
+ return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+ """
+    Temporary resolution for Windows save variables compatibility.
+ Will fix in PaddlePaddle v1.5.2
+ """
+
+ save_program = fluid.Program()
+ save_block = save_program.global_block()
+
+ for each_var in vars:
+ # NOTE: don't save the variable which type is RAW
+ if each_var.type == fluid.core.VarDesc.VarType.RAW:
+ continue
+ new_var = save_block.create_var(
+ name=each_var.name,
+ shape=each_var.shape,
+ dtype=each_var.dtype,
+ type=each_var.type,
+ lod_level=each_var.lod_level,
+ persistable=True)
+ file_path = os.path.join(dirname, new_var.name)
+ file_path = os.path.normpath(file_path)
+ save_block.append_op(
+ type='save',
+ inputs={'X': [new_var]},
+ outputs={},
+ attrs={'file_path': file_path})
+
+ executor.run(save_program)
+
+
+def save_checkpoint(exe, program, ckpt_name):
+ """
+ Save checkpoint for evaluation or resume training
+ """
+ ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+ print("Save model checkpoint to {}".format(ckpt_dir))
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+
+ save_vars(
+ exe,
+ ckpt_dir,
+ program,
+ vars=list(filter(fluid.io.is_persistable, program.list_vars())))
+
+ return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+ """
+    Load checkpoint from the resume model directory to resume training
+ """
+
+ print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+ if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+ raise ValueError("TRAIN.PRETRAIN_MODEL {} not exist!".format(
+ cfg.TRAIN.RESUME_MODEL_DIR))
+
+ fluid.io.load_persistables(
+ exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+ model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+ if model_path[-1] == os.sep:
+ model_path = model_path[0:-1]
+ epoch_name = os.path.basename(model_path)
+ # If resume model is final model
+ if epoch_name == 'final':
+ begin_epoch = cfg.SOLVER.NUM_EPOCHS
+    # If the resume model path ends with a digit, restore the epoch status
+ elif epoch_name.isdigit():
+ epoch = int(epoch_name)
+ begin_epoch = epoch + 1
+ else:
+ raise ValueError("Resume model path is not valid!")
+ print("Model checkpoint loaded successfully!")
+
+ return begin_epoch
+
+
+def update_best_model(ckpt_dir):
+ best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
+ if os.path.exists(best_model_dir):
+ shutil.rmtree(best_model_dir)
+ shutil.copytree(ckpt_dir, best_model_dir)
+
+
+def print_info(*msg):
+ if cfg.TRAINER_ID == 0:
+ print(*msg)
+
+
+def train(cfg):
+ # startup_prog = fluid.Program()
+ # train_prog = fluid.Program()
+
+ drop_last = True
+
+ dataset = SegDataset(
+ file_list=cfg.DATASET.TRAIN_FILE_LIST,
+ mode=ModelPhase.TRAIN,
+ shuffle=True,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+ if args.use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ batch_data = []
+ for b in data_gen:
+ batch_data.append(b)
+ if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+ batch_data = []
+            # If using the sync batch norm strategy, drop the last batch if the number of
+            # samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
+ if not cfg.TRAIN.SYNC_BATCH_NORM:
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+
+ # Get device environment
+ # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+ # place = places[0]
+ gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+ place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+ places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+ # Get number of GPU
+ dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+ print_info("#Device count: {}".format(dev_count))
+
+ # Make sure BATCH_SIZE can divided by GPU cards
+ assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+ cfg.BATCH_SIZE, dev_count))
+ # If use multi-gpu training mode, batch data will allocated to each GPU evenly
+ batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+ print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+ py_reader, loss, lr, pred, grts, masks, image = build_model(phase=ModelPhase.TRAIN)
+ py_reader.decorate_sample_generator(
+ data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+ exe = fluid.Executor(place)
+
+ cfg.update_from_file(args.teacher_cfg_file)
+ # teacher_arch = teacher_cfg.architecture
+ teacher_program = fluid.Program()
+ teacher_startup_program = fluid.Program()
+
+ with fluid.program_guard(teacher_program, teacher_startup_program):
+ with fluid.unique_name.guard():
+ _, teacher_loss, _, _, _, _, _ = build_model(
+ teacher_program, teacher_startup_program, phase=ModelPhase.TRAIN, image=image,
+ label=grts, mask=masks)
+
+ exe.run(teacher_startup_program)
+
+ teacher_program = teacher_program.clone(for_test=True)
+ ckpt_dir = cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR
+ assert ckpt_dir is not None
+ print('load teacher model:', ckpt_dir)
+ fluid.io.load_params(exe, ckpt_dir, main_program=teacher_program)
+
+ # cfg = load_config(FLAGS.config)
+ cfg.update_from_file(args.cfg_file)
+ data_name_map = {
+ 'image': 'image',
+ 'label': 'label',
+ 'mask': 'mask',
+ }
+ merge(teacher_program, fluid.default_main_program(), data_name_map, place)
+ distill_pairs = [['teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0']]
+
+ def distill(pairs, weight):
+ """
+        Add an L2 distillation loss between a teacher/student pair of feature maps
+        (here the bilinear-upsampled output logits) and scale it by weight.
+ """
+ loss = l2_loss(pairs[0][0], pairs[0][1])
+ weighted_loss = loss * weight
+ return weighted_loss
+
+ distill_loss = distill(distill_pairs, 0.1)
+ cfg.update_from_file(args.cfg_file)
+ optimizer = solver.Solver(None, None)
+ all_loss = loss + distill_loss
+ lr = optimizer.optimise(all_loss)
+
+ exe.run(fluid.default_startup_program())
+
+ exec_strategy = fluid.ExecutionStrategy()
+ # Clear temporary variables every 100 iteration
+ if args.use_gpu:
+ exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+ exec_strategy.num_iteration_per_drop_scope = 100
+ build_strategy = fluid.BuildStrategy()
+ build_strategy.fuse_all_reduce_ops = False
+ build_strategy.fuse_all_optimizer_ops = False
+ build_strategy.fuse_elewise_add_act_ops = True
+ if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+ dist_utils.prepare_for_multi_process(exe, build_strategy, fluid.default_main_program())
+ exec_strategy.num_threads = 1
+
+ if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+ if dev_count > 1:
+ # Apply sync batch norm strategy
+ print_info("Sync BatchNorm strategy is effective.")
+ build_strategy.sync_batch_norm = True
+ else:
+ print_info(
+ "Sync BatchNorm strategy will not be effective if GPU device"
+ " count <= 1")
+ compiled_train_prog = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(
+ loss_name=all_loss.name,
+ exec_strategy=exec_strategy,
+ build_strategy=build_strategy)
+
+ # Resume training
+ begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+ if cfg.TRAIN.RESUME_MODEL_DIR:
+ begin_epoch = load_checkpoint(exe, fluid.default_main_program())
+ # Load pretrained model
+ elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+ print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+ load_vars = []
+ load_fail_vars = []
+
+ def var_shape_matched(var, shape):
+ """
+            Check whether the persistable variable shape matches the current network
+ """
+ var_exist = os.path.exists(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ if var_exist:
+ var_shape = parse_shape_from_file(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ return var_shape == shape
+ return False
+
+ for x in fluid.default_main_program().list_vars():
+ if isinstance(x, fluid.framework.Parameter):
+ shape = tuple(fluid.global_scope().find_var(
+ x.name).get_tensor().shape())
+ if var_shape_matched(x, shape):
+ load_vars.append(x)
+ else:
+ load_fail_vars.append(x)
+
+ fluid.io.load_vars(
+ exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+ for var in load_vars:
+ print_info("Parameter[{}] loaded sucessfully!".format(var.name))
+ for var in load_fail_vars:
+ print_info(
+ "Parameter[{}] don't exist or shape does not match current network, skip"
+ " to load it.".format(var.name))
+ print_info("{}/{} pretrained parameters loaded successfully!".format(
+ len(load_vars),
+ len(load_vars) + len(load_fail_vars)))
+ else:
+ print_info(
+            'Pretrained model dir {} does not exist, training from scratch...'.
+ format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+ #fetch_list = [avg_loss.name, lr.name]
+ fetch_list = [loss.name, 'teacher_' + teacher_loss.name, distill_loss.name, lr.name]
+
+ if args.debug:
+ # Fetch more variable info and use streaming confusion matrix to
+ # calculate IoU results if in debug mode
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ fetch_list.extend([pred.name, grts.name, masks.name])
+ cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+ if args.use_tb:
+ if not args.tb_log_dir:
+ print_info("Please specify the log directory by --tb_log_dir.")
+ exit(1)
+
+ from tb_paddle import SummaryWriter
+ log_writer = SummaryWriter(args.tb_log_dir)
+
+ # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+ global_step = 0
+ all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+ if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+ all_step += 1
+ all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+ avg_loss = 0.0
+ avg_t_loss = 0.0
+ avg_d_loss = 0.0
+ best_mIoU = 0.0
+
+ timer = Timer()
+ timer.start()
+ if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+ raise ValueError(
+ ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+ begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+ if args.use_mpio:
+ print_info("Use multiprocess reader")
+ else:
+ print_info("Use multi-thread reader")
+
+ for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+ py_reader.start()
+ while True:
+ try:
+ if args.debug:
+ # Print category IoU and accuracy to check whether the
+                    # training process matches expectations
+ loss, lr, pred, grts, masks = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ cm.calculate(pred, grts, masks)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0:
+ speed = args.log_steps / timer.elapsed_time()
+ avg_loss /= args.log_steps
+ category_acc, mean_acc = cm.accuracy()
+ category_iou, mean_iou = cm.mean_iou()
+
+ print_info((
+ "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+ mean_iou, speed,
+ calculate_eta(all_step - global_step, speed)))
+ print_info("Category IoU: ", category_iou)
+ print_info("Category Acc: ", category_acc)
+ if args.use_tb:
+ log_writer.add_scalar('Train/mean_iou', mean_iou,
+ global_step)
+ log_writer.add_scalar('Train/mean_acc', mean_acc,
+ global_step)
+ log_writer.add_scalar('Train/loss', avg_loss,
+ global_step)
+ log_writer.add_scalar('Train/lr', lr[0],
+ global_step)
+ log_writer.add_scalar('Train/step/sec', speed,
+ global_step)
+ sys.stdout.flush()
+ avg_loss = 0.0
+ cm.zero_matrix()
+ timer.restart()
+ else:
+                    # If not in debug mode, avoid unnecessary logging and calculation
+ loss, t_loss, d_loss, lr = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ avg_loss += np.mean(np.array(loss))
+ avg_t_loss += np.mean(np.array(t_loss))
+ avg_d_loss += np.mean(np.array(d_loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+ avg_loss /= args.log_steps
+ avg_t_loss /= args.log_steps
+ avg_d_loss /= args.log_steps
+ speed = args.log_steps / timer.elapsed_time()
+ print((
+ "epoch={} step={} lr={:.5f} loss={:.4f} teacher loss={:.4f} distill loss={:.4f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, avg_t_loss, avg_d_loss, speed,
+ calculate_eta(all_step - global_step, speed)))
+ if args.use_tb:
+ log_writer.add_scalar('Train/loss', avg_loss,
+ global_step)
+ log_writer.add_scalar('Train/lr', lr[0],
+ global_step)
+ log_writer.add_scalar('Train/speed', speed,
+ global_step)
+ sys.stdout.flush()
+ avg_loss = 0.0
+ avg_t_loss = 0.0
+ avg_d_loss = 0.0
+ timer.restart()
+
+ except fluid.core.EOFException:
+ py_reader.reset()
+ break
+ except Exception as e:
+ print(e)
+
+ if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
+ or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
+ ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
+
+ if args.do_eval:
+ print("Evaluation start")
+ _, mean_iou, _, mean_acc = evaluate(
+ cfg=cfg,
+ ckpt_dir=ckpt_dir,
+ use_gpu=args.use_gpu,
+ use_mpio=args.use_mpio)
+ if args.use_tb:
+ log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
+ global_step)
+ log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
+ global_step)
+
+ if mean_iou > best_mIoU:
+ best_mIoU = mean_iou
+ update_best_model(ckpt_dir)
+ print_info("Save best model {} to {}, mIoU = {:.4f}".format(
+ ckpt_dir,
+ os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
+ mean_iou))
+
+ # Use Tensorboard to visualize results
+ if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
+ visualize(
+ cfg=cfg,
+ use_gpu=args.use_gpu,
+ vis_file_list=cfg.DATASET.VIS_FILE_LIST,
+ vis_dir="visual",
+ ckpt_dir=ckpt_dir,
+ log_writer=log_writer)
+ if cfg.TRAINER_ID == 0:
+ ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
+
+ # save final model
+ if cfg.TRAINER_ID == 0:
+ save_checkpoint(exe, fluid.default_main_program(), 'final')
+
+
+def main(args):
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ if args.enable_ce:
+ random.seed(0)
+ np.random.seed(0)
+
+ cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+ cfg.check_and_infer()
+ print_info(pprint.pformat(cfg))
+ train(cfg)
+
+
+if __name__ == '__main__':
+ args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using the CPU-only build of PaddlePaddle."
+        )
+        print(
+            "Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
+        )
+ sys.exit(1)
+ main(args)
diff --git a/slim/nas/README.md b/slim/nas/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..cddfc5a82f07ab0b3f2e2acad6a4c0f7b2ed650c
--- /dev/null
+++ b/slim/nas/README.md
@@ -0,0 +1,63 @@
+> Please install PaddlePaddle 1.6 or a later version before running this example.
+
+# PaddleSeg Neural Architecture Search (NAS) Example
+
+Before reading this tutorial, please make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related chapters so that you have a basic understanding of PaddleSeg.
+
+This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to search architectures for the models in the segmentation library.
+
+Unless otherwise stated, all commands shown in this tutorial are executed under the `PaddleSeg/` directory.
+
+## Overview
+
+We take the DeepLab + MobileNetV2 model as the NAS example. The example uses [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
+to carry out the architecture search experiment; for the technical details, please refer to the [NAS strategy tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/4670a79343c191b61a78e416826d122eea52a7ab/docs/zh_cn/tutorials/image_classification_nas_quick_start.ipynb).
+
+
+## Defining the search space
+In this experiment we use the SANAS strategy and search over the channel numbers and kernel sizes of the network.
+We therefore define the following search space:
+- head channel module `head_num`: the range of channel numbers in the MobileNetV2 head module;
+- inverse_res_block1-6 `filter_num1-6`: the range of channel numbers in each inverse_res_block module;
+- inverse_res_block `repeat`: the number of units in each MobileNetV2 inverse_res_block module;
+- inverse_res_block `multiply`: the range of the expansion_factor in each MobileNetV2 inverse_res_block module;
+- kernel size `k_size`: whether each convolution kernel in MobileNetV2 is 3x3 or 5x5.
+
+Based on the ranges defined above, the search space tokens have 25 positions, varying within ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2]).
+
+
+The initial tokens are: [4, 4, 5, 1, 0, 4, 4, 1, 0, 4, 4, 3, 0, 4, 5, 2, 0, 4, 7, 2, 0, 4, 9, 0, 0].
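+
+To make the token encoding concrete, the following minimal sketch (with illustrative variable names, mirroring the `token2arch` logic in `slim/nas/mobilenetv2_search_space.py`) shows how the first token positions are decoded into MobileNetV2 bottleneck settings:
+
+```python
+import numpy as np
+
+# Candidate value lists, as defined in MobileNetV2SpaceSeg
+head_num = np.array([3, 4, 8, 12, 16, 24, 32])
+multiply = np.array([1, 2, 3, 4, 6])                    # expansion factors
+filter_num1 = np.array([3, 4, 8, 12, 16, 24, 32, 48])   # channel candidates
+repeat = np.array([1, 2, 3, 4, 5, 6])
+k_size = np.array([3, 5])
+
+tokens = [4, 4, 5, 1, 0]  # the first 5 of the 25 positions above
+
+# Position 0 indexes head_num; positions 1-4 describe the first
+# inverse_res_block as (expansion_factor, filter_num, repeat, kernel_size).
+head_channels = head_num[tokens[0]]                      # 16
+block1 = (multiply[tokens[1]], filter_num1[tokens[2]],
+          repeat[tokens[3]], k_size[tokens[4]])          # (6, 24, 2, 3)
+print(head_channels, block1)
+```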
+
+## Starting the search
+First install PaddleSlim; please refer to the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2).
+
+Configure the PaddleSeg config file; only the NAS-related options are shown below.
+
+```yaml
+SLIM:
+  NAS_PORT: 23333 # server port
+  NAS_ADDRESS: "" # IP address; leave empty when running as the server, set to the server's IP when running as a client
+  NAS_SEARCH_STEPS: 100 # number of architectures to search
+  NAS_START_EVAL_EPOCH: -1 # the epoch from which each searched model starts to be evaluated
+  NAS_IS_SERVER: True # whether this process is the server
+  NAS_SPACE_NAME: "MobileNetV2SpaceSeg" # search space
+```
+
+## Training and evaluation
+Run the following command to train and evaluate at the same time:
+```shell
+CUDA_VISIBLE_DEVICES=0 python -u ./slim/nas/train_nas.py --log_steps 10 --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio \
+SLIM.NAS_PORT 23333 \
+SLIM.NAS_ADDRESS "" \
+SLIM.NAS_SEARCH_STEPS 2 \
+SLIM.NAS_START_EVAL_EPOCH -1 \
+SLIM.NAS_IS_SERVER True \
+SLIM.NAS_SPACE_NAME "MobileNetV2SpaceSeg"
+```
+
+
+## FAQ
+- Runtime error: `socket.error: [Errno 98] Address already in use`.
+
+Solution: the port is already occupied; please change the `SLIM.NAS_PORT` port.
+
diff --git a/slim/nas/deeplab.py b/slim/nas/deeplab.py
new file mode 100644
index 0000000000000000000000000000000000000000..6cbf840927b107a36273e9890f1ba4d076ddb417
--- /dev/null
+++ b/slim/nas/deeplab.py
@@ -0,0 +1,225 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import contextlib
+import paddle
+import paddle.fluid as fluid
+from utils.config import cfg
+from models.libs.model_libs import scope, name_scope
+from models.libs.model_libs import bn, bn_relu, relu
+from models.libs.model_libs import conv
+from models.libs.model_libs import separate_conv
+from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
+from models.backbone.xception import Xception as xception_backbone
+
+def encoder(input):
+    # Encoder configuration using the ASPP architecture: pooling + 1x1 conv + three parallel atrous convolutions at different rates, then concat + 1x1 conv
+    # ASPP_WITH_SEP_CONV: True by default, use depthwise separable convolution; otherwise use ordinary convolution
+    # OUTPUT_STRIDE: downsampling factor, 8 or 16, determines aspp_ratios
+    # aspp_ratios: dilation rates of the ASPP atrous convolutions
+
+ if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
+ aspp_ratios = [6, 12, 18]
+ elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
+ aspp_ratios = [12, 24, 36]
+ else:
+        raise Exception("deeplab only supports output stride 8 or 16")
+
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ with scope('encoder'):
+ channel = 256
+ with scope("image_pool"):
+ image_avg = fluid.layers.reduce_mean(
+ input, [2, 3], keep_dim=True)
+ image_avg = bn_relu(
+ conv(
+ image_avg,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
+
+ with scope("aspp0"):
+ aspp0 = bn_relu(
+ conv(
+ input,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ with scope("aspp1"):
+ if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+ aspp1 = separate_conv(
+ input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
+ else:
+ aspp1 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[0],
+ padding=aspp_ratios[0],
+ param_attr=param_attr))
+ with scope("aspp2"):
+ if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+ aspp2 = separate_conv(
+ input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
+ else:
+ aspp2 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[1],
+ padding=aspp_ratios[1],
+ param_attr=param_attr))
+ with scope("aspp3"):
+ if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+ aspp3 = separate_conv(
+ input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
+ else:
+ aspp3 = bn_relu(
+ conv(
+ input,
+ channel,
+ stride=1,
+ filter_size=3,
+ dilation=aspp_ratios[2],
+ padding=aspp_ratios[2],
+ param_attr=param_attr))
+ with scope("concat"):
+ data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
+ axis=1)
+ data = bn_relu(
+ conv(
+ data,
+ channel,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+ data = fluid.layers.dropout(data, 0.9)
+ return data
+
+
+def decoder(encode_data, decode_shortcut):
+    # Decoder configuration
+    # encode_data: output of the encoder
+    # decode_shortcut: branch taken from the backbone, resized and concatenated with encode_data
+    # DECODER_USE_SEP_CONV: True by default, the concat is followed by two separable convolutions; otherwise ordinary convolutions are used
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=None,
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+ with scope('decoder'):
+ with scope('concat'):
+ decode_shortcut = bn_relu(
+ conv(
+ decode_shortcut,
+ 48,
+ 1,
+ 1,
+ groups=1,
+ padding=0,
+ param_attr=param_attr))
+
+ encode_data = fluid.layers.resize_bilinear(
+ encode_data, decode_shortcut.shape[2:])
+ encode_data = fluid.layers.concat([encode_data, decode_shortcut],
+ axis=1)
+ if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
+ with scope("separable_conv1"):
+ encode_data = separate_conv(
+ encode_data, 256, 1, 3, dilation=1, act=relu)
+ with scope("separable_conv2"):
+ encode_data = separate_conv(
+ encode_data, 256, 1, 3, dilation=1, act=relu)
+ else:
+ with scope("decoder_conv1"):
+ encode_data = bn_relu(
+ conv(
+ encode_data,
+ 256,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
+ param_attr=param_attr))
+ with scope("decoder_conv2"):
+ encode_data = bn_relu(
+ conv(
+ encode_data,
+ 256,
+ stride=1,
+ filter_size=3,
+ dilation=1,
+ padding=1,
+ param_attr=param_attr))
+ return encode_data
+
+
+def nas_backbone(input, arch):
+ # scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
+ # output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
+ # model = mobilenet_backbone(scale=scale, output_stride=output_stride)
+ end_points = 8
+ decode_point = 3
+ data, decode_shortcuts = arch(
+ input, end_points=end_points, return_block=decode_point, output_stride=16)
+ decode_shortcut = decode_shortcuts[decode_point]
+ return data, decode_shortcut
+
+
+def deeplabv3p_nas(img, num_classes, arch=None):
+ data, decode_shortcut = nas_backbone(img, arch)
+    # Encoder / decoder setup
+ cfg.MODEL.DEFAULT_EPSILON = 1e-5
+ if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
+ data = encoder(data)
+ if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
+ data = decoder(data, decode_shortcut)
+
+    # Set the output channels of the last conv layer according to the number of classes, then resize to the original image size
+ param_attr = fluid.ParamAttr(
+ name=name_scope + 'weights',
+ regularizer=fluid.regularizer.L2DecayRegularizer(
+ regularization_coeff=0.0),
+ initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
+ with scope('logit'):
+ logit = conv(
+ data,
+ num_classes,
+ 1,
+ stride=1,
+ padding=0,
+ bias_attr=True,
+ param_attr=param_attr)
+ logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
+
+ return logit
diff --git a/slim/nas/eval_nas.py b/slim/nas/eval_nas.py
new file mode 100644
index 0000000000000000000000000000000000000000..08f75f5d8ee8d6afbcf9b038e4f8dcf0237a5b56
--- /dev/null
+++ b/slim/nas/eval_nas.py
@@ -0,0 +1,185 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from model_builder import build_model
+from model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+
+from mobilenetv2_search_space import MobileNetV2SpaceSeg
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess IO or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ if len(sys.argv) == 1:
+ parser.print_help()
+ sys.exit(1)
+ return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+ np.set_printoptions(precision=5, suppress=True)
+
+ startup_prog = fluid.Program()
+ test_prog = fluid.Program()
+ dataset = SegDataset(
+ file_list=cfg.DATASET.VAL_FILE_LIST,
+ mode=ModelPhase.EVAL,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+ if use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ for b in data_gen:
+ yield b[0], b[1], b[2]
+
+ py_reader, avg_loss, pred, grts, masks = build_model(
+ test_prog, startup_prog, phase=ModelPhase.EVAL, arch=kwargs['arch'])
+
+ py_reader.decorate_sample_generator(
+ data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+ # Get device environment
+ places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+ place = places[0]
+ dev_count = len(places)
+ print("#Device count: {}".format(dev_count))
+
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ test_prog = test_prog.clone(for_test=True)
+
+ ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+ if not os.path.exists(ckpt_dir):
+ raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+ if ckpt_dir is not None:
+ print('load test model:', ckpt_dir)
+ fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
+
+ # Use streaming confusion matrix to calculate mean_iou
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+ fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+ num_images = 0
+ step = 0
+ all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+ timer = Timer()
+ timer.start()
+ py_reader.start()
+ while True:
+ try:
+ step += 1
+ loss, pred, grts, masks = exe.run(
+ test_prog, fetch_list=fetch_list, return_numpy=True)
+
+ loss = np.mean(np.array(loss))
+
+ num_images += pred.shape[0]
+ conf_mat.calculate(pred, grts, masks)
+ _, iou = conf_mat.mean_iou()
+ _, acc = conf_mat.accuracy()
+
+ speed = 1.0 / timer.elapsed_time()
+
+ print(
+ "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+ .format(step, loss, acc, iou, speed,
+ calculate_eta(all_step - step, speed)))
+ timer.restart()
+ sys.stdout.flush()
+ except fluid.core.EOFException:
+ break
+
+ category_iou, avg_iou = conf_mat.mean_iou()
+ category_acc, avg_acc = conf_mat.accuracy()
+ print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+ num_images, avg_acc, avg_iou))
+ print("[EVAL]Category IoU:", category_iou)
+ print("[EVAL]Category Acc:", category_acc)
+ print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+ return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+ args = parse_args()
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ cfg.check_and_infer()
+ print(pprint.pformat(cfg))
+ evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/slim/nas/mobilenetv2_search_space.py b/slim/nas/mobilenetv2_search_space.py
new file mode 100644
index 0000000000000000000000000000000000000000..2703e161f02e9659040b827fff8d345db5bf5946
--- /dev/null
+++ b/slim/nas/mobilenetv2_search_space.py
@@ -0,0 +1,323 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
+from paddleslim.nas.search_space.base_layer import conv_bn_layer
+from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE
+from paddleslim.nas.search_space.utils import check_points
+
+__all__ = ["MobileNetV2SpaceSeg"]
+
+
+@SEARCHSPACE.register
+class MobileNetV2SpaceSeg(SearchSpaceBase):
+ def __init__(self, input_size, output_size, block_num, block_mask=None):
+ super(MobileNetV2SpaceSeg, self).__init__(input_size, output_size,
+ block_num, block_mask)
+ # self.head_num means the first convolution channel
+ self.head_num = np.array([3, 4, 8, 12, 16, 24, 32]) #7
+        # self.filter_num1 ~ self.filter_num6 are the channel number candidates of the following convolutions
+ self.filter_num1 = np.array([3, 4, 8, 12, 16, 24, 32, 48]) #8
+ self.filter_num2 = np.array([8, 12, 16, 24, 32, 48, 64, 80]) #8
+ self.filter_num3 = np.array([16, 24, 32, 48, 64, 80, 96, 128]) #8
+ self.filter_num4 = np.array(
+ [24, 32, 48, 64, 80, 96, 128, 144, 160, 192]) #10
+ self.filter_num5 = np.array(
+ [32, 48, 64, 80, 96, 128, 144, 160, 192, 224]) #10
+ self.filter_num6 = np.array(
+ [64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512]) #12
+ # self.k_size means kernel size
+ self.k_size = np.array([3, 5]) #2
+ # self.multiply means expansion_factor of each _inverted_residual_unit
+ self.multiply = np.array([1, 2, 3, 4, 6]) #5
+        # self.repeat means the number of times _inverted_residual_unit is repeated in each _invresi_blocks
+ self.repeat = np.array([1, 2, 3, 4, 5, 6]) #6
+
+ def init_tokens(self):
+ """
+        The initial tokens.
+        The first one is the index of the first layer's channel in self.head_num;
+        each following line represents the indices of [expansion_factor, filter_num, repeat_num, kernel_size].
+ """
+ # original MobileNetV2
+ # yapf: disable
+ init_token_base = [4, # 1, 16, 1
+ 4, 5, 1, 0, # 6, 24, 2
+ 4, 4, 2, 0, # 6, 32, 3
+ 4, 4, 3, 0, # 6, 64, 4
+ 4, 5, 2, 0, # 6, 96, 3
+ 4, 7, 2, 0, # 6, 160, 3
+ 4, 9, 0, 0] # 6, 320, 1
+ # yapf: enable
+
+ return init_token_base
+
+ def range_table(self):
+ """
+        Get the range table of the current search space, which constrains the range of each token.
+ """
+ # head_num + 6 * [multiple(expansion_factor), filter_num, repeat, kernel_size]
+ # yapf: disable
+ range_table_base = [len(self.head_num),
+ len(self.multiply), len(self.filter_num1), len(self.repeat), len(self.k_size),
+ len(self.multiply), len(self.filter_num2), len(self.repeat), len(self.k_size),
+ len(self.multiply), len(self.filter_num3), len(self.repeat), len(self.k_size),
+ len(self.multiply), len(self.filter_num4), len(self.repeat), len(self.k_size),
+ len(self.multiply), len(self.filter_num5), len(self.repeat), len(self.k_size),
+ len(self.multiply), len(self.filter_num6), len(self.repeat), len(self.k_size)]
+ # yapf: enable
+ return range_table_base
+
+ def token2arch(self, tokens=None):
+ """
+ return net_arch function
+ """
+
+ if tokens is None:
+ tokens = self.init_tokens()
+
+ self.bottleneck_params_list = []
+ self.bottleneck_params_list.append(
+ (1, self.head_num[tokens[0]], 1, 1, 3))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[1]], self.filter_num1[tokens[2]],
+ self.repeat[tokens[3]], 2, self.k_size[tokens[4]]))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[5]], self.filter_num2[tokens[6]],
+ self.repeat[tokens[7]], 2, self.k_size[tokens[8]]))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[9]], self.filter_num3[tokens[10]],
+ self.repeat[tokens[11]], 2, self.k_size[tokens[12]]))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[13]], self.filter_num4[tokens[14]],
+ self.repeat[tokens[15]], 1, self.k_size[tokens[16]]))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[17]], self.filter_num5[tokens[18]],
+ self.repeat[tokens[19]], 2, self.k_size[tokens[20]]))
+ self.bottleneck_params_list.append(
+ (self.multiply[tokens[21]], self.filter_num6[tokens[22]],
+ self.repeat[tokens[23]], 1, self.k_size[tokens[24]]))
+
+ def _modify_bottle_params(output_stride=None):
+ if output_stride is not None and output_stride % 2 != 0:
+                raise Exception("output stride must be an even number")
+ if output_stride is None:
+ return
+ else:
+ stride = 2
+ for i, layer_setting in enumerate(self.bottleneck_params_list):
+ t, c, n, s, ks = layer_setting
+ stride = stride * s
+ if stride > output_stride:
+ s = 1
+ self.bottleneck_params_list[i] = (t, c, n, s, ks)
+
+ def net_arch(input,
+ scale=1.0,
+ return_block=None,
+ end_points=None,
+ output_stride=None):
+ self.scale = scale
+ _modify_bottle_params(output_stride)
+
+ decode_ends = dict()
+
+ def check_points(count, points):
+ if points is None:
+ return False
+ else:
+ if isinstance(points, list):
+ return (True if count in points else False)
+ else:
+ return (True if count == points else False)
+
+ #conv1
+            # all conv2d padding is 'SAME', so the actual padding is computed automatically.
+ input = conv_bn_layer(
+ input,
+ num_filters=int(32 * self.scale),
+ filter_size=3,
+ stride=2,
+ padding='SAME',
+ act='relu6',
+ name='mobilenetv2_conv1')
+ layer_count = 1
+
+ depthwise_output = None
+ # bottleneck sequences
+ in_c = int(32 * self.scale)
+ for i, layer_setting in enumerate(self.bottleneck_params_list):
+ t, c, n, s, k = layer_setting
+ layer_count += 1
+                ### return_block and end_points refer to block numbers
+ if check_points((layer_count - 1), return_block):
+ decode_ends[layer_count - 1] = depthwise_output
+
+ if check_points((layer_count - 1), end_points):
+ return input, decode_ends
+ input, depthwise_output = self._invresi_blocks(
+ input=input,
+ in_c=in_c,
+ t=t,
+ c=int(c * self.scale),
+ n=n,
+ s=s,
+ k=int(k),
+ name='mobilenetv2_conv' + str(i))
+ in_c = int(c * self.scale)
+
+            ### return_block and end_points refer to block numbers
+ if check_points(layer_count, return_block):
+ decode_ends[layer_count] = depthwise_output
+
+ if check_points(layer_count, end_points):
+ return input, decode_ends
+ # last conv
+ input = conv_bn_layer(
+ input=input,
+ num_filters=int(1280 * self.scale)
+ if self.scale > 1.0 else 1280,
+ filter_size=1,
+ stride=1,
+ padding='SAME',
+ act='relu6',
+ name='mobilenetv2_conv' + str(i + 1))
+
+ input = fluid.layers.pool2d(
+ input=input,
+ pool_type='avg',
+ global_pooling=True,
+ name='mobilenetv2_last_pool')
+
+ return input
+
+ return net_arch
+
+ def _shortcut(self, input, data_residual):
+ """Build shortcut layer.
+ Args:
+ input(Variable): input.
+ data_residual(Variable): residual layer.
+ Returns:
+ Variable, layer output.
+ """
+ return fluid.layers.elementwise_add(input, data_residual)
+
+ def _inverted_residual_unit(self,
+ input,
+ num_in_filter,
+ num_filters,
+ ifshortcut,
+ stride,
+ filter_size,
+ expansion_factor,
+ reduction_ratio=4,
+ name=None):
+ """Build inverted residual unit.
+ Args:
+ input(Variable), input.
+ num_in_filter(int), number of in filters.
+ num_filters(int), number of filters.
+ ifshortcut(bool), whether using shortcut.
+ stride(int), stride.
+ filter_size(int), filter size.
+ expansion_factor(float), expansion factor.
+ name(str), name.
+ Returns:
+ Variable, layers output.
+ """
+ num_expfilter = int(round(num_in_filter * expansion_factor))
+ channel_expand = conv_bn_layer(
+ input=input,
+ num_filters=num_expfilter,
+ filter_size=1,
+ stride=1,
+ padding='SAME',
+ num_groups=1,
+ act='relu6',
+ name=name + '_expand')
+
+ bottleneck_conv = conv_bn_layer(
+ input=channel_expand,
+ num_filters=num_expfilter,
+ filter_size=filter_size,
+ stride=stride,
+ padding='SAME',
+ num_groups=num_expfilter,
+ act='relu6',
+ name=name + '_dwise',
+ use_cudnn=False)
+
+ depthwise_output = bottleneck_conv
+
+ linear_out = conv_bn_layer(
+ input=bottleneck_conv,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding='SAME',
+ num_groups=1,
+ act=None,
+ name=name + '_linear')
+ out = linear_out
+ if ifshortcut:
+ out = self._shortcut(input=input, data_residual=out)
+ return out, depthwise_output
+
+ def _invresi_blocks(self, input, in_c, t, c, n, s, k, name=None):
+ """Build inverted residual blocks.
+ Args:
+ input: Variable, input.
+ in_c: int, number of in filters.
+ t: float, expansion factor.
+ c: int, number of filters.
+ n: int, number of layers.
+ s: int, stride.
+ k: int, filter size.
+ name: str, name.
+ Returns:
+ Variable, layers output.
+ """
+ first_block, depthwise_output = self._inverted_residual_unit(
+ input=input,
+ num_in_filter=in_c,
+ num_filters=c,
+ ifshortcut=False,
+ stride=s,
+ filter_size=k,
+ expansion_factor=t,
+ name=name + '_1')
+
+ last_residual_block = first_block
+ last_c = c
+
+ for i in range(1, n):
+ last_residual_block, depthwise_output = self._inverted_residual_unit(
+ input=last_residual_block,
+ num_in_filter=last_c,
+ num_filters=c,
+ ifshortcut=True,
+ stride=1,
+ filter_size=k,
+ expansion_factor=t,
+ name=name + '_' + str(i + 1))
+ return last_residual_block, depthwise_output
diff --git a/slim/nas/model_builder.py b/slim/nas/model_builder.py
new file mode 100644
index 0000000000000000000000000000000000000000..3dfbacb0cd41a14bb81c6f6c82b81479fb1c30c8
--- /dev/null
+++ b/slim/nas/model_builder.py
@@ -0,0 +1,316 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import struct
+
+import paddle.fluid as fluid
+import numpy as np
+from paddle.fluid.proto.framework_pb2 import VarType
+
+import solver
+from utils.config import cfg
+from loss import multi_softmax_with_loss
+from loss import multi_dice_loss
+from loss import multi_bce_loss
+import deeplab
+
+
+class ModelPhase(object):
+ """
+ Standard name for model phase in PaddleSeg
+
+ The following standard keys are defined:
+ * `TRAIN`: training mode.
+ * `EVAL`: testing/evaluation mode.
+ * `PREDICT`: prediction/inference mode.
+ * `VISUAL` : visualization mode
+ """
+
+ TRAIN = 'train'
+ EVAL = 'eval'
+ PREDICT = 'predict'
+ VISUAL = 'visual'
+
+ @staticmethod
+ def is_train(phase):
+ return phase == ModelPhase.TRAIN
+
+ @staticmethod
+ def is_predict(phase):
+ return phase == ModelPhase.PREDICT
+
+ @staticmethod
+ def is_eval(phase):
+ return phase == ModelPhase.EVAL
+
+ @staticmethod
+ def is_visual(phase):
+ return phase == ModelPhase.VISUAL
+
+ @staticmethod
+ def is_valid_phase(phase):
+ """ Check valid phase """
+ if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
+ or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
+ return True
+
+ return False
+
+
+def seg_model(image, class_num, arch):
+ model_name = cfg.MODEL.MODEL_NAME
+ if model_name == 'deeplabv3p':
+ logits = deeplab.deeplabv3p_nas(image, class_num, arch)
+ else:
+ raise Exception(
+            "unknown model name, only deeplabv3p is supported"
+ )
+ return logits
+
+
+def softmax(logit):
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ logit = fluid.layers.softmax(logit)
+ logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+ return logit
+
+
+def sigmoid_to_softmax(logit):
+ """
+    Convert a one-channel sigmoid output into a two-channel softmax-style output.
+ """
+ logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+ logit = fluid.layers.sigmoid(logit)
+ logit_back = 1 - logit
+ logit = fluid.layers.concat([logit_back, logit], axis=-1)
+ logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+ return logit
+
+
+def export_preprocess(image):
+    """Preprocessing pipeline for the exported model."""
+
+ image = fluid.layers.transpose(image, [0, 3, 1, 2])
+ origin_shape = fluid.layers.shape(image)[-2:]
+
+    # Resize according to the chosen AUG_METHOD
+ if cfg.AUG.AUG_METHOD == 'unpadding':
+ h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
+ w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
+ image = fluid.layers.resize_bilinear(
+ image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
+ elif cfg.AUG.AUG_METHOD == 'rangescaling':
+ size = cfg.AUG.INF_RESIZE_VALUE
+ value = fluid.layers.reduce_max(origin_shape)
+ scale = float(size) / value.astype('float32')
+ image = fluid.layers.resize_bilinear(
+ image, scale=scale, align_corners=False, align_mode=0)
+
+    # Record the image shape after resizing
+ valid_shape = fluid.layers.shape(image)[-2:]
+
+    # Pad to EVAL_CROP_SIZE
+ width = cfg.EVAL_CROP_SIZE[0]
+ height = cfg.EVAL_CROP_SIZE[1]
+ pad_target = fluid.layers.assign(
+ np.array([height, width]).astype('float32'))
+ up = fluid.layers.assign(np.array([0]).astype('float32'))
+ down = pad_target[0] - valid_shape[0]
+ left = up
+ right = pad_target[1] - valid_shape[1]
+ paddings = fluid.layers.concat([up, down, left, right])
+ paddings = fluid.layers.cast(paddings, 'int32')
+ image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
+
+ # normalize
+ mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
+ mean = fluid.layers.assign(mean.astype('float32'))
+ std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
+ std = fluid.layers.assign(std.astype('float32'))
+ image = (image / 255 - mean) / std
+    # Reshape so that the following network can obtain the feature map shape via image.shape
+ image = fluid.layers.reshape(
+ image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
+ return image, valid_shape, origin_shape
+
+
+def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN, arch=None):
+ if not ModelPhase.is_valid_phase(phase):
+ raise ValueError("ModelPhase {} is not valid!".format(phase))
+ if ModelPhase.is_train(phase):
+ width = cfg.TRAIN_CROP_SIZE[0]
+ height = cfg.TRAIN_CROP_SIZE[1]
+ else:
+ width = cfg.EVAL_CROP_SIZE[0]
+ height = cfg.EVAL_CROP_SIZE[1]
+
+ image_shape = [cfg.DATASET.DATA_DIM, height, width]
+ grt_shape = [1, height, width]
+ class_num = cfg.DATASET.NUM_CLASSES
+
+ with fluid.program_guard(main_prog, start_prog):
+ with fluid.unique_name.guard():
+            # When exporting the model, add image normalization preprocessing to simplify the image processing pipeline at deployment time
+            # At inference deployment, only a batch_size dimension needs to be added to the input image
+ if ModelPhase.is_predict(phase):
+ origin_image = fluid.layers.data(
+ name='image',
+ shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
+ dtype='float32',
+ append_batch_size=False)
+ image, valid_shape, origin_shape = export_preprocess(
+ origin_image)
+
+ else:
+ image = fluid.layers.data(
+ name='image', shape=image_shape, dtype='float32')
+ label = fluid.layers.data(
+ name='label', shape=grt_shape, dtype='int32')
+ mask = fluid.layers.data(
+ name='mask', shape=grt_shape, dtype='int32')
+
+            # use PyReader when doing training and evaluation
+ if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+ py_reader = fluid.io.PyReader(
+ feed_list=[image, label, mask],
+ capacity=cfg.DATALOADER.BUF_SIZE,
+ iterable=False,
+ use_double_buffer=True)
+
+ loss_type = cfg.SOLVER.LOSS
+ if not isinstance(loss_type, list):
+ loss_type = list(loss_type)
+
+            # dice_loss and bce_loss are only applicable to binary segmentation
+ if class_num > 2 and (("dice_loss" in loss_type) or
+ ("bce_loss" in loss_type)):
+ raise Exception(
+                    "dice loss and bce loss are only applicable to binary classification"
+ )
+
+            # For binary segmentation, when dice_loss or bce_loss is selected the final logit output has a single channel
+ if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
+ class_num = 1
+ if "softmax_loss" in loss_type:
+ raise Exception(
+ "softmax loss can not combine with dice loss or bce loss"
+ )
+ logits = seg_model(image, class_num, arch)
+
+            # Compute the corresponding losses according to the selected loss functions
+ if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+ loss_valid = False
+ avg_loss_list = []
+ valid_loss = []
+ if "softmax_loss" in loss_type:
+ weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
+ avg_loss_list.append(
+ multi_softmax_with_loss(logits, label, mask, class_num, weight))
+ loss_valid = True
+ valid_loss.append("softmax_loss")
+ if "dice_loss" in loss_type:
+ avg_loss_list.append(multi_dice_loss(logits, label, mask))
+ loss_valid = True
+ valid_loss.append("dice_loss")
+ if "bce_loss" in loss_type:
+ avg_loss_list.append(multi_bce_loss(logits, label, mask))
+ loss_valid = True
+ valid_loss.append("bce_loss")
+ if not loss_valid:
+ raise Exception(
+                        "SOLVER.LOSS: {} is set incorrectly. It should "
+                        "include at least one of (softmax_loss, bce_loss, dice_loss),"
+                        " for example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
+ .format(cfg.SOLVER.LOSS))
+
+ invalid_loss = [x for x in loss_type if x not in valid_loss]
+ if len(invalid_loss) > 0:
+ print(
+                        "Warning: the loss {} you set is invalid; it will not be included in the computed loss."
+ .format(invalid_loss))
+
+ avg_loss = 0
+ for i in range(0, len(avg_loss_list)):
+ avg_loss += avg_loss_list[i]
+
+            # get the prediction result at the original image size
+ if isinstance(logits, tuple):
+ logit = logits[0]
+ else:
+ logit = logits
+
+ if logit.shape[2:] != label.shape[2:]:
+ logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
+
+ # return image input and logit output for inference graph prune
+ if ModelPhase.is_predict(phase):
+                # For binary segmentation with dice_loss or bce_loss the returned logit has a single channel; convert it to two channels
+ if class_num == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = softmax(logit)
+
+                # Crop out the valid region
+ logit = fluid.layers.slice(
+ logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
+
+ logit = fluid.layers.resize_bilinear(
+ logit,
+ out_shape=origin_shape,
+ align_corners=False,
+ align_mode=0)
+ logit = fluid.layers.argmax(logit, axis=1)
+ return origin_image, logit
+
+ if class_num == 1:
+ out = sigmoid_to_softmax(logit)
+ out = fluid.layers.transpose(out, [0, 2, 3, 1])
+ else:
+ out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+ pred = fluid.layers.argmax(out, axis=3)
+ pred = fluid.layers.unsqueeze(pred, axes=[3])
+ if ModelPhase.is_visual(phase):
+ if class_num == 1:
+ logit = sigmoid_to_softmax(logit)
+ else:
+ logit = softmax(logit)
+ return pred, logit
+
+ if ModelPhase.is_eval(phase):
+ return py_reader, avg_loss, pred, label, mask
+
+ if ModelPhase.is_train(phase):
+ optimizer = solver.Solver(main_prog, start_prog)
+ decayed_lr = optimizer.optimise(avg_loss)
+ return py_reader, avg_loss, decayed_lr, pred, label, mask
+
+
+def to_int(string, dest="I"):
+ return struct.unpack(dest, string)[0]
+
+
+def parse_shape_from_file(filename):
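+    # Layout of a persistable variable file saved by fluid.io, as read below:
+    # a version field, the LoD level and LoD data, another version field, the
+    # byte length of a serialized VarType.TensorDesc, and finally the TensorDesc
+    # protobuf itself, which carries the tensor dims we want.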
+ with open(filename, "rb") as file:
+ version = file.read(4)
+ lod_level = to_int(file.read(8), dest="Q")
+ for i in range(lod_level):
+ _size = to_int(file.read(8), dest="Q")
+ _ = file.read(_size)
+ version = file.read(4)
+ tensor_desc_size = to_int(file.read(4))
+ tensor_desc = VarType.TensorDesc()
+ tensor_desc.ParseFromString(file.read(tensor_desc_size))
+ return tuple(tensor_desc.dims)
diff --git a/slim/nas/train_nas.py b/slim/nas/train_nas.py
new file mode 100644
index 0000000000000000000000000000000000000000..7822657fa264d053360199d5691098ae85fcd12c
--- /dev/null
+++ b/slim/nas/train_nas.py
@@ -0,0 +1,456 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import random
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from model_builder import build_model
+from model_builder import ModelPhase
+from model_builder import parse_shape_from_file
+from eval_nas import evaluate
+from vis import visualize
+from utils import dist_utils
+
+from mobilenetv2_search_space import MobileNetV2SpaceSeg
+from paddleslim.nas.search_space.search_space_factory import SearchSpaceFactory
+from paddleslim.analysis import flops
+from paddleslim.nas.sa_nas import SANAS
+from paddleslim.nas import search_space
+
+def parse_args():
+ parser = argparse.ArgumentParser(description='PaddleSeg training')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess I/O or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--log_steps',
+ dest='log_steps',
+ help='Display logging information at every log_steps',
+ default=10,
+ type=int)
+ parser.add_argument(
+ '--debug',
+ dest='debug',
+ help='debug mode, display detail information of training',
+ action='store_true')
+ parser.add_argument(
+ '--use_tb',
+ dest='use_tb',
+ help='whether to record the data during training to Tensorboard',
+ action='store_true')
+ parser.add_argument(
+ '--tb_log_dir',
+ dest='tb_log_dir',
+ help='Tensorboard logging directory',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--do_eval',
+ dest='do_eval',
+ help='Evaluation models result on every new checkpoint',
+ action='store_true')
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ parser.add_argument(
+ '--enable_ce',
+ dest='enable_ce',
+ help='If set True, enable continuous evaluation job.'
+ 'This flag is only used for internal test.',
+ action='store_true')
+ return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+ """
+    Temporary workaround for Windows save-variables compatibility.
+    Will be fixed in PaddlePaddle v1.5.2
+ """
+
+ save_program = fluid.Program()
+ save_block = save_program.global_block()
+
+ for each_var in vars:
+ # NOTE: don't save the variable which type is RAW
+ if each_var.type == fluid.core.VarDesc.VarType.RAW:
+ continue
+ new_var = save_block.create_var(
+ name=each_var.name,
+ shape=each_var.shape,
+ dtype=each_var.dtype,
+ type=each_var.type,
+ lod_level=each_var.lod_level,
+ persistable=True)
+ file_path = os.path.join(dirname, new_var.name)
+ file_path = os.path.normpath(file_path)
+ save_block.append_op(
+ type='save',
+ inputs={'X': [new_var]},
+ outputs={},
+ attrs={'file_path': file_path})
+
+ executor.run(save_program)
+
+
+def save_checkpoint(exe, program, ckpt_name):
+ """
+ Save checkpoint for evaluation or resume training
+ """
+ ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+ print("Save model checkpoint to {}".format(ckpt_dir))
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+
+ save_vars(
+ exe,
+ ckpt_dir,
+ program,
+ vars=list(filter(fluid.io.is_persistable, program.list_vars())))
+
+ return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+ """
+    Load checkpoint from the pretrained model directory to resume training
+ """
+
+ print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+ if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+ raise ValueError("TRAIN.PRETRAIN_MODEL {} not exist!".format(
+ cfg.TRAIN.RESUME_MODEL_DIR))
+
+ fluid.io.load_persistables(
+ exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+ model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+ if model_path[-1] == os.sep:
+ model_path = model_path[0:-1]
+ epoch_name = os.path.basename(model_path)
+ # If resume model is final model
+ if epoch_name == 'final':
+ begin_epoch = cfg.SOLVER.NUM_EPOCHS
+    # If the resume model path ends with a digit, restore the epoch status
+ elif epoch_name.isdigit():
+ epoch = int(epoch_name)
+ begin_epoch = epoch + 1
+ else:
+ raise ValueError("Resume model path is not valid!")
+ print("Model checkpoint loaded successfully!")
+
+ return begin_epoch
+
+
+def update_best_model(ckpt_dir):
+ best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
+ if os.path.exists(best_model_dir):
+ shutil.rmtree(best_model_dir)
+ shutil.copytree(ckpt_dir, best_model_dir)
+
+
+def print_info(*msg):
+ if cfg.TRAINER_ID == 0:
+ print(*msg)
+
+
+def train(cfg):
+ startup_prog = fluid.Program()
+ train_prog = fluid.Program()
+ if args.enable_ce:
+ startup_prog.random_seed = 1000
+ train_prog.random_seed = 1000
+ drop_last = True
+
+ dataset = SegDataset(
+ file_list=cfg.DATASET.TRAIN_FILE_LIST,
+ mode=ModelPhase.TRAIN,
+ shuffle=True,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+ if args.use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ batch_data = []
+ for b in data_gen:
+ batch_data.append(b)
+ if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+ batch_data = []
+        # If the sync batch norm strategy is used, drop the last batch when the number
+        # of samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
+ if not cfg.TRAIN.SYNC_BATCH_NORM:
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+
+ # Get device environment
+ # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+ # place = places[0]
+ gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+ place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+ places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+ # Get number of GPU
+ dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+ print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+    assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+            cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data will be allocated evenly to each GPU
+ batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+ print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
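+    # Search space configuration handed to PaddleSlim: nominal input size,
+    # output size and the number of searchable blocks for MobileNetV2SpaceSeg.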
+ config_info = {'input_size': 769, 'output_size': 1, 'block_num': 7}
+ config = ([(cfg.SLIM.NAS_SPACE_NAME, config_info)])
+ factory = SearchSpaceFactory()
+ space = factory.get_search_space(config)
+
+ port = cfg.SLIM.NAS_PORT
+ server_address = (cfg.SLIM.NAS_ADDRESS, port)
+ sa_nas = SANAS(config, server_addr=server_address, search_steps=cfg.SLIM.NAS_SEARCH_STEPS,
+ is_server=cfg.SLIM.NAS_IS_SERVER)
+ for step in range(cfg.SLIM.NAS_SEARCH_STEPS):
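+        # Each search step: ask the SANAS controller for a candidate architecture,
+        # train it, evaluate mIoU on the validation set, and report the best mIoU
+        # back to the controller as the reward.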
+ arch = sa_nas.next_archs()[0]
+
+ start_prog = fluid.Program()
+ train_prog = fluid.Program()
+
+ py_reader, avg_loss, lr, pred, grts, masks = build_model(
+ train_prog, start_prog, arch=arch, phase=ModelPhase.TRAIN)
+
+ cur_flops = flops(train_prog)
+ print('current step:', step, 'flops:', cur_flops)
+
+ py_reader.decorate_sample_generator(
+ data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+ exe = fluid.Executor(place)
+ exe.run(start_prog)
+
+ exec_strategy = fluid.ExecutionStrategy()
+        # Clear temporary variables every 100 iterations
+ if args.use_gpu:
+ exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+ exec_strategy.num_iteration_per_drop_scope = 100
+ build_strategy = fluid.BuildStrategy()
+
+ if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+ dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+ exec_strategy.num_threads = 1
+
+ if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+ if dev_count > 1:
+ # Apply sync batch norm strategy
+ print_info("Sync BatchNorm strategy is effective.")
+ build_strategy.sync_batch_norm = True
+ else:
+ print_info(
+ "Sync BatchNorm strategy will not be effective if GPU device"
+ " count <= 1")
+ compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
+ loss_name=avg_loss.name,
+ exec_strategy=exec_strategy,
+ build_strategy=build_strategy)
+
+ # Resume training
+ begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+ if cfg.TRAIN.RESUME_MODEL_DIR:
+ begin_epoch = load_checkpoint(exe, train_prog)
+ # Load pretrained model
+ elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+ print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+ load_vars = []
+ load_fail_vars = []
+
+ def var_shape_matched(var, shape):
+ """
+                Check whether the persistable variable's shape matches the current network
+ """
+ var_exist = os.path.exists(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ if var_exist:
+ var_shape = parse_shape_from_file(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ return var_shape == shape
+ return False
+
+ for x in train_prog.list_vars():
+ if isinstance(x, fluid.framework.Parameter):
+ shape = tuple(fluid.global_scope().find_var(
+ x.name).get_tensor().shape())
+ if var_shape_matched(x, shape):
+ load_vars.append(x)
+ else:
+ load_fail_vars.append(x)
+
+ fluid.io.load_vars(
+ exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+ for var in load_vars:
+                print_info("Parameter[{}] loaded successfully!".format(var.name))
+ for var in load_fail_vars:
+ print_info(
+                    "Parameter[{}] doesn't exist or its shape does not match the"
+                    " current network, skip loading it.".format(var.name))
+ print_info("{}/{} pretrained parameters loaded successfully!".format(
+ len(load_vars),
+ len(load_vars) + len(load_fail_vars)))
+ else:
+ print_info(
+                'Pretrained model dir {} does not exist, training from scratch...'.
+ format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+ fetch_list = [avg_loss.name, lr.name]
+
+ global_step = 0
+ all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+ if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+ all_step += 1
+ all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+ avg_loss = 0.0
+ timer = Timer()
+ timer.start()
+ if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+ raise ValueError(
+ ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+ begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+ if args.use_mpio:
+ print_info("Use multiprocess reader")
+ else:
+ print_info("Use multi-thread reader")
+
+ best_miou = 0.0
+ for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+ py_reader.start()
+ while True:
+ try:
+ loss, lr = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+ avg_loss /= args.log_steps
+ speed = args.log_steps / timer.elapsed_time()
+ print((
+ "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, speed,
+ calculate_eta(all_step - global_step, speed)))
+
+ sys.stdout.flush()
+ avg_loss = 0.0
+ timer.restart()
+
+ except fluid.core.EOFException:
+ py_reader.reset()
+ break
+ except Exception as e:
+ print(e)
+ if epoch > cfg.SLIM.NAS_START_EVAL_EPOCH:
+ ckpt_dir = save_checkpoint(exe, train_prog, '{}_tmp'.format(port))
+ _, mean_iou, _, mean_acc = evaluate(
+ cfg=cfg,
+ arch=arch,
+ ckpt_dir=ckpt_dir,
+ use_gpu=args.use_gpu,
+ use_mpio=args.use_mpio)
+ if best_miou < mean_iou:
+ print('search step {}, epoch {} best iou {}'.format(step, epoch, mean_iou))
+ best_miou = mean_iou
+
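+        # Report the best validation mIoU found for this architecture back to SA-NAS as its reward.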
+ sa_nas.reward(float(best_miou))
+
+
+def main(args):
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ if args.enable_ce:
+ random.seed(0)
+ np.random.seed(0)
+
+ cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+ cfg.check_and_infer()
+ print_info(pprint.pformat(cfg))
+ train(cfg)
+
+
+if __name__ == '__main__':
+ args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using the CPU-only build of PaddlePaddle."
+        )
+        print(
+            "Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
+        )
+ sys.exit(1)
+ main(args)
diff --git a/slim/prune/README.md b/slim/prune/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b6a45238938567a845b44ff768db6982bfeab55c
--- /dev/null
+++ b/slim/prune/README.md
@@ -0,0 +1,58 @@
+# PaddleSeg Pruning Tutorial
+
+Before reading this tutorial, please make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related chapters so that you have a basic understanding of PaddleSeg.
+
+This document describes how to use the convolution channel pruning interface of [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to prune the channel numbers of convolution layers of models in the segmentation library.
+
+In the segmentation library, pruning can be done directly with the `PaddleSeg/slim/prune/train_prune.py` script, which calls PaddleSlim's [paddleslim.prune.Pruner](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#Pruner) interface.
+
+Unless otherwise stated, all commands shown in this tutorial are executed under the `PaddleSeg/` directory.
+
+## 1. Data and pretrained model preparation
+Run the following command to download the cityscapes dataset:
+```
+python dataset/download_cityscapes.py
+```
+Refer to the [pretrained model list](../../docs/model_zoo.md) to obtain the required pretrained models.
+
+## 2. Determining the parameters to prune
+
+We reduce the channel numbers of convolution layers by pruning their parameters. Before pruning, we need to determine the names of the convolution layer parameters to be pruned.
+List all parameters of the current model with the following code:
+
+```python
+# List all Parameters of the model
+for x in train_prog.list_vars():
+    if isinstance(x, fluid.framework.Parameter):
+        print(x.name, x.shape)
+
+```
+
+By inspecting the parameter names and shapes, pick out all convolution layer parameters and decide which ones to prune.
+
+## 3. Launching the pruning task
+
+When launching a pruning task with `train_prune.py`, specify the list of parameter names to prune via the `SLIM.PRUNE_PARAMS` option (names separated by commas), and the ratio by which each parameter is pruned via the `SLIM.PRUNE_RATIOS` option.
+
+```shell
+CUDA_VISIBLE_DEVICES=0
+python -u ./slim/prune/train_prune.py --log_steps 10 --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
+SLIM.PRUNE_PARAMS 'learning_to_downsample/weights,learning_to_downsample/dsconv1/pointwise/weights,learning_to_downsample/dsconv2/pointwise/weights' \
+SLIM.PRUNE_RATIOS '[0.1,0.1,0.1]'
+```
+Here we pick three parameters and prune each of them by a ratio of 0.1.
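+
+For reference, the snippet below is a minimal sketch of the pruning call that `train_prune.py` presumably performs through the `paddleslim.prune.Pruner` interface linked above; the surrounding program construction and executor setup are omitted, and the exact return values may differ between PaddleSlim versions.
+
+```python
+import paddle.fluid as fluid
+from paddleslim.prune import Pruner
+
+# Assumes train_prog / startup_prog were already built by the segmentation
+# model builder and that the executor has run startup_prog.
+place = fluid.CUDAPlace(0)
+
+params = ['learning_to_downsample/weights',
+          'learning_to_downsample/dsconv1/pointwise/weights',
+          'learning_to_downsample/dsconv2/pointwise/weights']
+ratios = [0.1, 0.1, 0.1]
+
+pruner = Pruner()
+# prune() rewrites the program graph (and, when only_graph=False, the parameter
+# tensors in the given scope) and returns the pruned program plus backup info.
+pruned_prog = pruner.prune(
+    train_prog,
+    fluid.global_scope(),
+    params=params,
+    ratios=ratios,
+    place=place,
+    only_graph=False)[0]
+```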
+
+## 4. Evaluation
+
+```shell
+CUDA_VISIBLE_DEVICES=0
+python -u ./slim/prune/eval_prune.py --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
+TEST.TEST_MODEL your_trained_model
+```
+
+## 5. Models
+
+| Model | Dataset | Download | Pruning method | FLOPs | mIoU on val |
+|---|---|---|---|---|---|
+| Fast-SCNN/bn | Cityscapes | [fast_scnn_cityscapes.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | none | 7.21G | 0.6964 |
+| Fast-SCNN/bn | Cityscapes | [fast_scnn_cityscapes-uniform-51.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape-uniform-51.tar) | uniform | 3.54G | 0.6990 |
diff --git a/slim/prune/eval_prune.py b/slim/prune/eval_prune.py
new file mode 100644
index 0000000000000000000000000000000000000000..b8275d03475b8fea67d73682b54a38172fbc25e2
--- /dev/null
+++ b/slim/prune/eval_prune.py
@@ -0,0 +1,185 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+
+from paddleslim.prune import load_model
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess IO or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ if len(sys.argv) == 1:
+ parser.print_help()
+ sys.exit(1)
+ return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+ np.set_printoptions(precision=5, suppress=True)
+
+ startup_prog = fluid.Program()
+ test_prog = fluid.Program()
+ dataset = SegDataset(
+ file_list=cfg.DATASET.VAL_FILE_LIST,
+ mode=ModelPhase.EVAL,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+ if use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ for b in data_gen:
+ yield b[0], b[1], b[2]
+
+ py_reader, avg_loss, pred, grts, masks = build_model(
+ test_prog, startup_prog, phase=ModelPhase.EVAL)
+
+ py_reader.decorate_sample_generator(
+ data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+ # Get device environment
+ places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+ place = places[0]
+ dev_count = len(places)
+ print("#Device count: {}".format(dev_count))
+
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ test_prog = test_prog.clone(for_test=True)
+
+ ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+ if not os.path.exists(ckpt_dir):
+ raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+ if ckpt_dir is not None:
+ print('load test model:', ckpt_dir)
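+        # paddleslim.prune.load_model restores the checkpoint written by
+        # save_model during pruning training, including the pruned parameter shapes.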
+ load_model(exe, test_prog, ckpt_dir)
+
+ # Use streaming confusion matrix to calculate mean_iou
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+ fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+ num_images = 0
+ step = 0
+ all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+ timer = Timer()
+ timer.start()
+ py_reader.start()
+ while True:
+ try:
+ step += 1
+ loss, pred, grts, masks = exe.run(
+ test_prog, fetch_list=fetch_list, return_numpy=True)
+
+ loss = np.mean(np.array(loss))
+
+ num_images += pred.shape[0]
+ conf_mat.calculate(pred, grts, masks)
+ _, iou = conf_mat.mean_iou()
+ _, acc = conf_mat.accuracy()
+
+ speed = 1.0 / timer.elapsed_time()
+
+ print(
+ "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+ .format(step, loss, acc, iou, speed,
+ calculate_eta(all_step - step, speed)))
+ timer.restart()
+ sys.stdout.flush()
+ except fluid.core.EOFException:
+ break
+
+ category_iou, avg_iou = conf_mat.mean_iou()
+ category_acc, avg_acc = conf_mat.accuracy()
+ print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+ num_images, avg_acc, avg_iou))
+ print("[EVAL]Category IoU:", category_iou)
+ print("[EVAL]Category Acc:", category_acc)
+ print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+ return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+ args = parse_args()
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ cfg.check_and_infer()
+ print(pprint.pformat(cfg))
+ evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/slim/prune/train_prune.py b/slim/prune/train_prune.py
new file mode 100644
index 0000000000000000000000000000000000000000..06e1658f1a3f721842fbe780820103aceac87a16
--- /dev/null
+++ b/slim/prune/train_prune.py
@@ -0,0 +1,504 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from models.model_builder import parse_shape_from_file
+from eval_prune import evaluate
+from vis import visualize
+from utils import dist_utils
+
+from paddleslim.prune import Pruner, save_model
+from paddleslim.analysis import flops
+
+def parse_args():
+ parser = argparse.ArgumentParser(description='PaddleSeg training')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess I/O or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--log_steps',
+ dest='log_steps',
+ help='Display logging information at every log_steps',
+ default=10,
+ type=int)
+ parser.add_argument(
+ '--debug',
+ dest='debug',
+ help='debug mode, display detail information of training',
+ action='store_true')
+ parser.add_argument(
+ '--use_tb',
+ dest='use_tb',
+ help='whether to record the data during training to Tensorboard',
+ action='store_true')
+ parser.add_argument(
+ '--tb_log_dir',
+ dest='tb_log_dir',
+ help='Tensorboard logging directory',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--do_eval',
+ dest='do_eval',
+ help='Evaluation models result on every new checkpoint',
+ action='store_true')
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+ """
+    Temporary resolution for Win save variables compatibility.
+ Will fix in PaddlePaddle v1.5.2
+ """
+
+ save_program = fluid.Program()
+ save_block = save_program.global_block()
+
+ for each_var in vars:
+ # NOTE: don't save the variable which type is RAW
+ if each_var.type == fluid.core.VarDesc.VarType.RAW:
+ continue
+ new_var = save_block.create_var(
+ name=each_var.name,
+ shape=each_var.shape,
+ dtype=each_var.dtype,
+ type=each_var.type,
+ lod_level=each_var.lod_level,
+ persistable=True)
+ file_path = os.path.join(dirname, new_var.name)
+ file_path = os.path.normpath(file_path)
+ save_block.append_op(
+ type='save',
+ inputs={'X': [new_var]},
+ outputs={},
+ attrs={'file_path': file_path})
+
+ executor.run(save_program)
+
+
+def save_prune_checkpoint(exe, program, ckpt_name):
+ """
+ Save checkpoint for evaluation or resume training
+ """
+ ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+ print("Save model checkpoint to {}".format(ckpt_dir))
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+
+ save_model(exe, program, ckpt_dir)
+
+ return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+ """
+    Load checkpoint from pretrained model directory for resume training
+ """
+
+ print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+ if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+        raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
+ cfg.TRAIN.RESUME_MODEL_DIR))
+
+ fluid.io.load_persistables(
+ exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+ model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+ if model_path[-1] == os.sep:
+ model_path = model_path[0:-1]
+ epoch_name = os.path.basename(model_path)
+ # If resume model is final model
+ if epoch_name == 'final':
+ begin_epoch = cfg.SOLVER.NUM_EPOCHS
+ # If resume model path is end of digit, restore epoch status
+ elif epoch_name.isdigit():
+ epoch = int(epoch_name)
+ begin_epoch = epoch + 1
+ else:
+ raise ValueError("Resume model path is not valid!")
+ print("Model checkpoint loaded successfully!")
+
+ return begin_epoch
+
+def print_info(*msg):
+ if cfg.TRAINER_ID == 0:
+ print(*msg)
+
+def train(cfg):
+ startup_prog = fluid.Program()
+ train_prog = fluid.Program()
+ drop_last = True
+
+ dataset = SegDataset(
+ file_list=cfg.DATASET.TRAIN_FILE_LIST,
+ mode=ModelPhase.TRAIN,
+ shuffle=True,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+ if args.use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ batch_data = []
+ for b in data_gen:
+ batch_data.append(b)
+ if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+ batch_data = []
+ # If use sync batch norm strategy, drop last batch if number of samples
+        # in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
+ if not cfg.TRAIN.SYNC_BATCH_NORM:
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+
+ # Get device environment
+ # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+ # place = places[0]
+ gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+ place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+ places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+ # Get number of GPU
+ dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+ print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+ assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+ cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data will be allocated to each GPU evenly
+ batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+ print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+ py_reader, avg_loss, lr, pred, grts, masks = build_model(
+ train_prog, startup_prog, phase=ModelPhase.TRAIN)
+ py_reader.decorate_sample_generator(
+ data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ exec_strategy = fluid.ExecutionStrategy()
+ # Clear temporary variables every 100 iteration
+ if args.use_gpu:
+ exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+ exec_strategy.num_iteration_per_drop_scope = 100
+ build_strategy = fluid.BuildStrategy()
+
+ if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+ dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+ exec_strategy.num_threads = 1
+
+ if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+ if dev_count > 1:
+ # Apply sync batch norm strategy
+ print_info("Sync BatchNorm strategy is effective.")
+ build_strategy.sync_batch_norm = True
+ else:
+ print_info("Sync BatchNorm strategy will not be effective if GPU device"
+ " count <= 1")
+
+ pruned_params = cfg.SLIM.PRUNE_PARAMS.strip().split(',')
+ pruned_ratios = cfg.SLIM.PRUNE_RATIOS
+
+ if isinstance(pruned_ratios, float):
+ pruned_ratios = [pruned_ratios] * len(pruned_params)
+ elif isinstance(pruned_ratios, (list, tuple)):
+ pruned_ratios = list(pruned_ratios)
+ else:
+ raise ValueError('expect SLIM.PRUNE_RATIOS type is float, list, tuple, '
+ 'but received {}'.format(type(pruned_ratios)))
+
+ # Resume training
+ begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+ if cfg.TRAIN.RESUME_MODEL_DIR:
+ begin_epoch = load_checkpoint(exe, train_prog)
+ # Load pretrained model
+ elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+ print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+ load_vars = []
+ load_fail_vars = []
+
+ def var_shape_matched(var, shape):
+ """
+            Check whether the persistable variable shape matches the current network
+ """
+ var_exist = os.path.exists(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ if var_exist:
+ var_shape = parse_shape_from_file(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ return var_shape == shape
+ return False
+
+ for x in train_prog.list_vars():
+ if isinstance(x, fluid.framework.Parameter):
+ shape = tuple(fluid.global_scope().find_var(
+ x.name).get_tensor().shape())
+ if var_shape_matched(x, shape):
+ load_vars.append(x)
+ else:
+ load_fail_vars.append(x)
+
+ fluid.io.load_vars(
+ exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+ for var in load_vars:
+            print_info("Parameter[{}] loaded successfully!".format(var.name))
+ for var in load_fail_vars:
+            print_info("Parameter[{}] does not exist or its shape does not match the current"
+                       " network; skipping it.".format(var.name))
+ print_info("{}/{} pretrained parameters loaded successfully!".format(
+ len(load_vars),
+ len(load_vars) + len(load_fail_vars)))
+ else:
+        print_info('Pretrained model dir {} does not exist, training from scratch...'.
+                   format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+ fetch_list = [avg_loss.name, lr.name]
+ if args.debug:
+ # Fetch more variable info and use streaming confusion matrix to
+ # calculate IoU results if in debug mode
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ fetch_list.extend([pred.name, grts.name, masks.name])
+ cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+ if args.use_tb:
+ if not args.tb_log_dir:
+ print_info("Please specify the log directory by --tb_log_dir.")
+ exit(1)
+
+ from tb_paddle import SummaryWriter
+ log_writer = SummaryWriter(args.tb_log_dir)
+
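+    # Prune the convolution parameters listed in SLIM.PRUNE_PARAMS by the
+    # ratios in SLIM.PRUNE_RATIOS. prune() returns a tuple whose first element
+    # is the pruned program; only_graph=False also shrinks the corresponding
+    # parameter tensors in the global scope.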
+ pruner = Pruner()
+ train_prog = pruner.prune(
+ train_prog,
+ fluid.global_scope(),
+ params=pruned_params,
+ ratios=pruned_ratios,
+ place=place,
+ only_graph=False)[0]
+
+ compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
+ loss_name=avg_loss.name,
+ exec_strategy=exec_strategy,
+ build_strategy=build_strategy)
+
+ global_step = 0
+ all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+ if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+ all_step += 1
+ all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+ avg_loss = 0.0
+ timer = Timer()
+ timer.start()
+ if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+ raise ValueError(
+ ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+ begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+ if args.use_mpio:
+ print_info("Use multiprocess reader")
+ else:
+ print_info("Use multi-thread reader")
+
+ for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+ py_reader.start()
+ while True:
+ try:
+ if args.debug:
+                    # Print category IoU and accuracy to check whether the
+                    # training process behaves as expected
+ loss, lr, pred, grts, masks = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ cm.calculate(pred, grts, masks)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0:
+ speed = args.log_steps / timer.elapsed_time()
+ avg_loss /= args.log_steps
+ category_acc, mean_acc = cm.accuracy()
+ category_iou, mean_iou = cm.mean_iou()
+
+ print_info((
+ "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+ mean_iou, speed,
+ calculate_eta(all_step - global_step, speed)))
+ print_info("Category IoU: ", category_iou)
+ print_info("Category Acc: ", category_acc)
+ if args.use_tb:
+ log_writer.add_scalar('Train/mean_iou', mean_iou,
+ global_step)
+ log_writer.add_scalar('Train/mean_acc', mean_acc,
+ global_step)
+ log_writer.add_scalar('Train/loss', avg_loss,
+ global_step)
+ log_writer.add_scalar('Train/lr', lr[0],
+ global_step)
+ log_writer.add_scalar('Train/step/sec', speed,
+ global_step)
+ sys.stdout.flush()
+ avg_loss = 0.0
+ cm.zero_matrix()
+ timer.restart()
+ else:
+                    # If not in debug mode, avoid unnecessary logging and calculation
+ loss, lr = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+ avg_loss /= args.log_steps
+ speed = args.log_steps / timer.elapsed_time()
+ print((
+ "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, speed,
+ calculate_eta(all_step - global_step, speed)))
+ if args.use_tb:
+ log_writer.add_scalar('Train/loss', avg_loss,
+ global_step)
+ log_writer.add_scalar('Train/lr', lr[0],
+ global_step)
+ log_writer.add_scalar('Train/speed', speed,
+ global_step)
+ sys.stdout.flush()
+ avg_loss = 0.0
+ timer.restart()
+
+ except fluid.core.EOFException:
+ py_reader.reset()
+ break
+ except Exception as e:
+ print(e)
+
+ if epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0 and cfg.TRAINER_ID == 0:
+
+ ckpt_dir = save_prune_checkpoint(exe, train_prog, epoch)
+
+ if args.do_eval:
+ print("Evaluation start")
+ _, mean_iou, _, mean_acc = evaluate(
+ cfg=cfg,
+ ckpt_dir=ckpt_dir,
+ use_gpu=args.use_gpu,
+ use_mpio=args.use_mpio)
+ if args.use_tb:
+ log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
+ global_step)
+ log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
+ global_step)
+
+ # Use Tensorboard to visualize results
+ if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
+ visualize(
+ cfg=cfg,
+ use_gpu=args.use_gpu,
+ vis_file_list=cfg.DATASET.VIS_FILE_LIST,
+ vis_dir="visual",
+ ckpt_dir=ckpt_dir,
+ log_writer=log_writer)
+
+ # save final model
+ if cfg.TRAINER_ID == 0:
+ save_prune_checkpoint(exe, train_prog, 'final')
+
+def main(args):
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts is not None:
+ cfg.update_from_list(args.opts)
+
+ cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+ cfg.check_and_infer()
+ print_info(pprint.pformat(cfg))
+ train(cfg)
+
+
+if __name__ == '__main__':
+ args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because this PaddlePaddle build was compiled without CUDA (paddlepaddle-cpu)."
+        )
+        print(
+            "Please either: 1. install paddlepaddle-gpu to run your models on GPU, or 2. set use_gpu=False to run models on CPU."
+        )
+ sys.exit(1)
+ main(args)
diff --git a/slim/quantization/README.md b/slim/quantization/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9af04033b3a9af84d4b1fdf081f156be6f8dc0c2
--- /dev/null
+++ b/slim/quantization/README.md
@@ -0,0 +1,142 @@
+> Before running this example, please install Paddle 1.6 or a later version, together with PaddleSlim.
+
+# Quantization Example for Segmentation Models
+
+## Overview
+
+This example uses the [quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress segmentation models.
+Before reading this example, we recommend that you first get familiar with:
+
+- [the standard training procedure for segmentation models](../../docs/usage.md)
+- [the PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
+
+
+## Install PaddleSlim
+PaddleSlim can be installed by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
+
+
+## Training
+
+
+### Dataset
+Please download the dataset and put it in the expected location, as described in the segmentation library tutorials.
+
+### Download a Trained Segmentation Model
+
+Run the following commands from the root directory of the segmentation library:
+```bash
+mkdir pretrain
+cd pretrain
+wget https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz
+tar xf mobilenet_cityscapes.tgz
+```
+
+### Define the Quantization Configuration
+```python
+config = {
+    'weight_quantize_type': 'channel_wise_abs_max',
+    'activation_quantize_type': 'moving_average_abs_max',
+    'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+    'not_quant_pattern': ['last_conv']
+}
+```
+
+For how to set these options and what they mean, please refer to the [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/).
+
+### Insert Quantization and Dequantization Ops
+Use the [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware) to insert quantization and dequantization ops into the Program.
+```
+compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
+```
+
+### Disable Some Training Strategies
+
+Quantization modifies the Program, so training strategies that also modify the Program have to be disabled. ``sync_batch_norm`` fails when combined with multi-GPU quantization training, so it needs to be turned off as well.
+```
+build_strategy.fuse_all_reduce_ops = False
+build_strategy.sync_batch_norm = False
+```
+
+### Start Training
+
+
+Step 1: select the GPU card
+```
+export CUDA_VISIBLE_DEVICES=0
+```
+Step 2: add the ``pdseg`` directory to the Python path
+
+Run the following command from the root directory of the segmentation library:
+```
+export PYTHONPATH=$PYTHONPATH:./pdseg
+```
+
+Step 3: start training
+
+
+Run the following command from the root directory of the segmentation library to start training.
+```
+python -u ./slim/quantization/train_quant.py --log_steps 10 --not_quant_pattern last_conv --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio --do_eval \
+TRAIN.PRETRAINED_MODEL_DIR "./pretrain/mobilenet_cityscapes/" \
+TRAIN.MODEL_SAVE_DIR "./snapshots/mobilenetv2_quant" \
+MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
+MODEL.DEEPLAB.ENABLE_DECODER False \
+TRAIN.SYNC_BATCH_NORM False \
+SOLVER.LR 0.0001 \
+TRAIN.SNAPSHOT_EPOCH 1 \
+SOLVER.NUM_EPOCHS 30 \
+BATCH_SIZE 16
+```
+
+
+### Model Structure During Training
+The [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) documentation describes two interfaces: ``paddleslim.quant.quant_aware`` and ``paddleslim.quant.convert``.
+``paddleslim.quant.quant_aware`` inserts consecutive quantization and dequantization ops in front of each input of operators such as conv2d, depthwise_conv2d and mul in the network, and changes certain inputs of the corresponding backward operators. The result is illustrated below:
+
+
+
+Figure 1: result of applying paddleslim.quant.quant_aware
+
+
+
+### Evaluating While Training
+
+The test accuracy reported while evaluating during training in the script is computed on the network structure shown in Figure 1.
+
+## Evaluation
+
+### Final Evaluation Model
+
+``paddleslim.quant.convert`` mainly reorders the quantization and dequantization ops in the Program, turning the layout of Figure 1 into that of Figure 2. In addition, ``paddleslim.quant.convert`` rewrites the parameters of operators such as `conv2d`, `depthwise_conv2d` and `mul` to quantized values within the int8_t range (although their data type remains float32), as shown in Figure 2:
+
+
+
+Figure 2: result of paddleslim.quant.convert
+
+
+Only after calling ``paddleslim.quant.convert`` do we obtain the final quantized model. This model can be loaded and run with Paddle-Lite; see the tutorial [how Paddle-Lite loads and runs a quantized model](https://github.com/PaddlePaddle/Paddle-Lite/wiki/model_quantization).
+
+### Evaluation Script
+Evaluation is done with the script [slim/quantization/eval_quant.py](./eval_quant.py). It performs the following steps (see the sketch after this list):
+
+- Define the configuration. Use the same quantization configuration as in the training script so that the model matches the one produced by quantization-aware training.
+- Use ``paddleslim.quant.quant_aware`` to insert the quantization and dequantization ops.
+- Use ``paddleslim.quant.convert`` to reorder the ops and obtain the final quantized model for evaluation.
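+
+Put together, the evaluation flow in `eval_quant.py` looks roughly like the sketch below (simplified; the variable names follow the script added in this PR):
+
+```python
+from paddleslim.quant import quant_aware, convert
+
+# Same quantization configuration as during training
+config = {
+    'weight_quantize_type': 'channel_wise_abs_max',
+    'activation_quantize_type': 'moving_average_abs_max',
+    'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+    'not_quant_pattern': ['last_conv']
+}
+
+# Insert quant/dequant ops so the program matches the trained checkpoint,
+test_prog = quant_aware(test_prog, place, config, for_test=True)
+# load the quantization-aware trained weights,
+fluid.io.load_persistables(exe, ckpt_dir, main_program=test_prog)
+# then reorder the ops and freeze int8-range weights for final evaluation.
+test_prog = convert(test_prog, place, config)
+```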
+
+Evaluation command:
+
+Run the following from the root directory of the segmentation library:
+```
+python -u ./slim/quantization/eval_quant.py --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --not_quant_pattern last_conv --use_mpio --convert \
+TEST.TEST_MODEL "./snapshots/mobilenetv2_quant/best_model" \
+MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
+MODEL.DEEPLAB.ENABLE_DECODER False \
+TRAIN.SYNC_BATCH_NORM False \
+BATCH_SIZE 16
+```
+
+
+
+## Quantization Results
+
+
+
+## FAQ
diff --git a/slim/quantization/eval_quant.py b/slim/quantization/eval_quant.py
new file mode 100644
index 0000000000000000000000000000000000000000..f40021df10ac5cabee789ca4de04b7489b37f182
--- /dev/null
+++ b/slim/quantization/eval_quant.py
@@ -0,0 +1,203 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+import sys
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+from paddleslim.quant import quant_aware, convert
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess IO or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ parser.add_argument(
+ '--convert',
+ dest='convert',
+ help='Convert or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ "--not_quant_pattern",
+ nargs='+',
+ type=str,
+ help=
+ "Layers which name_scope contains string in not_quant_pattern will not be quantized"
+ )
+
+ if len(sys.argv) == 1:
+ parser.print_help()
+ sys.exit(1)
+ return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+ np.set_printoptions(precision=5, suppress=True)
+
+ startup_prog = fluid.Program()
+ test_prog = fluid.Program()
+ dataset = SegDataset(
+ file_list=cfg.DATASET.VAL_FILE_LIST,
+ mode=ModelPhase.EVAL,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+ if use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ for b in data_gen:
+ yield b[0], b[1], b[2]
+
+ py_reader, avg_loss, pred, grts, masks = build_model(
+ test_prog, startup_prog, phase=ModelPhase.EVAL)
+
+ py_reader.decorate_sample_generator(
+ data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+ # Get device environment
+ places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+ place = places[0]
+ dev_count = len(places)
+ print("#Device count: {}".format(dev_count))
+
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ test_prog = test_prog.clone(for_test=True)
+ not_quant_pattern_list = []
+ if kwargs['not_quant_pattern'] is not None:
+ not_quant_pattern_list = kwargs['not_quant_pattern']
+ config = {
+ 'weight_quantize_type': 'channel_wise_abs_max',
+ 'activation_quantize_type': 'moving_average_abs_max',
+ 'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+ 'not_quant_pattern': not_quant_pattern_list
+ }
+ test_prog = quant_aware(test_prog, place, config, for_test=True)
+
+ ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+ if not os.path.exists(ckpt_dir):
+ raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+ if ckpt_dir is not None:
+ print('load test model:', ckpt_dir)
+ fluid.io.load_persistables(exe, ckpt_dir, main_program=test_prog)
+ if kwargs['convert']:
+ test_prog = convert(test_prog, place, config)
+ # Use streaming confusion matrix to calculate mean_iou
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+ fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+ num_images = 0
+ step = 0
+ all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+ timer = Timer()
+ timer.start()
+ py_reader.start()
+ while True:
+ try:
+ step += 1
+ loss, pred, grts, masks = exe.run(
+ test_prog, fetch_list=fetch_list, return_numpy=True)
+
+ loss = np.mean(np.array(loss))
+
+ num_images += pred.shape[0]
+ conf_mat.calculate(pred, grts, masks)
+ _, iou = conf_mat.mean_iou()
+ _, acc = conf_mat.accuracy()
+
+ speed = 1.0 / timer.elapsed_time()
+
+ print(
+ "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+ .format(step, loss, acc, iou, speed,
+ calculate_eta(all_step - step, speed)))
+ timer.restart()
+ sys.stdout.flush()
+ except fluid.core.EOFException:
+ break
+
+ category_iou, avg_iou = conf_mat.mean_iou()
+ category_acc, avg_acc = conf_mat.accuracy()
+ print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+ num_images, avg_acc, avg_iou))
+ print("[EVAL]Category IoU:", category_iou)
+ print("[EVAL]Category Acc:", category_acc)
+ print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+ return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+ args = parse_args()
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ cfg.check_and_infer()
+ print(pprint.pformat(cfg))
+ evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/slim/quantization/images/ConvertToInt8Pass.png b/slim/quantization/images/ConvertToInt8Pass.png
new file mode 100644
index 0000000000000000000000000000000000000000..8b5849819c0bc8e592dc8f864d8945330df85ab1
Binary files /dev/null and b/slim/quantization/images/ConvertToInt8Pass.png differ
diff --git a/slim/quantization/images/FreezePass.png b/slim/quantization/images/FreezePass.png
new file mode 100644
index 0000000000000000000000000000000000000000..acd2b0a890a8af85bec6eecdb22e47ad386a178c
Binary files /dev/null and b/slim/quantization/images/FreezePass.png differ
diff --git a/slim/quantization/images/TransformForMobilePass.png b/slim/quantization/images/TransformForMobilePass.png
new file mode 100644
index 0000000000000000000000000000000000000000..4104cacc67af0be1c7bc152696e2ae544127aace
Binary files /dev/null and b/slim/quantization/images/TransformForMobilePass.png differ
diff --git a/slim/quantization/images/TransformPass.png b/slim/quantization/images/TransformPass.png
new file mode 100644
index 0000000000000000000000000000000000000000..f29ab62753e0e6ddf28d0c1dda7139705fc24b18
Binary files /dev/null and b/slim/quantization/images/TransformPass.png differ
diff --git a/slim/quantization/train_quant.py b/slim/quantization/train_quant.py
new file mode 100644
index 0000000000000000000000000000000000000000..6a29dccdbaeda54b06c11299fb37e979cec6e401
--- /dev/null
+++ b/slim/quantization/train_quant.py
@@ -0,0 +1,388 @@
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+import sys
+import argparse
+import pprint
+import random
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from models.model_builder import parse_shape_from_file
+from eval_quant import evaluate
+from vis import visualize
+from utils import dist_utils
+from train import save_vars, save_checkpoint, load_checkpoint, update_best_model, print_info
+
+from paddleslim.quant import quant_aware
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description='PaddleSeg training')
+ parser.add_argument(
+ '--cfg',
+ dest='cfg_file',
+ help='Config file for training (and optionally testing)',
+ default=None,
+ type=str)
+ parser.add_argument(
+ '--use_gpu',
+ dest='use_gpu',
+ help='Use gpu or cpu',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--use_mpio',
+ dest='use_mpio',
+ help='Use multiprocess I/O or not',
+ action='store_true',
+ default=False)
+ parser.add_argument(
+ '--log_steps',
+ dest='log_steps',
+ help='Display logging information at every log_steps',
+ default=10,
+ type=int)
+ parser.add_argument(
+ '--debug',
+ dest='debug',
+ help='debug mode, display detail information of training',
+ action='store_true')
+ parser.add_argument(
+ '--do_eval',
+ dest='do_eval',
+ help='Evaluation models result on every new checkpoint',
+ action='store_true')
+ parser.add_argument(
+ 'opts',
+ help='See utils/config.py for all options',
+ default=None,
+ nargs=argparse.REMAINDER)
+ parser.add_argument(
+ '--enable_ce',
+ dest='enable_ce',
+ help='If set True, enable continuous evaluation job.'
+ 'This flag is only used for internal test.',
+ action='store_true')
+ parser.add_argument(
+ "--not_quant_pattern",
+ nargs='+',
+ type=str,
+ help=
+ "Layers which name_scope contains string in not_quant_pattern will not be quantized"
+ )
+
+ return parser.parse_args()
+
+
+def train_quant(cfg):
+ startup_prog = fluid.Program()
+ train_prog = fluid.Program()
+ if args.enable_ce:
+ startup_prog.random_seed = 1000
+ train_prog.random_seed = 1000
+ drop_last = True
+
+ dataset = SegDataset(
+ file_list=cfg.DATASET.TRAIN_FILE_LIST,
+ mode=ModelPhase.TRAIN,
+ shuffle=True,
+ data_dir=cfg.DATASET.DATA_DIR)
+
+ def data_generator():
+ if args.use_mpio:
+ data_gen = dataset.multiprocess_generator(
+ num_processes=cfg.DATALOADER.NUM_WORKERS,
+ max_queue_size=cfg.DATALOADER.BUF_SIZE)
+ else:
+ data_gen = dataset.generator()
+
+ batch_data = []
+ for b in data_gen:
+ batch_data.append(b)
+ if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+ batch_data = []
+ # If use sync batch norm strategy, drop last batch if number of samples
+        # in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
+ if not cfg.TRAIN.SYNC_BATCH_NORM:
+ for item in batch_data:
+ yield item[0], item[1], item[2]
+
+ # Get device environment
+ # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+ # place = places[0]
+ gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+ place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+ places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+ # Get number of GPU
+ dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+ print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+ assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+ cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data will be allocated to each GPU evenly
+ batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+ print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+ py_reader, avg_loss, lr, pred, grts, masks = build_model(
+ train_prog, startup_prog, phase=ModelPhase.TRAIN)
+ py_reader.decorate_sample_generator(
+ data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+ exe = fluid.Executor(place)
+ exe.run(startup_prog)
+
+ exec_strategy = fluid.ExecutionStrategy()
+ # Clear temporary variables every 100 iteration
+ if args.use_gpu:
+ exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+ exec_strategy.num_iteration_per_drop_scope = 100
+ build_strategy = fluid.BuildStrategy()
+
+ if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+ dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+ exec_strategy.num_threads = 1
+
+ # Resume training
+ begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+ if cfg.TRAIN.RESUME_MODEL_DIR:
+ begin_epoch = load_checkpoint(exe, train_prog)
+ # Load pretrained model
+ elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+ print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+ load_vars = []
+ load_fail_vars = []
+
+ def var_shape_matched(var, shape):
+ """
+            Check whether the persistable variable shape matches the current network
+ """
+ var_exist = os.path.exists(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ if var_exist:
+ var_shape = parse_shape_from_file(
+ os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+ return var_shape == shape
+ return False
+
+ for x in train_prog.list_vars():
+ if isinstance(x, fluid.framework.Parameter):
+ shape = tuple(fluid.global_scope().find_var(
+ x.name).get_tensor().shape())
+ if var_shape_matched(x, shape):
+ load_vars.append(x)
+ else:
+ load_fail_vars.append(x)
+
+ fluid.io.load_vars(
+ exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+ for var in load_vars:
+            print_info("Parameter[{}] loaded successfully!".format(var.name))
+ for var in load_fail_vars:
+ print_info(
+                "Parameter[{}] does not exist or its shape does not match the current"
+                " network; skipping it.".format(var.name))
+ print_info("{}/{} pretrained parameters loaded successfully!".format(
+ len(load_vars),
+ len(load_vars) + len(load_fail_vars)))
+ else:
+ print_info(
+            'Pretrained model dir {} does not exist, training from scratch...'.
+ format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+ fetch_list = [avg_loss.name, lr.name]
+ if args.debug:
+ # Fetch more variable info and use streaming confusion matrix to
+ # calculate IoU results if in debug mode
+ np.set_printoptions(
+ precision=4, suppress=True, linewidth=160, floatmode="fixed")
+ fetch_list.extend([pred.name, grts.name, masks.name])
+ cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+ not_quant_pattern = []
+ if args.not_quant_pattern:
+ not_quant_pattern = args.not_quant_pattern
+ config = {
+ 'weight_quantize_type': 'channel_wise_abs_max',
+ 'activation_quantize_type': 'moving_average_abs_max',
+ 'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+ 'not_quant_pattern': not_quant_pattern
+ }
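+    # Insert fake quantization/dequantization ops into the training graph and
+    # build a matching for_test graph used for checkpointing and evaluation.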
+ compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
+ eval_prog = quant_aware(train_prog, place, config, for_test=True)
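+    # Quantization rewrites the Program, so build strategies that would also
+    # modify it (all-reduce op fusion, sync batch norm) must be disabled.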
+ build_strategy.fuse_all_reduce_ops = False
+ build_strategy.sync_batch_norm = False
+ compiled_train_prog = compiled_train_prog.with_data_parallel(
+ loss_name=avg_loss.name,
+ exec_strategy=exec_strategy,
+ build_strategy=build_strategy)
+
+ # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+ global_step = 0
+ all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+ if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+ all_step += 1
+ all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+ avg_loss = 0.0
+ best_mIoU = 0.0
+
+ timer = Timer()
+ timer.start()
+ if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+ raise ValueError(
+ ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+ begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+ if args.use_mpio:
+ print_info("Use multiprocess reader")
+ else:
+ print_info("Use multi-thread reader")
+
+ for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+ py_reader.start()
+ while True:
+ try:
+ if args.debug:
+                    # Print category IoU and accuracy to check whether the
+                    # training process behaves as expected
+ loss, lr, pred, grts, masks = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ cm.calculate(pred, grts, masks)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0:
+ speed = args.log_steps / timer.elapsed_time()
+ avg_loss /= args.log_steps
+ category_acc, mean_acc = cm.accuracy()
+ category_iou, mean_iou = cm.mean_iou()
+
+ print_info((
+ "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+ mean_iou, speed,
+ calculate_eta(all_step - global_step, speed)))
+ print_info("Category IoU: ", category_iou)
+ print_info("Category Acc: ", category_acc)
+ sys.stdout.flush()
+ avg_loss = 0.0
+ cm.zero_matrix()
+ timer.restart()
+ else:
+                    # If not in debug mode, avoid unnecessary logging and calculation
+ loss, lr = exe.run(
+ program=compiled_train_prog,
+ fetch_list=fetch_list,
+ return_numpy=True)
+ avg_loss += np.mean(np.array(loss))
+ global_step += 1
+
+ if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+ avg_loss /= args.log_steps
+ speed = args.log_steps / timer.elapsed_time()
+ print((
+ "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+ ).format(epoch, global_step, lr[0], avg_loss, speed,
+ calculate_eta(all_step - global_step, speed)))
+ sys.stdout.flush()
+ avg_loss = 0.0
+ timer.restart()
+
+ except fluid.core.EOFException:
+ py_reader.reset()
+ break
+ except Exception as e:
+ print(e)
+
+ if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
+ or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
+ ckpt_dir = save_checkpoint(exe, eval_prog, epoch)
+
+ if args.do_eval:
+ print("Evaluation start")
+ _, mean_iou, _, mean_acc = evaluate(
+ cfg=cfg,
+ ckpt_dir=ckpt_dir,
+ use_gpu=args.use_gpu,
+ use_mpio=args.use_mpio,
+ not_quant_pattern=args.not_quant_pattern,
+ convert=False)
+
+ if mean_iou > best_mIoU:
+ best_mIoU = mean_iou
+ update_best_model(ckpt_dir)
+ print_info("Save best model {} to {}, mIoU = {:.4f}".format(
+ ckpt_dir,
+ os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
+ mean_iou))
+
+ # save final model
+ if cfg.TRAINER_ID == 0:
+ save_checkpoint(exe, eval_prog, 'final')
+
+
+def main(args):
+ if args.cfg_file is not None:
+ cfg.update_from_file(args.cfg_file)
+ if args.opts:
+ cfg.update_from_list(args.opts)
+ if args.enable_ce:
+ random.seed(0)
+ np.random.seed(0)
+
+ cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+ cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+ cfg.check_and_infer()
+ print_info(pprint.pformat(cfg))
+ train_quant(cfg)
+
+
+if __name__ == '__main__':
+ args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because this PaddlePaddle build was compiled without CUDA (paddlepaddle-cpu)."
+        )
+        print(
+            "Please either: 1. install paddlepaddle-gpu to run your models on GPU, or 2. set use_gpu=False to run models on CPU."
+        )
+ sys.exit(1)
+ main(args)
diff --git a/turtorial/finetune_deeplabv3plus.md b/turtorial/finetune_deeplabv3plus.md
index 35fb677d9d416512a79ded14bcdcadf516aa6b70..d254ce5eb6e7cbe62b64deac78e003a04fe027bf 100644
--- a/turtorial/finetune_deeplabv3plus.md
+++ b/turtorial/finetune_deeplabv3plus.md
@@ -1,29 +1,32 @@
-# DeepLabv3+模型训练教程
+# DeepLabv3+ Model Usage Tutorial
-* 本教程旨在介绍如何通过使用PaddleSeg提供的 ***`DeeplabV3+/Xception65/BatchNorm`*** 预训练模型在自定义数据集上进行训练。除了该配置之外,DeeplabV3+还支持以下不同[模型组合](#模型组合)的预训练模型,如果需要使用对应模型作为预训练模型,将下述内容中的Xception Backbone中的内容进行替换即可
+This tutorial explains how to use a `DeepLabv3+` pretrained model for training, evaluation and visualization on a custom dataset. We take the `DeeplabV3+/Xception65/BatchNorm` pretrained model as the example.
-* 在阅读本教程前,请确保您已经了解过PaddleSeg的[快速入门](../README.md#快速入门)和[基础功能](../README.md#基础功能)等章节,以便对PaddleSeg有一定的了解
+* Before reading this tutorial, please make sure you have read PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) chapters, so that you have a basic understanding of PaddleSeg.
-* 本教程的所有命令都基于PaddleSeg主目录进行执行
+* All commands in this tutorial are executed from the PaddleSeg root directory.
## 一. 准备待训练数据
-我们提前准备好了一份数据集,通过以下代码进行下载
+
+
+We have prepared an optic disc (fundus) segmentation dataset in advance, containing 267 training images, 76 validation images and 38 test images. Download it with the following command:
```shell
-python dataset/download_pet.py
+python dataset/download_optic.py
```
## 二. 下载预训练模型
-关于PaddleSeg支持的所有预训练模型的列表,我们可以从[模型组合](#模型组合)中查看我们所需模型的名字和配置
-
接着下载对应的预训练模型
```shell
python pretrained_model/download_model.py deeplabv3p_xception65_bn_coco
```
+For the list of available DeepLabv3+ pretrained models, see [Model Combinations](#模型组合). To use another pretrained model, simply download it and replace the BACKBONE, NORM_TYPE, etc. in the configuration accordingly.
+
+
## 三. 准备配置
接着我们需要确定相关配置,从本教程的角度,配置分为三部分:
@@ -45,19 +48,19 @@ python pretrained_model/download_model.py deeplabv3p_xception65_bn_coco
在三者中,预训练模型的配置尤为重要,如果模型或者BACKBONE配置错误,会导致预训练的参数没有加载,进而影响收敛速度。预训练模型相关的配置如第二步所展示。
-数据集的配置和数据路径有关,在本教程中,数据存放在`dataset/mini_pet`中
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/optic_disc_seg`.
-其他配置则根据数据集和机器环境的情况进行调节,最终我们保存一个如下内容的yaml配置文件,存放路径为**configs/deeplabv3p_xception65_pet.yaml**
+The remaining options are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/deeplabv3p_xception65_optic.yaml**.
```yaml
# 数据集配置
DATASET:
- DATA_DIR: "./dataset/mini_pet/"
- NUM_CLASSES: 3
- TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
- TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
- VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
- VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+ DATA_DIR: "./dataset/optic_disc_seg/"
+ NUM_CLASSES: 2
+ TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt"
+ VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt"
+ VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
# 预训练模型配置
MODEL:
@@ -75,15 +78,15 @@ AUG:
BATCH_SIZE: 4
TRAIN:
PRETRAINED_MODEL_DIR: "./pretrained_model/deeplabv3p_xception65_bn_coco/"
- MODEL_SAVE_DIR: "./saved_model/deeplabv3p_xception65_bn_pet/"
- SNAPSHOT_EPOCH: 10
+ MODEL_SAVE_DIR: "./saved_model/deeplabv3p_xception65_bn_optic/"
+ SNAPSHOT_EPOCH: 5
TEST:
- TEST_MODEL: "./saved_model/deeplabv3p_xception65_bn_pet/final"
+ TEST_MODEL: "./saved_model/deeplabv3p_xception65_bn_optic/final"
SOLVER:
- NUM_EPOCHS: 100
- LR: 0.005
+ NUM_EPOCHS: 10
+ LR: 0.001
LR_POLICY: "poly"
- OPTIMIZER: "sgd"
+ OPTIMIZER: "adam"
```
## 四. 配置/数据校验
@@ -91,7 +94,7 @@ SOLVER:
在开始训练和评估之前,我们还需要对配置和数据进行一次校验,确保数据和配置是正确的。使用下述命令启动校验流程
```shell
-python pdseg/check.py --cfg ./configs/deeplabv3p_xception65_pet.yaml
+python pdseg/check.py --cfg ./configs/deeplabv3p_xception65_optic.yaml
```
@@ -100,7 +103,10 @@ python pdseg/check.py --cfg ./configs/deeplabv3p_xception65_pet.yaml
校验通过后,使用下述命令启动训练
```shell
-python pdseg/train.py --use_gpu --cfg ./configs/deeplabv3p_xception65_pet.yaml
+# Select the GPU card (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# Train
+python pdseg/train.py --use_gpu --cfg ./configs/deeplabv3p_xception65_optic.yaml
```
## 六. 进行评估
@@ -108,22 +114,39 @@ python pdseg/train.py --use_gpu --cfg ./configs/deeplabv3p_xception65_pet.yaml
模型训练完成,使用下述命令启动评估
```shell
-python pdseg/eval.py --use_gpu --cfg ./configs/deeplabv3p_xception65_pet.yaml
+python pdseg/eval.py --use_gpu --cfg ./configs/deeplabv3p_xception65_optic.yaml
+```
+
+## 7. Visualization
+
+Launch prediction and visualization with the following command:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/deeplabv3p_xception65_optic.yaml
```
+The prediction results are saved in the `visual` directory. One example prediction is shown below:
+
+
+
+## Online Demo
+
+PaddleSeg provides an online DeepLabv3+ image segmentation tutorial on the AI Studio platform; feel free to [try it out](https://aistudio.baidu.com/aistudio/projectDetail/226703).
+
+
## 模型组合
-|预训练模型名称|BackBone|Norm Type|数据集|配置|
+|Pretrained model name|Backbone|Norm Type|Dataset|Configuration|
|-|-|-|-|-|
-|mobilenetv2-2-0_bn_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 2.0
MODEL.DEFAULT_NORM_TYPE: bn|
-|mobilenetv2-1-5_bn_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.5
MODEL.DEFAULT_NORM_TYPE: bn|
-|mobilenetv2-1-0_bn_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEFAULT_NORM_TYPE: bn|
-|mobilenetv2-0-5_bn_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.5
MODEL.DEFAULT_NORM_TYPE: bn|
-|mobilenetv2-0-25_bn_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.25
MODEL.DEFAULT_NORM_TYPE: bn|
-|xception41_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_41
MODEL.DEFAULT_NORM_TYPE: bn|
-|xception65_imagenet|-|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn|
-|deeplabv3p_mobilenetv2-1-0_bn_coco|MobileNet V2|bn|COCO|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEEPLAB.ENCODER_WITH_ASPP: False
MODEL.DEEPLAB.ENABLE_DECODER: False
MODEL.DEFAULT_NORM_TYPE: bn|
-|**deeplabv3p_xception65_bn_coco**|Xception|bn|COCO|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn |
-|deeplabv3p_mobilenetv2-1-0_bn_cityscapes|MobileNet V2|bn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEEPLAB.ENCODER_WITH_ASPP: False
MODEL.DEEPLAB.ENABLE_DECODER: False
MODEL.DEFAULT_NORM_TYPE: bn|
-|deeplabv3p_xception65_gn_cityscapes|Xception|gn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: gn|
-|deeplabv3p_xception65_bn_cityscapes|Xception|bn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn|
+|mobilenetv2-2-0_bn_imagenet|MobileNetV2|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 2.0
MODEL.DEFAULT_NORM_TYPE: bn|
+|mobilenetv2-1-5_bn_imagenet|MobileNetV2|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.5
MODEL.DEFAULT_NORM_TYPE: bn|
+|mobilenetv2-1-0_bn_imagenet|MobileNetV2|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEFAULT_NORM_TYPE: bn|
+|mobilenetv2-0-5_bn_imagenet|MobileNetV2|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.5
MODEL.DEFAULT_NORM_TYPE: bn|
+|mobilenetv2-0-25_bn_imagenet|MobileNetV2|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.25
MODEL.DEFAULT_NORM_TYPE: bn|
+|xception41_imagenet|Xception41|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_41
MODEL.DEFAULT_NORM_TYPE: bn|
+|xception65_imagenet|Xception65|bn|ImageNet|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn|
+|deeplabv3p_mobilenetv2-1-0_bn_coco|MobileNetV2|bn|COCO|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEEPLAB.ENCODER_WITH_ASPP: False
MODEL.DEEPLAB.ENABLE_DECODER: False
MODEL.DEFAULT_NORM_TYPE: bn|
+|**deeplabv3p_xception65_bn_coco**|Xception65|bn|COCO|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn |
+|deeplabv3p_mobilenetv2-1-0_bn_cityscapes|MobileNetV2|bn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: mobilenetv2
MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0
MODEL.DEEPLAB.ENCODER_WITH_ASPP: False
MODEL.DEEPLAB.ENABLE_DECODER: False
MODEL.DEFAULT_NORM_TYPE: bn|
+|deeplabv3p_xception65_gn_cityscapes|Xception65|gn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: gn|
+|deeplabv3p_xception65_bn_cityscapes|Xception65|bn|Cityscapes|MODEL.MODEL_NAME: deeplabv3p
MODEL.DEEPLAB.BACKBONE: xception_65
MODEL.DEFAULT_NORM_TYPE: bn|
diff --git a/turtorial/finetune_fast_scnn.md b/turtorial/finetune_fast_scnn.md
new file mode 100644
index 0000000000000000000000000000000000000000..188a51edf9d138bb6832849c9ab2ad8afbcd3cd4
--- /dev/null
+++ b/turtorial/finetune_fast_scnn.md
@@ -0,0 +1,119 @@
+# Fast-SCNN Model Training Tutorial
+
+* This tutorial explains how to train on a custom dataset using the ***`Fast_scnn_cityscapes`*** pretrained model provided by PaddleSeg.
+
+* Before reading this tutorial, please make sure you have read PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) chapters, so that you have a basic understanding of PaddleSeg.
+
+* All commands in this tutorial are executed from the PaddleSeg root directory.
+
+## 1. Prepare the Training Data
+
+We have prepared a dataset in advance; download it with the following command:
+
+```shell
+python dataset/download_pet.py
+```
+
+## 2. Download the Pretrained Model
+
+```shell
+python pretrained_model/download_model.py fast_scnn_cityscapes
+```
+
+## 3. Prepare the Configuration
+
+Next we need to set up the relevant configuration. For the purpose of this tutorial, the configuration has three parts:
+
+* Dataset
+  * Training set root directory
+  * Training set file list
+  * Test set file list
+  * Validation set file list
+* Pretrained model
+  * Pretrained model name
+  * Backbone network of the pretrained model
+  * Normalization type of the pretrained model
+  * Pretrained model path
+* Other
+  * Learning rate
+  * Batch size
+  * ...
+
+Among the three, the pretrained model configuration is particularly important: if the model or the BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which slows down convergence. The pretrained-model-related configuration is shown in step 2.
+
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/mini_pet`.
+
+The remaining options are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/fast_scnn_pet.yaml**:
+
+```yaml
+# Dataset configuration
+DATASET:
+ DATA_DIR: "./dataset/mini_pet/"
+ NUM_CLASSES: 3
+ TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
+ VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
+ VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+
+# Pretrained model configuration
+MODEL:
+ MODEL_NAME: "fast_scnn"
+ DEFAULT_NORM_TYPE: "bn"
+
+# Other configuration
+TRAIN_CROP_SIZE: (512, 512)
+EVAL_CROP_SIZE: (512, 512)
+AUG:
+ AUG_METHOD: "unpadding"
+ FIX_RESIZE_SIZE: (512, 512)
+BATCH_SIZE: 4
+TRAIN:
+ PRETRAINED_MODEL_DIR: "./pretrained_model/fast_scnn_cityscapes/"
+ MODEL_SAVE_DIR: "./saved_model/fast_scnn_pet/"
+ SNAPSHOT_EPOCH: 10
+TEST:
+ TEST_MODEL: "./saved_model/fast_scnn_pet/final"
+SOLVER:
+ NUM_EPOCHS: 100
+ LR: 0.005
+ LR_POLICY: "poly"
+ OPTIMIZER: "sgd"
+```
+
+## 4. Config / Data Validation
+
+Before starting training and evaluation, we also need to validate the configuration and the data to make sure both are correct. Launch the validation with the following command:
+
+```shell
+python pdseg/check.py --cfg ./configs/fast_scnn_pet.yaml
+```
+
+
+## 5. Start Training
+
+After the validation passes, start training with the following command:
+
+```shell
+python pdseg/train.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
+```
+
+## 6. Evaluation
+
+After training finishes, start evaluation with the following command:
+
+```shell
+python pdseg/eval.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
+```
+
+
+## 7. Inference Time Comparison of Real-Time Segmentation Models
+
+| Model | Eval size | Inference time | mIoU on Cityscapes val |
+|---|---|---|---|
+| DeepLabv3+/MobileNetv2/bn | (1024, 2048) |16.14ms| 0.698|
+| ICNet/bn |(1024, 2048) |8.76ms| 0.6831 |
+| Fast-SCNN/bn | (1024, 2048) |6.28ms| 0.6964 |
+
+The tests above were run on a V100 GPU. Inference uses Paddle's [zero_copy](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_usage/deploy/inference/python_infer_cn.html#id8) inference interface, and the model output is the predicted class, i.e. the value after argmax.
+
+
diff --git a/turtorial/finetune_hrnet.md b/turtorial/finetune_hrnet.md
index f7feb9ddafd909fa829cf5f3e3d1c66c82505f57..9475a8aab8386364ab6be7e976ac30dae73d4645 100644
--- a/turtorial/finetune_hrnet.md
+++ b/turtorial/finetune_hrnet.md
@@ -1,22 +1,23 @@
-# HRNet模型训练教程
+# HRNet Model Usage Tutorial
-* 本教程旨在介绍如何通过使用PaddleSeg提供的 ***`HRNet`*** 预训练模型在自定义数据集上进行训练。
+This tutorial explains how to train, evaluate and visualize on a custom dataset using the ***`HRNet`*** pretrained models provided by PaddleSeg.
-* 在阅读本教程前,请确保您已经了解过PaddleSeg的[快速入门](../README.md#快速入门)和[基础功能](../README.md#基础功能)等章节,以便对PaddleSeg有一定的了解
+* Before reading this tutorial, please make sure you have read PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) chapters, so that you have a basic understanding of PaddleSeg.
-* 本教程的所有命令都基于PaddleSeg主目录进行执行
+* All commands in this tutorial are executed from the PaddleSeg root directory.
## 一. 准备待训练数据
-我们提前准备好了一份数据集,通过以下代码进行下载
+
+
+We have prepared an optic disc (fundus) segmentation dataset in advance, containing 267 training images, 76 validation images and 38 test images. Download it with the following command:
```shell
-python dataset/download_pet.py
+python dataset/download_optic.py
```
-## 二. 下载预训练模型
-关于PaddleSeg支持的所有预训练模型的列表,我们可以从[模型组合](#模型组合)中查看我们所需模型的名字和配置
+## 2. Download the Pretrained Model
接着下载对应的预训练模型
@@ -24,6 +25,8 @@ python dataset/download_pet.py
python pretrained_model/download_model.py hrnet_w18_bn_cityscapes
```
+For the list of available HRNet pretrained models, see [Model Combinations](#模型组合). To use another pretrained model, simply download it and replace the BACKBONE, NORM_TYPE, etc. in the configuration accordingly.
+
## 三. 准备配置
接着我们需要确定相关配置,从本教程的角度,配置分为三部分:
@@ -45,19 +48,19 @@ python pretrained_model/download_model.py hrnet_w18_bn_cityscapes
在三者中,预训练模型的配置尤为重要,如果模型配置错误,会导致预训练的参数没有加载,进而影响收敛速度。预训练模型相关的配置如第二步所展示。
-数据集的配置和数据路径有关,在本教程中,数据存放在`dataset/mini_pet`中
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/optic_disc_seg`.
-其他配置则根据数据集和机器环境的情况进行调节,最终我们保存一个如下内容的yaml配置文件,存放路径为**configs/hrnet_w18_pet.yaml**
+The remaining options are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/hrnet_optic.yaml**.
```yaml
# 数据集配置
DATASET:
- DATA_DIR: "./dataset/mini_pet/"
- NUM_CLASSES: 3
- TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
- TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
- VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
- VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+ DATA_DIR: "./dataset/optic_disc_seg/"
+ NUM_CLASSES: 2
+ TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt"
+ VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt"
+ VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
# Pretrained model configuration
MODEL:
@@ -80,15 +83,15 @@ AUG:
BATCH_SIZE: 4
TRAIN:
PRETRAINED_MODEL_DIR: "./pretrained_model/hrnet_w18_bn_cityscapes/"
- MODEL_SAVE_DIR: "./saved_model/hrnet_w18_bn_pet/"
- SNAPSHOT_EPOCH: 10
+ MODEL_SAVE_DIR: "./saved_model/hrnet_optic/"
+ SNAPSHOT_EPOCH: 5
TEST:
- TEST_MODEL: "./saved_model/hrnet_w18_bn_pet/final"
+ TEST_MODEL: "./saved_model/hrnet_optic/final"
SOLVER:
- NUM_EPOCHS: 100
- LR: 0.005
+ NUM_EPOCHS: 10
+ LR: 0.001
LR_POLICY: "poly"
- OPTIMIZER: "sgd"
+ OPTIMIZER: "adam"
```
## 4. Configuration/Data Validation
@@ -96,7 +99,7 @@ SOLVER:
Before starting training and evaluation, we still need to validate the configuration and data to make sure both are correct. Start the validation with the following command:
```shell
-python pdseg/check.py --cfg ./configs/hrnet_w18_pet.yaml
+python pdseg/check.py --cfg ./configs/hrnet_optic.yaml
```
@@ -105,7 +108,10 @@ python pdseg/check.py --cfg ./configs/hrnet_w18_pet.yaml
Once the validation passes, start training with the following command:
```shell
-python pdseg/train.py --use_gpu --cfg ./configs/hrnet_w18_pet.yaml
+# Specify the GPU card to use (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# Train
+python pdseg/train.py --use_gpu --cfg ./configs/hrnet_optic.yaml
```
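+
+If the machine has more than one GPU, listing several card IDs should enable multi-card training; a hedged sketch (the card IDs are examples, adjust them to your environment):
+
+```shell
+# Example: train on GPU cards 0 and 1
+export CUDA_VISIBLE_DEVICES=0,1
+python pdseg/train.py --use_gpu --cfg ./configs/hrnet_optic.yaml
+```
+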
## 6. Evaluation
@@ -113,19 +119,30 @@ python pdseg/train.py --use_gpu --cfg ./configs/hrnet_w18_pet.yaml
After training completes, start evaluation with the following command:
```shell
-python pdseg/eval.py --use_gpu --cfg ./configs/hrnet_w18_pet.yaml
+python pdseg/eval.py --use_gpu --cfg ./configs/hrnet_optic.yaml
+```
+
+## 7. Visualization
+Start prediction and visualization with the following command:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/hrnet_optic.yaml
```
+The prediction results are saved under the visual directory. The prediction for one of the images is shown below:
+![](imgs/optic_hrnet.png)
+
+
## Model Combinations
-|Pretrained model name|BackBone|Norm Type|Dataset|Configuration|
-|-|-|-|-|-|
-|hrnet_w18_bn_cityscapes|-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [18, 36] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [18, 36, 72] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [18, 36, 72, 144] <br> MODEL.DEFAULT_NORM_TYPE: bn|
-| hrnet_w18_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [18, 36] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [18, 36, 72] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [18, 36, 72, 144] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w30_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [30, 60] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [30, 60, 120] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [30, 60, 120, 240] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w32_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [32, 64] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [32, 64, 128] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [32, 64, 128, 256] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w40_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [40, 80] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [40, 80, 160] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [40, 80, 160, 320] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w44_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [44, 88] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [44, 88, 176] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [44, 88, 176, 352] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w48_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [48, 96] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [48, 96, 192] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [48, 96, 192, 384] <br> MODEL.DEFAULT_NORM_TYPE: bn |
-| hrnet_w64_bn_imagenet |-|bn| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [64, 128] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [64, 128, 256] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [64, 128, 256, 512] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+|Pretrained model name|Backbone|Dataset|Configuration|
+|-|-|-|-|
+|hrnet_w18_bn_cityscapes|HRNet| Cityscapes | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [18, 36] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [18, 36, 72] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [18, 36, 72, 144] <br> MODEL.DEFAULT_NORM_TYPE: bn|
+| hrnet_w18_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [18, 36] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [18, 36, 72] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [18, 36, 72, 144] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w30_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [30, 60] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [30, 60, 120] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [30, 60, 120, 240] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w32_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [32, 64] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [32, 64, 128] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [32, 64, 128, 256] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w40_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [40, 80] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [40, 80, 160] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [40, 80, 160, 320] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w44_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [44, 88] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [44, 88, 176] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [44, 88, 176, 352] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w48_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [48, 96] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [48, 96, 192] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [48, 96, 192, 384] <br> MODEL.DEFAULT_NORM_TYPE: bn |
+| hrnet_w64_bn_imagenet |HRNet| ImageNet | MODEL.MODEL_NAME: hrnet <br> MODEL.HRNET.STAGE2.NUM_CHANNELS: [64, 128] <br> MODEL.HRNET.STAGE3.NUM_CHANNELS: [64, 128, 256] <br> MODEL.HRNET.STAGE4.NUM_CHANNELS: [64, 128, 256, 512] <br> MODEL.DEFAULT_NORM_TYPE: bn |
diff --git a/turtorial/finetune_icnet.md b/turtorial/finetune_icnet.md
index 00caf4f87f206000bc2dde8440bdbe08ff03f555..57adc200d9d4857768d5055d8160b7b729332389 100644
--- a/turtorial/finetune_icnet.md
+++ b/turtorial/finetune_icnet.md
@@ -1,32 +1,34 @@
-# ICNet Model Training Tutorial
+# ICNet Model Usage Tutorial
-* This tutorial explains how to use the ***`ICNet`*** pretrained model provided by PaddleSeg to train on a custom dataset
+This tutorial explains how to use the ***`ICNet`*** pretrained model provided by PaddleSeg to train, evaluate, and visualize on a custom dataset.
-* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it
+* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it.
-* All commands in this tutorial are executed from the PaddleSeg root directory
+* All commands in this tutorial are executed from the PaddleSeg root directory.
* Note that ***`ICNet`*** does not support training or evaluation in a CPU-only environment
## 1. Prepare Training Data
-We have prepared a dataset ahead of time; download it with the following code
+
+
+We have prepared an optic disc segmentation dataset of fundus images, containing 267 training images, 76 validation images, and 38 test images. Download it with the following command:
```shell
-python dataset/download_pet.py
+python dataset/download_optic.py
```
## 2. Download the Pretrained Model
-For the list of all pretrained models supported by PaddleSeg, check [Model Combinations](#模型组合) for the name and configuration of the model we need.
-
Then download the corresponding pretrained model
```shell
python pretrained_model/download_model.py icnet_bn_cityscapes
```
+For the list of available ICNet pretrained models, see [Model Combinations](#模型组合). To use a different pretrained model, download it and update the BACKBONE, NORM_TYPE, and related fields in the configuration accordingly.
+
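+If you prefer not to edit the yaml file, the pdseg scripts used in the following steps also accept trailing KEY VALUE pairs that override the loaded configuration (an assumption; check the usage help of your PaddleSeg version). A sketch, using the config file introduced below:
+
+```shell
+# Hypothetical command-line override of the norm type and batch size
+python pdseg/train.py --use_gpu --cfg ./configs/icnet_optic.yaml MODEL.DEFAULT_NORM_TYPE bn BATCH_SIZE 4
+```
+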
## 3. Prepare the Configuration
Next we need to settle on the relevant configuration. For the purposes of this tutorial, it falls into three parts:
@@ -48,20 +50,19 @@ python pretrained_model/download_model.py icnet_bn_cityscapes
Of the three, the pretrained model configuration is the most important: if the model or BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which in turn hurts convergence speed. The pretrained-model-related configuration is shown in step two.
-The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/mini_pet`
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/optic_disc_seg`
-The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/icnet_pet.yaml**
+The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/icnet_optic.yaml**
```yaml
# Dataset configuration
DATASET:
- DATA_DIR: "./dataset/mini_pet/"
- NUM_CLASSES: 3
- TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
- TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
- VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
- VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
-
+ DATA_DIR: "./dataset/optic_disc_seg/"
+ NUM_CLASSES: 2
+ TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt"
+ VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt"
+ VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
# Pretrained model configuration
MODEL:
@@ -80,15 +81,15 @@ AUG:
BATCH_SIZE: 4
TRAIN:
PRETRAINED_MODEL_DIR: "./pretrained_model/icnet_bn_cityscapes/"
- MODEL_SAVE_DIR: "./saved_model/icnet_pet/"
- SNAPSHOT_EPOCH: 10
+ MODEL_SAVE_DIR: "./saved_model/icnet_optic/"
+ SNAPSHOT_EPOCH: 5
TEST:
- TEST_MODEL: "./saved_model/icnet_pet/final"
+ TEST_MODEL: "./saved_model/icnet_optic/final"
SOLVER:
- NUM_EPOCHS: 100
- LR: 0.005
+ NUM_EPOCHS: 10
+ LR: 0.001
LR_POLICY: "poly"
- OPTIMIZER: "sgd"
+ OPTIMIZER: "adam"
```
## 4. Configuration/Data Validation
@@ -96,7 +97,7 @@ SOLVER:
Before starting training and evaluation, we still need to validate the configuration and data to make sure both are correct. Start the validation with the following command:
```shell
-python pdseg/check.py --cfg ./configs/icnet_pet.yaml
+python pdseg/check.py --cfg ./configs/icnet_optic.yaml
```
@@ -105,7 +106,10 @@ python pdseg/check.py --cfg ./configs/icnet_pet.yaml
Once the validation passes, start training with the following command:
```shell
-python pdseg/train.py --use_gpu --cfg ./configs/icnet_pet.yaml
+# Specify the GPU card to use (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# Train
+python pdseg/train.py --use_gpu --cfg ./configs/icnet_optic.yaml
```
## 6. Evaluation
@@ -113,11 +117,22 @@ python pdseg/train.py --use_gpu --cfg ./configs/icnet_pet.yaml
After training completes, start evaluation with the following command:
```shell
-python pdseg/eval.py --use_gpu --cfg ./configs/icnet_pet.yaml
+python pdseg/eval.py --use_gpu --cfg ./configs/icnet_optic.yaml
+```
+
+## 7. Visualization
+Start prediction and visualization with the following command:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/icnet_optic.yaml
```
+The prediction results are saved under the visual directory. The prediction for one of the images is shown below:
+![](imgs/optic_icnet.png)
+
+
## Model Combinations
-|Pretrained model name|BackBone|Norm|Dataset|Configuration|
-|-|-|-|-|-|
-|icnet_bn_cityscapes|-|bn|Cityscapes|MODEL.MODEL_NAME: icnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.MULTI_LOSS_WEIGHT: [1.0, 0.4, 0.16]|
+|Pretrained model name|Backbone|Dataset|Configuration|
+|-|-|-|-|
+|icnet_bn_cityscapes|ResNet50|Cityscapes|MODEL.MODEL_NAME: icnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.MULTI_LOSS_WEIGHT: [1.0, 0.4, 0.16]|
diff --git a/turtorial/finetune_pspnet.md b/turtorial/finetune_pspnet.md
index 931c3c5f7515e2ebec3d4fccf3069ecc6d6c00fb..8c52bbe4646d253f70a24001ed6e414a1bee3cc3 100644
--- a/turtorial/finetune_pspnet.md
+++ b/turtorial/finetune_pspnet.md
@@ -1,29 +1,31 @@
# PSPNET Model Training Tutorial
-* This tutorial explains how to use the ***`PSPNET`*** pretrained model provided by PaddleSeg to train on a custom dataset
+This tutorial explains how to use the ***`PSPNET`*** pretrained model provided by PaddleSeg to train on a custom dataset.
-* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it
+* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it.
-* All commands in this tutorial are executed from the PaddleSeg root directory
+* All commands in this tutorial are executed from the PaddleSeg root directory.
## 1. Prepare Training Data
-We have prepared a dataset ahead of time; download it with the following code
+
+
+We have prepared an optic disc segmentation dataset of fundus images, containing 267 training images, 76 validation images, and 38 test images. Download it with the following command:
```shell
-python dataset/download_pet.py
+python dataset/download_optic.py
```
## 2. Download the Pretrained Model
-For the list of all pretrained models supported by PaddleSeg, check [Model Combinations](#模型组合) for the name and configuration of the model we need.
-
Then download the corresponding pretrained model
```shell
python pretrained_model/download_model.py pspnet50_bn_cityscapes
```
+For the list of available PSPNet pretrained models, see [PSPNet Pretrained Model Combinations](#PSPNet预训练模型组合). To use a different pretrained model, download it and update the BACKBONE, NORM_TYPE, and related fields in the configuration accordingly.
+
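+The PSPNet table at the end of this page also links each model archive directly; one way to fetch and unpack one manually (a sketch; whether the archive unpacks into the expected folder depends on its layout, so make sure the result matches TRAIN.PRETRAINED_MODEL_DIR in your config):
+
+```shell
+# Manually download a PSPNet pretrained model listed in the table below
+wget https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz
+tar -xzf pspnet101_cityscapes.tgz -C ./pretrained_model/
+```
+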
## 3. Prepare the Configuration
Next we need to settle on the relevant configuration. For the purposes of this tutorial, it falls into three parts:
@@ -45,20 +47,19 @@ python pretrained_model/download_model.py pspnet50_bn_cityscapes
Of the three, the pretrained model configuration is the most important: if the model or BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which in turn hurts convergence speed. The pretrained-model-related configuration is shown in step two.
-The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/mini_pet`
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/optic_disc_seg`
-The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at `configs/test_pet.yaml`
+The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at `configs/pspnet_optic.yaml`
```yaml
# Dataset configuration
DATASET:
- DATA_DIR: "./dataset/mini_pet/"
- NUM_CLASSES: 3
- TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
- TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
- VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
- VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
-
+ DATA_DIR: "./dataset/optic_disc_seg/"
+ NUM_CLASSES: 2
+ TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt"
+ VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt"
+ VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
# Pretrained model configuration
MODEL:
@@ -77,15 +78,15 @@ AUG:
BATCH_SIZE: 4
TRAIN:
PRETRAINED_MODEL_DIR: "./pretrained_model/pspnet50_bn_cityscapes/"
- MODEL_SAVE_DIR: "./saved_model/pspnet_pet/"
- SNAPSHOT_EPOCH: 10
+ MODEL_SAVE_DIR: "./saved_model/pspnet_optic/"
+ SNAPSHOT_EPOCH: 5
TEST:
- TEST_MODEL: "./saved_model/pspnet_pet/final"
+ TEST_MODEL: "./saved_model/pspnet_optic/final"
SOLVER:
- NUM_EPOCHS: 100
- LR: 0.005
+ NUM_EPOCHS: 10
+ LR: 0.001
LR_POLICY: "poly"
- OPTIMIZER: "sgd"
+ OPTIMIZER: "adam"
```
## 4. Configuration/Data Validation
@@ -93,7 +94,7 @@ SOLVER:
Before starting training and evaluation, we still need to validate the configuration and data to make sure both are correct. Start the validation with the following command:
```shell
-python pdseg/check.py --cfg ./configs/test_pet.yaml
+python pdseg/check.py --cfg ./configs/pspnet_optic.yaml
```
@@ -102,7 +103,10 @@ python pdseg/check.py --cfg ./configs/test_pet.yaml
Once the validation passes, start training with the following command:
```shell
-python pdseg/train.py --use_gpu --cfg ./configs/test_pet.yaml
+# Specify the GPU card to use (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# Train
+python pdseg/train.py --use_gpu --cfg ./configs/pspnet_optic.yaml
```
## 6. Evaluation
@@ -110,12 +114,27 @@ python pdseg/train.py --use_gpu --cfg ./configs/test_pet.yaml
After training completes, start evaluation with the following command:
```shell
-python pdseg/eval.py --use_gpu --cfg ./configs/test_pet.yaml
+python pdseg/eval.py --use_gpu --cfg ./configs/pspnet_optic.yaml
+```
+
+## 7. Visualization
+Start prediction and visualization with the following command:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/pspnet_optic.yaml
```
-## Model Combinations
+The prediction results are saved under the visual directory. The prediction for one of the images is shown below:
+![](imgs/optic_pspnet.png)
+
+
+## PSPNet Pretrained Model Combinations
-|Pretrained model name|BackBone|Norm|Dataset|Configuration|
-|-|-|-|-|-|
-|pspnet50_bn_cityscapes|ResNet50|bn|Cityscapes|MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 50|
-|pspnet101_bn_cityscapes|ResNet101|bn|Cityscapes|MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 101|
+|Model|BackBone|Dataset|Configuration|
+|-|-|-|-|
+|[pspnet50_cityscapes](https://paddleseg.bj.bcebos.com/models/pspnet50_cityscapes.tgz)|ResNet50 (adapted for PSPNet)|Cityscapes |MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 50|
+|[pspnet101_cityscapes](https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz)|ResNet101 (adapted for PSPNet)|Cityscapes |MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 101|
+| [pspnet50_coco](https://paddleseg.bj.bcebos.com/models/pspnet50_coco.tgz)|ResNet50 (adapted for PSPNet)|COCO |MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 50|
+| [pspnet101_coco](https://paddleseg.bj.bcebos.com/models/pspnet101_coco.tgz) |ResNet101 (adapted for PSPNet)| COCO |MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 101|
+| [resnet50_v2_pspnet](https://paddleseg.bj.bcebos.com/resnet50_v2_pspnet.tgz)| ResNet50 (adapted for PSPNet) | ImageNet | MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 50 |
+| [resnet101_v2_pspnet](https://paddleseg.bj.bcebos.com/resnet101_v2_pspnet.tgz)| ResNet101 (adapted for PSPNet) | ImageNet | MODEL.MODEL_NAME: pspnet <br> MODEL.DEFAULT_NORM_TYPE: bn <br> MODEL.PSPNET.LAYERS: 101 |
diff --git a/turtorial/finetune_unet.md b/turtorial/finetune_unet.md
index b1baff8b0d6a9438df0ae4ed6a5f0dfdae4d3414..dd2945cf587fc18ed760639a56ad7b8edebc0087 100644
--- a/turtorial/finetune_unet.md
+++ b/turtorial/finetune_unet.md
@@ -1,29 +1,31 @@
-# U-Net Model Training Tutorial
+# U-Net Model Usage Tutorial
-* This tutorial explains how to use the ***`U-Net`*** pretrained model provided by PaddleSeg to train on a custom dataset
+This tutorial explains how to use the ***`U-Net`*** pretrained model provided by PaddleSeg to train, evaluate, and visualize on a custom dataset.
-* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it
+* Before reading this tutorial, please make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections of PaddleSeg so that you have a basic understanding of it.
-* All commands in this tutorial are executed from the PaddleSeg root directory
+* All commands in this tutorial are executed from the PaddleSeg root directory.
## 1. Prepare Training Data
-We have prepared a dataset ahead of time; download it with the following code
+
+
+We have prepared an optic disc segmentation dataset of fundus images, containing 267 training images, 76 validation images, and 38 test images. Download it with the following command:
```shell
-python dataset/download_pet.py
+python dataset/download_optic.py
```
## 2. Download the Pretrained Model
-For the list of all pretrained models supported by PaddleSeg, check [Model Combinations](#模型组合) for the name and configuration of the model we need.
-
Then download the corresponding pretrained model
```shell
python pretrained_model/download_model.py unet_bn_coco
```
+For the list of available U-Net pretrained models, see [Model Combinations](#模型组合). To use a different pretrained model, download it and update the BACKBONE, NORM_TYPE, and related fields in the configuration accordingly.
+
## 3. Prepare the Configuration
Next we need to settle on the relevant configuration. For the purposes of this tutorial, it falls into three parts:
@@ -45,20 +47,19 @@ python pretrained_model/download_model.py unet_bn_coco
Of the three, the pretrained model configuration is the most important: if the model or BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which in turn hurts convergence speed. The pretrained-model-related configuration is shown in step two.
-The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/mini_pet`
+The dataset configuration depends on the data path; in this tutorial the data is stored in `dataset/optic_disc_seg`
-The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/unet_pet.yaml**
+The remaining settings are tuned according to the dataset and the machine environment. In the end we save a yaml configuration file with the following content at **configs/unet_optic.yaml**
```yaml
# Dataset configuration
DATASET:
- DATA_DIR: "./dataset/mini_pet/"
- NUM_CLASSES: 3
- TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
- TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
- VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
- VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
-
+ DATA_DIR: "./dataset/optic_disc_seg/"
+ NUM_CLASSES: 2
+ TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
+ TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt"
+ VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt"
+ VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt"
# Pretrained model configuration
MODEL:
@@ -74,13 +75,13 @@ AUG:
BATCH_SIZE: 4
TRAIN:
PRETRAINED_MODEL_DIR: "./pretrained_model/unet_bn_coco/"
- MODEL_SAVE_DIR: "./saved_model/unet_pet/"
- SNAPSHOT_EPOCH: 10
+ MODEL_SAVE_DIR: "./saved_model/unet_optic/"
+ SNAPSHOT_EPOCH: 5
TEST:
- TEST_MODEL: "./saved_model/unet_pet/final"
+ TEST_MODEL: "./saved_model/unet_optic/final"
SOLVER:
- NUM_EPOCHS: 100
- LR: 0.005
+ NUM_EPOCHS: 10
+ LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "adam"
```
@@ -90,7 +91,7 @@ SOLVER:
Before starting training and evaluation, we still need to validate the configuration and data to make sure both are correct. Start the validation with the following command:
```shell
-python pdseg/check.py --cfg ./configs/unet_pet.yaml
+python pdseg/check.py --cfg ./configs/unet_optic.yaml
```
@@ -99,7 +100,10 @@ python pdseg/check.py --cfg ./configs/unet_pet.yaml
Once the validation passes, start training with the following command:
```shell
-python pdseg/train.py --use_gpu --cfg ./configs/unet_pet.yaml
+# Specify the GPU card to use (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# Train
+python pdseg/train.py --use_gpu --cfg ./configs/unet_optic.yaml
```
## 6. Evaluation
@@ -107,11 +111,26 @@ python pdseg/train.py --use_gpu --cfg ./configs/unet_pet.yaml
After training completes, start evaluation with the following command:
```shell
-python pdseg/eval.py --use_gpu --cfg ./configs/unet_pet.yaml
+python pdseg/eval.py --use_gpu --cfg ./configs/unet_optic.yaml
+```
+
+## 7. Visualization
+Start prediction and visualization with the following command:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/unet_optic.yaml
```
+The prediction results are saved under the visual directory. The prediction for one of the images is shown below:
+![](imgs/optic_unet.png)
+
+
+## Try It Online
+
+PaddleSeg provides a hands-on U-Net segmentation tutorial on the AI Studio platform; feel free to [try it online](https://aistudio.baidu.com/aistudio/projectDetail/102889).
+
## Model Combinations
-|Pretrained model name|BackBone|Norm|Dataset|Configuration|
-|-|-|-|-|-|
-|unet_bn_coco|-|bn|COCO|MODEL.MODEL_NAME: unet <br> MODEL.DEFAULT_NORM_TYPE: bn|
+|Pretrained model name|Backbone|Dataset|Configuration|
+|-|-|-|-|
+|unet_bn_coco|VGG16|COCO|MODEL.MODEL_NAME: unet <br> MODEL.DEFAULT_NORM_TYPE: bn|
diff --git a/turtorial/imgs/optic.png b/turtorial/imgs/optic.png
new file mode 100644
index 0000000000000000000000000000000000000000..34acaae49303e71e6b59db26202a9079965f05eb
Binary files /dev/null and b/turtorial/imgs/optic.png differ
diff --git a/turtorial/imgs/optic_deeplab.png b/turtorial/imgs/optic_deeplab.png
new file mode 100644
index 0000000000000000000000000000000000000000..8edc957362715bb742042d6f0f6e6c36fd7aec52
Binary files /dev/null and b/turtorial/imgs/optic_deeplab.png differ
diff --git a/turtorial/imgs/optic_hrnet.png b/turtorial/imgs/optic_hrnet.png
new file mode 100644
index 0000000000000000000000000000000000000000..8d19190aa5a057fe5aa72cd800c1c9fed642d9ef
Binary files /dev/null and b/turtorial/imgs/optic_hrnet.png differ
diff --git a/turtorial/imgs/optic_icnet.png b/turtorial/imgs/optic_icnet.png
new file mode 100644
index 0000000000000000000000000000000000000000..a4d36b7ab0f086af46840a5c1e8f1624054048be
Binary files /dev/null and b/turtorial/imgs/optic_icnet.png differ
diff --git a/turtorial/imgs/optic_pspnet.png b/turtorial/imgs/optic_pspnet.png
new file mode 100644
index 0000000000000000000000000000000000000000..44fd2795d6edfdc95378046da906949ad01431d9
Binary files /dev/null and b/turtorial/imgs/optic_pspnet.png differ
diff --git a/turtorial/imgs/optic_unet.png b/turtorial/imgs/optic_unet.png
new file mode 100644
index 0000000000000000000000000000000000000000..9ca439ebc76427516127d56aac56b5d09dd68263
Binary files /dev/null and b/turtorial/imgs/optic_unet.png differ