Commit 2213cff9, authored by chulutao

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleSeg into develop

......@@ -94,6 +94,7 @@ pip install -r requirements.txt
* [ICNet model tutorial](./turtorial/finetune_icnet.md)
* [PSPNet model tutorial](./turtorial/finetune_pspnet.md)
* [HRNet model tutorial](./turtorial/finetune_hrnet.md)
* [Fast-SCNN model tutorial](./turtorial/finetune_fast_scnn.md)
### Inference and deployment
......@@ -109,7 +110,7 @@ pip install -r requirements.txt
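* [How to handle class imbalance in binary segmentation](./docs/loss_select.md)
* [Using the specialized vertical-domain models](./contrib)
* [Multi-process training and mixed-precision training](./docs/multiple_gpus_train_and_mixed_precision_train.md)
* Compressing segmentation models with PaddleSlim ([quantization](./slim/quantization/README.md), [distillation](./slim/distillation/README.md), [pruning](./slim/prune/README.md), [architecture search](./slim/nas/README.md))
## Online demo
We provide online tutorials on the AI Studio platform; you are welcome to try them: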
......
EVAL_CROP_SIZE: (2048, 1024) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1024, 1024) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
    INF_RESIZE_VALUE: 500 # for rangescaling
    MAX_RESIZE_VALUE: 600 # for rangescaling
    MIN_RESIZE_VALUE: 400 # for rangescaling
    MAX_SCALE_FACTOR: 2.0 # for stepscaling
    MIN_SCALE_FACTOR: 0.5 # for stepscaling
    SCALE_STEP_SIZE: 0.25 # for stepscaling
    MIRROR: True
    FLIP: False
    FLIP_RATIO: 0.2
    RICH_CROP:
        ENABLE: True
        ASPECT_RATIO: 0.0
        BLUR: False
        BLUR_RATIO: 0.1
        MAX_ROTATION: 0
        MIN_AREA_RATIO: 0.0
        BRIGHTNESS_JITTER_RATIO: 0.4
        CONTRAST_JITTER_RATIO: 0.4
        SATURATION_JITTER_RATIO: 0.4
BATCH_SIZE: 12
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
    DATA_DIR: "./dataset/cityscapes/"
    IMAGE_TYPE: "rgb" # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "dataset/cityscapes/val.list"
    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
    VAL_FILE_LIST: "dataset/cityscapes/val.list"
    IGNORE_INDEX: 255
FREEZE:
    MODEL_FILENAME: "model"
    PARAMS_FILENAME: "params"
MODEL:
    DEFAULT_NORM_TYPE: "bn"
    MODEL_NAME: "fast_scnn"
TEST:
    TEST_MODEL: "snapshots/cityscape_fast_scnn/final/"
TRAIN:
    MODEL_SAVE_DIR: "snapshots/cityscape_fast_scnn/"
    SNAPSHOT_EPOCH: 10
SOLVER:
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 100
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
    FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
    INF_RESIZE_VALUE: 500 # for rangescaling
    MAX_RESIZE_VALUE: 600 # for rangescaling
    MIN_RESIZE_VALUE: 400 # for rangescaling
    MAX_SCALE_FACTOR: 1.25 # for stepscaling
    MIN_SCALE_FACTOR: 0.75 # for stepscaling
    SCALE_STEP_SIZE: 0.25 # for stepscaling
    MIRROR: True
BATCH_SIZE: 4
DATASET:
    DATA_DIR: "./dataset/mini_pet/"
    IMAGE_TYPE: "rgb" # choice rgb or rgba
    NUM_CLASSES: 3
    TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
    TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
    VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
    VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
    IGNORE_INDEX: 255
    SEPARATOR: " "
FREEZE:
    MODEL_FILENAME: "__model__"
    PARAMS_FILENAME: "__params__"
MODEL:
    MODEL_NAME: "fast_scnn"
    DEFAULT_NORM_TYPE: "bn"
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/fast_scnn_cityscape/"
    MODEL_SAVE_DIR: "./saved_model/fast_scnn_pet/"
    SNAPSHOT_EPOCH: 10
TEST:
    TEST_MODEL: "./saved_model/fast_scnn_pet/final"
SOLVER:
    NUM_EPOCHS: 100
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
......@@ -63,3 +63,6 @@ The training set is the Cityscapes training set; the test set is the Cityscapes validation set
| PSPNet/bn | Cityscapes |[pspnet50_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet50_cityscapes.tgz) |16|false| 0.7013 |
| PSPNet/bn | Cityscapes |[pspnet101_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz) |16|false| 0.7734 |
| HRNet_W18/bn | Cityscapes |[hrnet_w18_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz) | 4 | false | 0.7936 |
| Fast-SCNN/bn | Cityscapes |[fast_scnn_cityscape.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | 32 | false | 0.6964 |
The test environment is Python 3.7.3, V100, cuDNN 7.6.2.
......@@ -4,7 +4,7 @@
* PaddlePaddle >= 1.6.1
* NVIDIA NCCL >= 2.4.7
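For environment setup, data, and pretrained model preparation, see the [installation guide](./installation.md) and the [PaddleSeg usage guide](./usage.md)
For environment setup, data, and pretrained model preparation, see the [PaddleSeg usage guide](./usage.md)
### Multi-process training example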
......
......@@ -14,4 +14,4 @@
# limitations under the License.
import models
import utils
import tools
\ No newline at end of file
from . import tools
\ No newline at end of file
......@@ -71,6 +71,7 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None
ignore_mask.stop_gradient = True
return avg_loss
# to change, how to apply ignore index and ignore mask
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
......@@ -93,6 +94,7 @@ def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
def bce_loss(logit, label, ignore_mask=None):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classification")
......@@ -112,16 +114,18 @@ def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2, weig
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
if label.shape[2] != logit.shape[2] or label.shape[3] != logit.shape[3]:
label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = softmax_with_loss(logit, logit_label, logit_mask,
loss = softmax_with_loss(logit, label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes, weight=weight)
return avg_loss
def multi_dice_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
......@@ -135,6 +139,7 @@ def multi_dice_loss(logits, label, ignore_mask=None):
avg_loss = dice_loss(logits, label, ignore_mask)
return avg_loss
def multi_bce_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
......
......@@ -164,3 +164,37 @@ def separate_conv(input, channel, stride, filter, dilation=1, act=None):
input = bn(input)
if act: input = act(input)
return input
def conv_bn_layer(input,
                  filter_size,
                  num_filters,
                  stride,
                  padding,
                  channels=None,
                  num_groups=1,
                  if_act=True,
                  name=None,
                  use_cudnn=True):
    conv = fluid.layers.conv2d(
        input=input,
        num_filters=num_filters,
        filter_size=filter_size,
        stride=stride,
        padding=padding,
        groups=num_groups,
        act=None,
        use_cudnn=use_cudnn,
        param_attr=fluid.ParamAttr(name=name + '_weights'),
        bias_attr=False)
    bn_name = name + '_bn'
    bn = fluid.layers.batch_norm(
        input=conv,
        param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
        bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
        moving_mean_name=bn_name + '_mean',
        moving_variance_name=bn_name + '_variance')
    if if_act:
        return fluid.layers.relu6(bn)
    else:
        return bn
\ No newline at end of file
......@@ -24,7 +24,7 @@ from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
from models.modeling import deeplab, unet, icnet, pspnet, hrnet
from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
class ModelPhase(object):
......@@ -81,6 +81,8 @@ def seg_model(image, class_num):
logits = pspnet.pspnet(image, class_num)
elif model_name == 'hrnet':
logits = hrnet.hrnet(image, class_num)
elif model_name == 'fast_scnn':
logits = fast_scnn.fast_scnn(image, class_num)
else:
raise Exception(
"unknown model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
......
......@@ -27,6 +27,7 @@ from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
def encoder(input):
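# Encoder configuration: uses the ASPP architecture, with pooling + 1x1 conv + three atrous convolutions at different scales in parallel, followed by concat and a 1x1 conv
# ASPP_WITH_SEP_CONV: defaults to True, which uses depthwise separable convolution; otherwise regular convolution is used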
......@@ -47,8 +48,7 @@ def encoder(input):
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = fluid.layers.reduce_mean(input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
......@@ -250,14 +250,15 @@ def deeplabv3p(img, num_classes):
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
with fluid.name_scope('last_conv'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
return logit
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from models.libs.model_libs import scope
from models.libs.model_libs import bn, bn_relu, relu, conv_bn_layer
from models.libs.model_libs import conv, avg_pool
from models.libs.model_libs import separate_conv
from utils.config import cfg
def learning_to_downsample(x, dw_channels1=32, dw_channels2=48, out_channels=64):
    x = relu(bn(conv(x, dw_channels1, 3, 2)))
    with scope('dsconv1'):
        x = separate_conv(x, dw_channels2, stride=2, filter=3, act=fluid.layers.relu)
    with scope('dsconv2'):
        x = separate_conv(x, out_channels, stride=2, filter=3, act=fluid.layers.relu)
    return x

def shortcut(input, data_residual):
    return fluid.layers.elementwise_add(input, data_residual)

def dropout2d(input, prob, is_train=False):
    if not is_train:
        return input
    channels = input.shape[1]
    keep_prob = 1.0 - prob
    random_tensor = keep_prob + fluid.layers.uniform_random_batch_size_like(input, [-1, channels, 1, 1], min=0., max=1.)
    binary_tensor = fluid.layers.floor(random_tensor)
    output = input / keep_prob * binary_tensor
    return output
def inverted_residual_unit(input,
                           num_in_filter,
                           num_filters,
                           ifshortcut,
                           stride,
                           filter_size,
                           padding,
                           expansion_factor,
                           name=None):
    num_expfilter = int(round(num_in_filter * expansion_factor))
    channel_expand = conv_bn_layer(
        input=input,
        num_filters=num_expfilter,
        filter_size=1,
        stride=1,
        padding=0,
        num_groups=1,
        if_act=True,
        name=name + '_expand')
    bottleneck_conv = conv_bn_layer(
        input=channel_expand,
        num_filters=num_expfilter,
        filter_size=filter_size,
        stride=stride,
        padding=padding,
        num_groups=num_expfilter,
        if_act=True,
        name=name + '_dwise',
        use_cudnn=False)
    depthwise_output = bottleneck_conv
    linear_out = conv_bn_layer(
        input=bottleneck_conv,
        num_filters=num_filters,
        filter_size=1,
        stride=1,
        padding=0,
        num_groups=1,
        if_act=False,
        name=name + '_linear')
    if ifshortcut:
        out = shortcut(input=input, data_residual=linear_out)
        return out, depthwise_output
    else:
        return linear_out, depthwise_output

def inverted_blocks(input, in_c, t, c, n, s, name=None):
    first_block, depthwise_output = inverted_residual_unit(
        input=input,
        num_in_filter=in_c,
        num_filters=c,
        ifshortcut=False,
        stride=s,
        filter_size=3,
        padding=1,
        expansion_factor=t,
        name=name + '_1')
    last_residual_block = first_block
    last_c = c
    for i in range(1, n):
        last_residual_block, depthwise_output = inverted_residual_unit(
            input=last_residual_block,
            num_in_filter=last_c,
            num_filters=c,
            ifshortcut=True,
            stride=1,
            filter_size=3,
            padding=1,
            expansion_factor=t,
            name=name + '_' + str(i + 1))
    return last_residual_block, depthwise_output
def psp_module(input, out_features):
    cat_layers = []
    sizes = (1, 2, 3, 6)
    for size in sizes:
        psp_name = "psp" + str(size)
        with scope(psp_name):
            pool = fluid.layers.adaptive_pool2d(input,
                                                pool_size=[size, size],
                                                pool_type='avg',
                                                name=psp_name + '_adapool')
            data = conv(pool, out_features,
                        filter_size=1,
                        bias_attr=False,
                        name=psp_name + '_conv')
            data_bn = bn(data, act='relu')
            interp = fluid.layers.resize_bilinear(data_bn,
                                                  out_shape=input.shape[2:],
                                                  name=psp_name + '_interp', align_mode=0)
        cat_layers.append(interp)
    cat_layers = [input] + cat_layers
    out = fluid.layers.concat(cat_layers, axis=1, name='psp_cat')
    return out
class FeatureFusionModule:
    """Feature fusion module"""
    def __init__(self, higher_in_channels, lower_in_channels, out_channels, scale_factor=4):
        self.higher_in_channels = higher_in_channels
        self.lower_in_channels = lower_in_channels
        self.out_channels = out_channels
        self.scale_factor = scale_factor

    def net(self, higher_res_feature, lower_res_feature):
        h, w = higher_res_feature.shape[2:]
        lower_res_feature = fluid.layers.resize_bilinear(lower_res_feature, [h, w], align_mode=0)
        with scope('dwconv'):
            lower_res_feature = relu(bn(conv(lower_res_feature, self.out_channels, 1)))#(lower_res_feature)
        with scope('conv_lower_res'):
            lower_res_feature = bn(conv(lower_res_feature, self.out_channels, 1, bias_attr=True))
        with scope('conv_higher_res'):
            higher_res_feature = bn(conv(higher_res_feature, self.out_channels, 1, bias_attr=True))
        out = higher_res_feature + lower_res_feature
        return relu(out)

class GlobalFeatureExtractor():
    """Global feature extractor module"""
    def __init__(self, in_channels=64, block_channels=(64, 96, 128), out_channels=128,
                 t=6, num_blocks=(3, 3, 3)):
        self.in_channels = in_channels
        self.block_channels = block_channels
        self.out_channels = out_channels
        self.t = t
        self.num_blocks = num_blocks

    def net(self, x):
        x, _ = inverted_blocks(x, self.in_channels, self.t, self.block_channels[0],
                               self.num_blocks[0], 2, 'inverted_block_1')
        x, _ = inverted_blocks(x, self.block_channels[0], self.t, self.block_channels[1],
                               self.num_blocks[1], 2, 'inverted_block_2')
        x, _ = inverted_blocks(x, self.block_channels[1], self.t, self.block_channels[2],
                               self.num_blocks[2], 1, 'inverted_block_3')
        x = psp_module(x, self.block_channels[2] // 4)
        with scope('out'):
            x = relu(bn(conv(x, self.out_channels, 1)))
        return x

class Classifier:
    """Classifier"""
    def __init__(self, dw_channels, num_classes, stride=1):
        self.dw_channels = dw_channels
        self.num_classes = num_classes
        self.stride = stride

    def net(self, x):
        with scope('dsconv1'):
            x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
        with scope('dsconv2'):
            x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
        x = dropout2d(x, 0.1, is_train=cfg.PHASE=='train')
        x = conv(x, self.num_classes, 1, bias_attr=True)
        return x
def aux_layer(x, num_classes):
    x = relu(bn(conv(x, 32, 3, padding=1)))
    x = dropout2d(x, 0.1, is_train=(cfg.PHASE == 'train'))
    with scope('logit'):
        x = conv(x, num_classes, 1, bias_attr=True)
    return x

def fast_scnn(img, num_classes):
    size = img.shape[2:]
    classifier = Classifier(128, num_classes)
    global_feature_extractor = GlobalFeatureExtractor(64, [64, 96, 128], 128, 6, [3, 3, 3])
    feature_fusion = FeatureFusionModule(64, 128, 128)
    with scope('learning_to_downsample'):
        higher_res_features = learning_to_downsample(img, 32, 48, 64)
    with scope('global_feature_extractor'):
        lower_res_feature = global_feature_extractor.net(higher_res_features)
    with scope('feature_fusion'):
        x = feature_fusion.net(higher_res_features, lower_res_feature)
    with scope('classifier'):
        logit = classifier.net(x)
        logit = fluid.layers.resize_bilinear(logit, size, align_mode=0)
    if len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 3:
        with scope('aux_layer_higher'):
            higher_logit = aux_layer(higher_res_features, num_classes)
            higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
        with scope('aux_layer_lower'):
            lower_logit = aux_layer(lower_res_feature, num_classes)
            lower_logit = fluid.layers.resize_bilinear(lower_logit, size, align_mode=0)
        return logit, higher_logit, lower_logit
    elif len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 2:
        with scope('aux_layer_higher'):
            higher_logit = aux_layer(higher_res_features, num_classes)
            higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
        return logit, higher_logit
    return logit
\ No newline at end of file
......@@ -98,8 +98,8 @@ class SegDataset(object):
# Re-shuffle file list
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
num_lines = len(self.all_lines) // self.num_trainers
self.lines = self.all_lines[num_lines * self.trainer_id: num_lines * (self.trainer_id + 1)]
num_lines = len(self.all_lines) // cfg.NUM_TRAINERS
self.lines = self.all_lines[num_lines * cfg.TRAINER_ID: num_lines * (cfg.TRAINER_ID + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
......
......@@ -236,3 +236,19 @@ cfg.FREEZE.MODEL_FILENAME = '__model__'
cfg.FREEZE.PARAMS_FILENAME = '__params__'
# Path where the inference model parameters are saved
cfg.FREEZE.SAVE_DIR = 'freeze_model'
########################## paddle-slim ######################################
cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER = False
cfg.SLIM.KNOWLEDGE_DISTILL = False
cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR = ""
cfg.SLIM.NAS_PORT = 23333
cfg.SLIM.NAS_ADDRESS = ""
cfg.SLIM.NAS_SEARCH_STEPS = 100
cfg.SLIM.NAS_START_EVAL_EPOCH = 0
cfg.SLIM.NAS_IS_SERVER = True
cfg.SLIM.NAS_SPACE_NAME = ""
cfg.SLIM.PRUNE_PARAMS = ''
cfg.SLIM.PRUNE_RATIOS = []
......@@ -81,6 +81,8 @@ model_urls = {
"https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz",
"hrnet_w18_bn_cityscapes":
"https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz",
"fast_scnn_cityscapes":
"https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar",
}
if __name__ == "__main__":
......
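> Before running this example, install PaddleSlim and Paddle 1.6 or later.
# PaddleSeg Distillation Tutorial
Before reading this tutorial, make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related sections so that you have a basic understanding of PaddleSeg.
This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to distill the models in the segmentation library.
Unless otherwise noted, all commands in this tutorial are run from the `PaddleSeg/` directory.
## Overview
This example uses the [distillation strategy](https://paddlepaddle.github.io/PaddleSlim/algo/algo/#3) provided by PaddleSlim to run distillation training on models in the segmentation library.
Before reading this example, we recommend familiarizing yourself with:
- [PaddleSlim distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
## Installing PaddleSlim
Install PaddleSlim by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
## Distillation strategy
For details on using the distillation API, see the PaddleSlim distillation API documentation.
This section uses Deeplabv3-xception as the teacher for distillation training of a Deeplabv3-mobilenet model. First, to get an overall picture of the `student model` and the `teacher model` and to decide which tensors to distill, we list the names and shapes of the variables (Variables) in both networks with the following code: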
```python
# Inspect the Variables of the student model
student_vars = []
for v in fluid.default_main_program().list_vars():
    try:
        student_vars.append((v.name, v.shape))
    except Exception:
        # Skip variables whose shape cannot be read
        pass
print("="*50+"student_model_vars"+"="*50)
print(student_vars)
# Inspect the Variables of the teacher model
teacher_vars = []
for v in teacher_program.list_vars():
    try:
        teacher_vars.append((v.name, v.shape))
    except Exception:
        # Skip variables whose shape cannot be read
        pass
print("="*50+"teacher_model_vars"+"="*50)
print(teacher_vars)
```
Comparing the two lists shows that the feature maps fed into the `loss` in the `student model` and the `teacher model` are, respectively:
```bash
# student model
bilinear_interp_0.tmp_0
# teacher model
bilinear_interp_2.tmp_0
```
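The corresponding feature maps have the same shape, and each sits at the output of its network, so we use `l2_loss` to add a distillation loss between each matching pair. Note that the teacher's Variables automatically receive a `name_prefix` during the merge step, so the prefix `"teacher_"` must be added here as well. For the merge step, see the [distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/#merge).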
```python
distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0')
```
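As a rough illustration of what an L2 distillation loss measures, the sketch below computes the mean squared difference between two equally shaped feature maps using plain Python lists. It assumes the loss reduces to the mean of squared element-wise differences; the helpers `flatten` and `l2_distill_loss`, the constant `TEACHER_PREFIX`, and the toy values are hypothetical and are not part of PaddleSlim.

```python
def flatten(nested):
    """Flatten an arbitrarily nested list of numbers into a flat list."""
    if isinstance(nested, (int, float)):
        return [float(nested)]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def l2_distill_loss(teacher_map, student_map):
    """Mean of squared element-wise differences between two feature maps."""
    t = flatten(teacher_map)
    s = flatten(student_map)
    if len(t) != len(s):
        raise ValueError("feature maps must have the same number of elements")
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)


# After merging, teacher Variables carry a name prefix (see the text above).
TEACHER_PREFIX = "teacher_"
teacher_name = TEACHER_PREFIX + "bilinear_interp_2.tmp_0"
student_name = "bilinear_interp_0.tmp_0"

# Toy 1x2x2 "feature maps" (hypothetical values, not real network outputs).
teacher = [[[1.0, 2.0], [3.0, 4.0]]]
student = [[[1.0, 2.5], [2.0, 4.0]]]
loss = l2_distill_loss(teacher, student)
print(teacher_name, student_name, loss)  # loss is 0.3125
```

The loss is zero only when the student reproduces the teacher's feature map exactly, which is why the two tensors must have identical shapes.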
Following the same steps, you can choose other losses for the distillation strategy. PaddleSlim supports `FSP_loss`, `L2_loss`, `softmax_with_cross_entropy_loss`, and any custom loss.
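For intuition about a softmax cross-entropy style distillation loss, the sketch below scores student logits against the teacher's softmax output used as soft targets. This is a generic formulation of the technique with an optional temperature; it is an assumption for illustration and not necessarily how PaddleSlim implements its loss, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of student predictions against teacher soft targets."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


# Hypothetical per-pixel logits for a 3-class problem.
teacher_logits = [2.0, 0.5, -1.0]
close_student = [1.9, 0.6, -1.1]
far_student = [-1.0, 0.5, 2.0]

loss_close = soft_label_cross_entropy(teacher_logits, close_student)
loss_far = soft_label_cross_entropy(teacher_logits, far_student)
print(loss_close < loss_far)  # True: the closer student is penalized less
```

Unlike the L2 loss, this loss compares probability distributions over classes, so it is only meaningful on tensors whose channel dimension holds class logits.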
## Training
The compression script `train_distill.py` is written based on [PaddleSeg/pdseg/train.py](../../pdseg/train.py).
The script defines a teacher_model and a student_model, and uses the output of the teacher_model to guide the training of the student_model.
### Running the example
Download the teacher's pretrained model ([deeplabv3p_xception65_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/xception65_bn_cityscapes.tgz)) and the student's pretrained model ([mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz)),
then update the pretrained model path in the student config file (./slim/distillation/cityscape.yaml):
```
TRAIN:
    PRETRAINED_MODEL_DIR: your_student_pretrained_model_dir
```
Update the pretrained model path in the teacher config file (./slim/distillation/cityscape_teacher.yaml):
```
SLIM:
    KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: your_teacher_pretrained_model_dir
```
Run the following command to start training. An evaluation runs every `cfg.TRAIN.SNAPSHOT_EPOCH` epochs.
```shell
export CUDA_VISIBLE_DEVICES=0,1
python -m paddle.distributed.launch ./slim/distillation/train_distill.py \
--log_steps 10 --cfg ./slim/distillation/cityscape.yaml \
--teacher_cfg ./slim/distillation/cityscape_teacher.yaml \
--use_gpu \
--use_mpio \
--do_eval
```
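Note: to change configuration parameters, edit the corresponding config file directly; overriding them from the command line is not supported yet.
## Evaluation and prediction
For evaluation and prediction after training, see the [Quick Start](../../README.md#快速入门) and [Basic Features](../../README.md#基础功能) sections of PaddleSeg.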
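To make the evaluation interval concrete, the sketch below lists the epochs at which a snapshot (and, with `--do_eval`, an evaluation) would occur. It assumes the trigger is `epoch % SNAPSHOT_EPOCH == 0` with epochs counted from 1; that condition is an assumption for illustration, so check `train_distill.py` for the actual logic. The helper `eval_epochs` is hypothetical, and the values 100 and 5 mirror `NUM_EPOCHS` and `SNAPSHOT_EPOCH` in the distillation configs.

```python
def eval_epochs(num_epochs, snapshot_epoch):
    """Return the 1-based epochs at which a snapshot/evaluation is triggered."""
    if snapshot_epoch <= 0:
        raise ValueError("snapshot_epoch must be positive")
    return [e for e in range(1, num_epochs + 1) if e % snapshot_epoch == 0]


schedule = eval_epochs(num_epochs=100, snapshot_epoch=5)
print(len(schedule), schedule[:3], schedule[-1])  # 20 [5, 10, 15] 100
```

Under this assumption, a smaller `SNAPSHOT_EPOCH` gives finer-grained validation curves at the cost of more time spent evaluating.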
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
    INF_RESIZE_VALUE: 500 # for rangescaling
    MAX_RESIZE_VALUE: 600 # for rangescaling
    MIN_RESIZE_VALUE: 400 # for rangescaling
    MAX_SCALE_FACTOR: 2.0 # for stepscaling
    MIN_SCALE_FACTOR: 0.5 # for stepscaling
    SCALE_STEP_SIZE: 0.25 # for stepscaling
    MIRROR: True
    FLIP: True
    FLIP_RATIO: 0.2
    RICH_CROP:
        ENABLE: False
        ASPECT_RATIO: 0.33
        BLUR: True
        BLUR_RATIO: 0.1
        MAX_ROTATION: 15
        MIN_AREA_RATIO: 0.5
        BRIGHTNESS_JITTER_RATIO: 0.5
        CONTRAST_JITTER_RATIO: 0.5
        SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 16
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
    DATA_DIR: "./dataset/cityscapes/"
    IMAGE_TYPE: "rgb" # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "dataset/cityscapes/val.list"
    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
    VAL_FILE_LIST: "dataset/cityscapes/val.list"
    IGNORE_INDEX: 255
FREEZE:
    MODEL_FILENAME: "model"
    PARAMS_FILENAME: "params"
MODEL:
    DEFAULT_NORM_TYPE: "bn"
    MODEL_NAME: "deeplabv3p"
    DEEPLAB:
        BACKBONE: "mobilenet"
        ASPP_WITH_SEP_CONV: True
        DECODER_USE_SEP_CONV: True
        ENCODER_WITH_ASPP: False
        ENABLE_DECODER: False
TEST:
    TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
    MODEL_SAVE_DIR: "snapshots/cityscape_mbv2_kd_e100_1/"
    PRETRAINED_MODEL_DIR: u"pretrained_model/mobilenet_cityscapes"
    SNAPSHOT_EPOCH: 5
    SYNC_BATCH_NORM: True
SOLVER:
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 100
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
    AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
    INF_RESIZE_VALUE: 500 # for rangescaling
    MAX_RESIZE_VALUE: 600 # for rangescaling
    MIN_RESIZE_VALUE: 400 # for rangescaling
    MAX_SCALE_FACTOR: 2.0 # for stepscaling
    MIN_SCALE_FACTOR: 0.5 # for stepscaling
    SCALE_STEP_SIZE: 0.25 # for stepscaling
    MIRROR: True
    FLIP: True
    FLIP_RATIO: 0.2
    RICH_CROP:
        ENABLE: False
        ASPECT_RATIO: 0.33
        BLUR: True
        BLUR_RATIO: 0.1
        MAX_ROTATION: 15
        MIN_AREA_RATIO: 0.5
        BRIGHTNESS_JITTER_RATIO: 0.5
        CONTRAST_JITTER_RATIO: 0.5
        SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 16
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
    DATA_DIR: "./dataset/cityscapes/"
    IMAGE_TYPE: "rgb" # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "dataset/cityscapes/val.list"
    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
    VAL_FILE_LIST: "dataset/cityscapes/val.list"
    IGNORE_INDEX: 255
FREEZE:
    MODEL_FILENAME: "model"
    PARAMS_FILENAME: "params"
MODEL:
    DEFAULT_NORM_TYPE: "bn"
    MODEL_NAME: "deeplabv3p"
    DEEPLAB:
        BACKBONE: "xception_65"
        ASPP_WITH_SEP_CONV: True
        DECODER_USE_SEP_CONV: True
        ENCODER_WITH_ASPP: True
        ENABLE_DECODER: True
TEST:
    TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
    MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
    PRETRAINED_MODEL_DIR: u"pretrain/deeplabv3plus_gn_init"
    SNAPSHOT_EPOCH: 5
    SYNC_BATCH_NORM: True
SOLVER:
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 100
SLIM:
    KNOWLEDGE_DISTILL_IS_TEACHER: True
    KNOWLEDGE_DISTILL: True
    KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: "pretrained_model/xception65_bn_cityscapes"
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import struct
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def seg_model(image, class_num):
model_name = cfg.MODEL.MODEL_NAME
if model_name == 'unet':
logits = unet.unet(image, class_num)
elif model_name == 'deeplabv3p':
logits = deeplab.deeplabv3p(image, class_num)
elif model_name == 'icnet':
logits = icnet.icnet(image, class_num)
elif model_name == 'pspnet':
logits = pspnet.pspnet(image, class_num)
elif model_name == 'hrnet':
logits = hrnet.hrnet(image, class_num)
elif model_name == 'fast_scnn':
logits = fast_scnn.fast_scnn(image, class_num)
else:
raise Exception(
"unknown model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
)
return logits
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def export_preprocess(image):
"""Preprocessing pipeline for the exported model"""
image = fluid.layers.transpose(image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
# Resize according to the configured AUG_METHOD
if cfg.AUG.AUG_METHOD == 'unpadding':
h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
image = fluid.layers.resize_bilinear(
image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
size = cfg.AUG.INF_RESIZE_VALUE
value = fluid.layers.reduce_max(origin_shape)
scale = float(size) / value.astype('float32')
image = fluid.layers.resize_bilinear(
image, scale=scale, align_corners=False, align_mode=0)
# Store the image shape after resizing
valid_shape = fluid.layers.shape(image)[-2:]
# Pad to the EVAL_CROP_SIZE size
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
pad_target = fluid.layers.assign(
np.array([height, width]).astype('float32'))
up = fluid.layers.assign(np.array([0]).astype('float32'))
down = pad_target[0] - valid_shape[0]
left = up
right = pad_target[1] - valid_shape[1]
paddings = fluid.layers.concat([up, down, left, right])
paddings = fluid.layers.cast(paddings, 'int32')
image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
# normalize
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image / 255 - mean) / std
# Lets the downstream network read the feature-map shape via calls such as image.shape
image = fluid.layers.reshape(
image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
return image, valid_shape, origin_shape
def build_model(main_prog=None, start_prog=None, phase=ModelPhase.TRAIN, **kwargs):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
#with fluid.program_guard(main_prog, start_prog):
# with fluid.unique_name.guard():
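# When exporting the model, add image normalization preprocessing to reduce the image processing needed at inference deployment
# At inference deployment, only a batch_size dimension needs to be added to the input image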
if cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
image = main_prog.global_block()._clone_variable(kwargs['image'],
force_persistable=False)
label = main_prog.global_block()._clone_variable(kwargs['label'],
force_persistable=False)
mask = main_prog.global_block()._clone_variable(kwargs['mask'],
force_persistable=False)
else:
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(
name='image',
shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image, valid_shape, origin_shape = export_preprocess(
origin_image)
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
# use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = None
if not cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
loss_type = cfg.SOLVER.LOSS
if not isinstance(loss_type, list):
loss_type = list(loss_type)
# dice_loss and bce_loss only apply to binary segmentation
if class_num > 2 and (("dice_loss" in loss_type) or
("bce_loss" in loss_type)):
raise Exception(
"dice loss and bce loss are only applicable to binary classification"
)
# In binary segmentation, when dice_loss or bce_loss is selected, the final logit output is set to 1 channel
if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
class_num = 1
if "softmax_loss" in loss_type:
raise Exception(
"softmax loss can not combine with dice loss or bce loss"
)
logits = seg_model(image, class_num)
# Compute the loss according to the selected loss function(s)
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
avg_loss_list.append(
multi_softmax_with_loss(logits, label, mask, class_num, weight))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception(
"SOLVER.LOSS: {} is set incorrectly. It should "
"include at least one of (softmax_loss, bce_loss, dice_loss),"
" for example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
.format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print(
"Warning: the loss {} you set is invalid. It will not be included in the computed loss."
.format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
# get the prediction result at the original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
# For binary segmentation, dice_loss or bce_loss yields a single-channel logit; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
# Keep only the valid region
logit = fluid.layers.slice(
logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
logit = fluid.layers.resize_bilinear(
logit,
out_shape=origin_shape,
align_corners=False,
align_mode=0)
logit = fluid.layers.argmax(logit, axis=1)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
decayed_lr = None
if not cfg.SLIM.KNOWLEDGE_DISTILL:
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
# optimizer = solver.Solver(main_prog, start_prog)
# decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask, image
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import argparse
import pprint
import random
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from model_builder import build_model
from model_builder import ModelPhase
from model_builder import parse_shape_from_file
from eval import evaluate
from vis import visualize
from utils import dist_utils
import solver
from paddleslim.dist.single_distiller import merge, l2_loss
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--teacher_cfg',
dest='teacher_cfg_file',
help='Config file for the teacher model',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--enable_ce',
dest='enable_ce',
help='If set True, enable continuous evaluation job. '
'This flag is only used for internal test.',
action='store_true')
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
Temporary workaround for saving variables on Windows.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_vars(
exe,
ckpt_dir,
program,
vars=list(filter(fluid.io.is_persistable, program.list_vars())))
return ckpt_dir
def load_checkpoint(exe, program):
"""
Load a checkpoint from the resume model directory to resume training
"""
print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
cfg.TRAIN.RESUME_MODEL_DIR))
fluid.io.load_persistables(
exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
model_path = cfg.TRAIN.RESUME_MODEL_DIR
# Strip a trailing path separator, if any
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
# If the resume model path ends with digits, restore the epoch status
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def update_best_model(ckpt_dir):
best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
if os.path.exists(best_model_dir):
shutil.rmtree(best_model_dir)
shutil.copytree(ckpt_dir, best_model_dir)
def print_info(*msg):
if cfg.TRAINER_ID == 0:
print(*msg)
def train(cfg):
# startup_prog = fluid.Program()
# train_prog = fluid.Program()
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
# If the sync batch norm strategy is used, drop the last batch when the number
# of samples in batch_data is less than cfg.BATCH_SIZE, to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
# places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# place = places[0]
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# Get number of GPU
dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
print_info("#Device count: {}".format(dev_count))
# Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
# In multi-GPU training mode, batch data is allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
py_reader, loss, lr, pred, grts, masks, image = build_model(phase=ModelPhase.TRAIN)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
cfg.update_from_file(args.teacher_cfg_file)
# teacher_arch = teacher_cfg.architecture
teacher_program = fluid.Program()
teacher_startup_program = fluid.Program()
with fluid.program_guard(teacher_program, teacher_startup_program):
with fluid.unique_name.guard():
_, teacher_loss, _, _, _, _, _ = build_model(
teacher_program, teacher_startup_program, phase=ModelPhase.TRAIN, image=image,
label=grts, mask=masks)
exe.run(teacher_startup_program)
teacher_program = teacher_program.clone(for_test=True)
ckpt_dir = cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR
assert ckpt_dir is not None
print('load teacher model:', ckpt_dir)
fluid.io.load_params(exe, ckpt_dir, main_program=teacher_program)
# cfg = load_config(FLAGS.config)
cfg.update_from_file(args.cfg_file)
data_name_map = {
'image': 'image',
'label': 'label',
'mask': 'mask',
}
merge(teacher_program, fluid.default_main_program(), data_name_map, place)
distill_pairs = [['teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0']]
def distill(pairs, weight):
"""
Add an L2 distillation loss on the first teacher/student feature-map pair,
scaled by `weight`.
"""
loss = l2_loss(pairs[0][0], pairs[0][1])
weighted_loss = loss * weight
return weighted_loss
distill_loss = distill(distill_pairs, 0.1)
cfg.update_from_file(args.cfg_file)
optimizer = solver.Solver(None, None)
all_loss = loss + distill_loss
lr = optimizer.optimise(all_loss)
exe.run(fluid.default_startup_program())
exec_strategy = fluid.ExecutionStrategy()
# Clear temporary variables every 100 iterations
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_all_reduce_ops = False
build_strategy.fuse_all_optimizer_ops = False
build_strategy.fuse_elewise_add_act_ops = True
if cfg.NUM_TRAINERS > 1 and args.use_gpu:
dist_utils.prepare_for_multi_process(exe, build_strategy, fluid.default_main_program())
exec_strategy.num_threads = 1
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print_info("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print_info(
"Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
compiled_train_prog = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(
loss_name=all_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME_MODEL_DIR:
begin_epoch = load_checkpoint(exe, fluid.default_main_program())
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
load_vars = []
load_fail_vars = []
def var_shape_matched(var, shape):
"""
Check whether the persistable variable's shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
return var_shape == shape
return False
for x in fluid.default_main_program().list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
else:
load_fail_vars.append(x)
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
for var in load_vars:
print_info("Parameter[{}] loaded successfully!".format(var.name))
for var in load_fail_vars:
print_info(
"Parameter[{}] doesn't exist or its shape does not match the current network, skip"
" loading it.".format(var.name))
print_info("{}/{} pretrained parameters loaded successfully!".format(
len(load_vars),
len(load_vars) + len(load_fail_vars)))
else:
print_info(
'Pretrained model dir {} does not exist, training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
#fetch_list = [avg_loss.name, lr.name]
fetch_list = [loss.name, 'teacher_' + teacher_loss.name, distill_loss.name, lr.name]
if args.debug:
# Fetch more variable info and use streaming confusion matrix to
# calculate IoU results if in debug mode
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
fetch_list.extend([pred.name, grts.name, masks.name])
cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
if args.use_tb:
if not args.tb_log_dir:
print_info("Please specify the log directory by --tb_log_dir.")
exit(1)
from tb_paddle import SummaryWriter
log_writer = SummaryWriter(args.tb_log_dir)
# trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
# num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
avg_t_loss = 0.0
avg_d_loss = 0.0
best_mIoU = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
if args.use_mpio:
print_info("Use multiprocess reader")
else:
print_info("Use multi-thread reader")
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
if args.debug:
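# Print category IoU and accuracy to check whether the
# training process matches expectations.
# fetch_list holds 7 entries in debug mode (student, teacher and distill loss, lr, pred, grts, masks)
loss, t_loss, d_loss, lr, pred, grts, masks = exe.run(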
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
cm.calculate(pred, grts, masks)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
speed = args.log_steps / timer.elapsed_time()
avg_loss /= args.log_steps
category_acc, mean_acc = cm.accuracy()
category_iou, mean_iou = cm.mean_iou()
print_info((
"epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, mean_acc,
mean_iou, speed,
calculate_eta(all_step - global_step, speed)))
print_info("Category IoU: ", category_iou)
print_info("Category Acc: ", category_acc)
if args.use_tb:
log_writer.add_scalar('Train/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Train/mean_acc', mean_acc,
global_step)
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/step/sec', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
cm.zero_matrix()
timer.restart()
else:
# If not in debug mode, avoid unnecessary logging and computation
loss, t_loss, d_loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
avg_t_loss += np.mean(np.array(t_loss))
avg_d_loss += np.mean(np.array(d_loss))
global_step += 1
if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
avg_loss /= args.log_steps
avg_t_loss /= args.log_steps
avg_d_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} teacher loss={:.4f} distill loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, avg_t_loss, avg_d_loss, speed,
calculate_eta(all_step - global_step, speed)))
if args.use_tb:
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/speed', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
avg_t_loss = 0.0
avg_d_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
if args.do_eval:
print("Evaluation start")
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if args.use_tb:
log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
global_step)
if mean_iou > best_mIoU:
best_mIoU = mean_iou
update_best_model(ckpt_dir)
print_info("Save best model {} to {}, mIoU = {:.4f}".format(
ckpt_dir,
os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
mean_iou))
# Use Tensorboard to visualize results
if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
visualize(
cfg=cfg,
use_gpu=args.use_gpu,
vis_file_list=cfg.DATASET.VIS_FILE_LIST,
vis_dir="visual",
ckpt_dir=ckpt_dir,
log_writer=log_writer)
if cfg.TRAINER_ID == 0:
ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
# save final model
if cfg.TRAINER_ID == 0:
save_checkpoint(exe, fluid.default_main_program(), 'final')
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
if args.enable_ce:
random.seed(0)
np.random.seed(0)
cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
cfg.check_and_infer()
print_info(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
if fluid.core.is_compiled_with_cuda() != True and args.use_gpu == True:
print(
"You can not set use_gpu = True in the model because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
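> Before running this example, install Paddle 1.6 or later.

# PaddleSeg Neural Architecture Search (NAS) Example

Before reading this tutorial, make sure you have gone through the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a basic understanding of PaddleSeg.

This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to run architecture search on the models in the segmentation library.

Unless otherwise noted, all operations in this tutorial are run from the `PaddleSeg/` directory.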

## Overview

We use the Deeplab+mobilenetv2 model as the neural architecture search example. The example relies on [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) to carry out the search experiment. For technical details, see [Neural architecture search strategy](https://github.com/PaddlePaddle/PaddleSlim/blob/4670a79343c191b61a78e416826d122eea52a7ab/docs/zh_cn/tutorials/image_classification_nas_quick_start.ipynb).
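
## Defining the search space

The experiment uses the SANAS approach and searches over the channel counts and convolution kernel sizes of the network. We therefore define the following search space:

- head channels `head_num`: the range of channel counts in the MobilenetV2 head module;
- inverse_res_block1-6 `filter_num1-6`: the range of channel counts in each inverse_res_block module;
- inverse_res_block `repeat`: the number of units in each MobilenetV2 inverse_res_block module;
- inverse_res_block `multiply`: the range of the expansion_factor in each MobilenetV2 inverse_res_block module;
- kernel size `k_size`: whether the convolution kernels in MobilenetV2 are 3x3 or 5x5.

Given these ranges, the search-space tokens have 25 positions. Each position varies within ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2]). Each upper bound equals the number of candidate values at that position, so it is exclusive.

The initial tokens are: [4, 4, 5, 1, 0, 4, 4, 1, 0, 4, 4, 3, 0, 4, 5, 2, 0, 4, 7, 2, 0, 4, 9, 0, 0].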
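
To make the token encoding concrete, here is a minimal sketch that decodes a token list into hyperparameter values. It assumes the layout implied by the range table: one `head_num` index followed by six groups of (`multiply`, `filter_num`, `repeat`, `k_size`) indices. The candidate arrays are copied from `MobileNetV2SpaceSeg`; `range_table` and `decode_tokens` are hypothetical helpers written for illustration, not PaddleSlim APIs.

```python
# Candidate values, copied from the MobileNetV2SpaceSeg search space.
HEAD_NUM = [3, 4, 8, 12, 16, 24, 32]
FILTER_NUMS = [
    [3, 4, 8, 12, 16, 24, 32, 48],
    [8, 12, 16, 24, 32, 48, 64, 80],
    [16, 24, 32, 48, 64, 80, 96, 128],
    [24, 32, 48, 64, 80, 96, 128, 144, 160, 192],
    [32, 48, 64, 80, 96, 128, 144, 160, 192, 224],
    [64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512],
]
MULTIPLY = [1, 2, 3, 4, 6]
REPEAT = [1, 2, 3, 4, 5, 6]
K_SIZE = [3, 5]


def range_table():
    """Number of candidates per token position (exclusive upper bounds)."""
    table = [len(HEAD_NUM)]
    for filters in FILTER_NUMS:
        table += [len(MULTIPLY), len(filters), len(REPEAT), len(K_SIZE)]
    return table


def decode_tokens(tokens):
    """Hypothetical helper: map token indices to concrete values.

    Assumed layout: [head, (multiply, filter_num, repeat, k_size) x 6].
    """
    bounds = range_table()
    if len(tokens) != len(bounds):
        raise ValueError(
            "expected {} tokens, got {}".format(len(bounds), len(tokens)))
    for pos, (tok, bound) in enumerate(zip(tokens, bounds)):
        if not 0 <= tok < bound:
            raise ValueError(
                "token {} at position {} is outside [0, {})".format(
                    tok, pos, bound))
    blocks = []
    for i, filters in enumerate(FILTER_NUMS):
        m, f, r, k = tokens[1 + 4 * i:5 + 4 * i]
        blocks.append({
            "multiply": MULTIPLY[m],
            "filter_num": filters[f],
            "repeat": REPEAT[r],
            "k_size": K_SIZE[k],
        })
    return {"head_num": HEAD_NUM[tokens[0]], "blocks": blocks}


INIT_TOKENS = [4, 4, 5, 1, 0, 4, 4, 1, 0, 4, 4, 3, 0,
               4, 5, 2, 0, 4, 7, 2, 0, 4, 9, 0, 0]
arch = decode_tokens(INIT_TOKENS)
print("head_num:", arch["head_num"])
for block in arch["blocks"]:
    print(block)
```

Under this assumed layout, the initial tokens select a 16-channel head and block widths of 24, 32, 64, 96, 160 and 320 with expansion factor 6 and 3x3 kernels.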
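
## Starting the search

First install PaddleSlim; see the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2).

Then edit the PaddleSeg config. Only the NAS-related settings are shown below:

```yaml
SLIM:
  NAS_PORT: 23333 # port
  NAS_ADDRESS: "" # IP address; leave empty on the server, set to the server's IP on a client
  NAS_SEARCH_STEPS: 100 # number of architectures to search
  NAS_START_EVAL_EPOCH: -1 # epoch from which the model starts being evaluated
  NAS_IS_SERVER: True # whether this process is the server
  NAS_SPACE_NAME: "MobileNetV2SpaceSeg" # search space
```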

## Training and evaluation

Run the following command to train and evaluate at the same time:

```shell
CUDA_VISIBLE_DEVICES=0 python -u ./slim/nas/train_nas.py --log_steps 10 --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio \
SLIM.NAS_PORT 23333 \
SLIM.NAS_ADDRESS "" \
SLIM.NAS_SEARCH_STEPS 2 \
SLIM.NAS_START_EVAL_EPOCH -1 \
SLIM.NAS_IS_SERVER True \
SLIM.NAS_SPACE_NAME "MobileNetV2SpaceSeg"
```
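
The trailing `KEY VALUE` pairs in the command are collected by a positional `opts` argument declared with `nargs=argparse.REMAINDER`, as in the other scripts in this repository, and then passed to `cfg.update_from_list`. The sketch below only illustrates how such a list pairs up; `build_parser` and `pair_overrides` are hypothetical names, and the real type conversion is left to the config class.

```python
import argparse


def build_parser():
    """A reduced parser with the same option shapes as the repo scripts."""
    parser = argparse.ArgumentParser(description="opts pairing sketch")
    parser.add_argument("--cfg", dest="cfg_file", default=None, type=str)
    parser.add_argument(
        "--use_gpu", dest="use_gpu", action="store_true", default=False)
    # Everything after the named options lands in `opts` as raw strings.
    parser.add_argument("opts", default=None, nargs=argparse.REMAINDER)
    return parser


def pair_overrides(opts):
    """Hypothetical helper: turn [K1, V1, K2, V2, ...] into a dict."""
    if len(opts) % 2 != 0:
        raise ValueError(
            "overrides must come in KEY VALUE pairs, got {}".format(opts))
    return dict(zip(opts[0::2], opts[1::2]))


argv = [
    "--cfg", "configs/deeplabv3p_mobilenetv2_cityscapes.yaml", "--use_gpu",
    "SLIM.NAS_PORT", "23333",
    "SLIM.NAS_SEARCH_STEPS", "2",
    "SLIM.NAS_IS_SERVER", "True",
]
args = build_parser().parse_args(argv)
overrides = pair_overrides(args.opts)
print(overrides)
```

Values stay strings at this stage, which is why a stray trailing token or a missing value shows up as an odd-length list.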

## FAQ

- Runtime error: `socket.error: [Errno 98] Address already in use`

  Fix: the current port is occupied; change the `SLIM.NAS_PORT` port.
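
Before changing `SLIM.NAS_PORT`, it can help to check which ports are actually free. The sketch below uses only the standard `socket` module; `port_is_free` and `find_free_port` are hypothetical helpers, and the default host and starting port are assumptions. A port reported as free can still be taken by another process before the search server starts.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
        return True


def find_free_port(start=23333, attempts=50, host="127.0.0.1"):
    """Scan upward from `start` and return the first port that binds."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(
        "no free port found in [{}, {})".format(start, start + attempts))


if __name__ == "__main__":
    print("SLIM.NAS_PORT candidate:", find_free_port())
```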
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
def encoder(input):
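# Encoder configuration: ASPP architecture, with pooling + 1x1 conv + three atrous convolutions at different rates in parallel, then concat followed by a 1x1 conv
# ASPP_WITH_SEP_CONV: defaults to true and uses depthwise separable convolution; otherwise regular convolution is used
# OUTPUT_STRIDE: downsampling factor, 8 or 16; determines aspp_ratios
# aspp_ratios: sampling rates of the atrous convolutions in the ASPP module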
if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only supports stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def decoder(encode_data, decode_shortcut):
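# Decoder configuration
# encode_data: output of the encoder
# decode_shortcut: branch taken from the backbone; encode_data is resized to its size and the two are concatenated
# DECODER_USE_SEP_CONV: defaults to true, in which case two separable convolutions follow the concat; otherwise regular convolutions are used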
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut.shape[2:])
encode_data = fluid.layers.concat([encode_data, decode_shortcut],
axis=1)
if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def nas_backbone(input, arch):
# scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
# output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
# model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 8
decode_point = 3
data, decode_shortcuts = arch(
input, end_points=end_points, return_block=decode_point, output_stride=16)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def deeplabv3p_nas(img, num_classes, arch=None):
data, decode_shortcut = nas_backbone(img, arch)
# Encoder and decoder settings
cfg.MODEL.DEFAULT_EPSILON = 1e-5
if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
data = encoder(data)
if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
data = decoder(data, decode_shortcut)
# Set the output of the last conv layer according to the number of classes, then resize to the original image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
return logit
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from model_builder import build_model
from model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
from mobilenetv2_search_space import MobileNetV2SpaceSeg
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL, arch=kwargs['arch'])
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if not os.path.exists(ckpt_dir):
raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
from paddleslim.nas.search_space.base_layer import conv_bn_layer
from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE
from paddleslim.nas.search_space.utils import check_points
__all__ = ["MobileNetV2SpaceSeg"]
@SEARCHSPACE.register
class MobileNetV2SpaceSeg(SearchSpaceBase):
def __init__(self, input_size, output_size, block_num, block_mask=None):
super(MobileNetV2SpaceSeg, self).__init__(input_size, output_size,
block_num, block_mask)
# self.head_num means the first convolution channel
self.head_num = np.array([3, 4, 8, 12, 16, 24, 32]) #7
# self.filter_num1 ~ self.filter_num6 are the candidate channel counts of the following convolutions
self.filter_num1 = np.array([3, 4, 8, 12, 16, 24, 32, 48]) #8
self.filter_num2 = np.array([8, 12, 16, 24, 32, 48, 64, 80]) #8
self.filter_num3 = np.array([16, 24, 32, 48, 64, 80, 96, 128]) #8
self.filter_num4 = np.array(
[24, 32, 48, 64, 80, 96, 128, 144, 160, 192]) #10
self.filter_num5 = np.array(
[32, 48, 64, 80, 96, 128, 144, 160, 192, 224]) #10
self.filter_num6 = np.array(
[64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512]) #12
# self.k_size means kernel size
self.k_size = np.array([3, 5]) #2
# self.multiply means expansion_factor of each _inverted_residual_unit
self.multiply = np.array([1, 2, 3, 4, 6]) #5
# self.repeat is the number of _inverted_residual_unit repeats in each _invresi_blocks
self.repeat = np.array([1, 2, 3, 4, 5, 6]) #6
def init_tokens(self):
"""
The initial token.
The first token is the index of the first layer's channel count in self.head_num;
each following line holds the indices of [expansion_factor, filter_num, repeat_num, kernel_size].
"""
# original MobileNetV2
# yapf: disable
init_token_base = [4, # 1, 16, 1
4, 5, 1, 0, # 6, 24, 2
4, 4, 2, 0, # 6, 32, 3
4, 4, 3, 0, # 6, 64, 4
4, 5, 2, 0, # 6, 96, 3
4, 7, 2, 0, # 6, 160, 3
4, 9, 0, 0] # 6, 320, 1
# yapf: enable
return init_token_base
def range_table(self):
"""
Get range table of current search space, constrains the range of tokens.
"""
# head_num + 6 * [multiple(expansion_factor), filter_num, repeat, kernel_size]
# yapf: disable
range_table_base = [len(self.head_num),
len(self.multiply), len(self.filter_num1), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num2), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num3), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num4), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num5), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num6), len(self.repeat), len(self.k_size)]
# yapf: enable
return range_table_base
def token2arch(self, tokens=None):
"""
return net_arch function
"""
if tokens is None:
tokens = self.init_tokens()
self.bottleneck_params_list = []
self.bottleneck_params_list.append(
(1, self.head_num[tokens[0]], 1, 1, 3))
self.bottleneck_params_list.append(
(self.multiply[tokens[1]], self.filter_num1[tokens[2]],
self.repeat[tokens[3]], 2, self.k_size[tokens[4]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[5]], self.filter_num2[tokens[6]],
self.repeat[tokens[7]], 2, self.k_size[tokens[8]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[9]], self.filter_num3[tokens[10]],
self.repeat[tokens[11]], 2, self.k_size[tokens[12]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[13]], self.filter_num4[tokens[14]],
self.repeat[tokens[15]], 1, self.k_size[tokens[16]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[17]], self.filter_num5[tokens[18]],
self.repeat[tokens[19]], 2, self.k_size[tokens[20]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[21]], self.filter_num6[tokens[22]],
self.repeat[tokens[23]], 1, self.k_size[tokens[24]]))
def _modify_bottle_params(output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must be an even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s, ks = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s, ks)
def net_arch(input,
scale=1.0,
return_block=None,
end_points=None,
output_stride=None):
self.scale = scale
_modify_bottle_params(output_stride)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
# all conv2d layers use 'SAME' padding, so the actual padding is computed automatically.
input = conv_bn_layer(
input,
num_filters=int(32 * self.scale),
filter_size=3,
stride=2,
padding='SAME',
act='relu6',
name='mobilenetv2_conv1')
layer_count = 1
depthwise_output = None
# bottleneck sequences
in_c = int(32 * self.scale)
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s, k = layer_setting
layer_count += 1
### return_block and end_points means block num
if check_points((layer_count - 1), return_block):
decode_ends[layer_count - 1] = depthwise_output
if check_points((layer_count - 1), end_points):
return input, decode_ends
input, depthwise_output = self._invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * self.scale),
n=n,
s=s,
k=int(k),
name='mobilenetv2_conv' + str(i))
in_c = int(c * self.scale)
### return_block and end_points means block num
if check_points(layer_count, return_block):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
# last conv
input = conv_bn_layer(
input=input,
num_filters=int(1280 * self.scale)
if self.scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding='SAME',
act='relu6',
name='mobilenetv2_conv' + str(i + 1))
input = fluid.layers.pool2d(
input=input,
pool_type='avg',
global_pooling=True,
name='mobilenetv2_last_pool')
return input
return net_arch
def _shortcut(self, input, data_residual):
"""Build shortcut layer.
Args:
input(Variable): input.
data_residual(Variable): residual layer.
Returns:
Variable, layer output.
"""
return fluid.layers.elementwise_add(input, data_residual)
def _inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
expansion_factor,
reduction_ratio=4,
name=None):
"""Build inverted residual unit.
Args:
input(Variable), input.
num_in_filter(int), number of in filters.
num_filters(int), number of filters.
ifshortcut(bool), whether using shortcut.
stride(int), stride.
filter_size(int), filter size.
reduction_ratio(int), reduction ratio (not used inside this unit).
expansion_factor(float), expansion factor.
name(str), name.
Returns:
Variable, layers output.
"""
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding='SAME',
num_groups=1,
act='relu6',
name=name + '_expand')
bottleneck_conv = conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding='SAME',
num_groups=num_expfilter,
act='relu6',
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding='SAME',
num_groups=1,
act=None,
name=name + '_linear')
out = linear_out
if ifshortcut:
out = self._shortcut(input=input, data_residual=out)
return out, depthwise_output
def _invresi_blocks(self, input, in_c, t, c, n, s, k, name=None):
"""Build inverted residual blocks.
Args:
input: Variable, input.
in_c: int, number of in filters.
t: float, expansion factor.
c: int, number of filters.
n: int, number of layers.
s: int, stride.
k: int, filter size.
name: str, name.
Returns:
Variable, layers output.
"""
first_block, depthwise_output = self._inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=k,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self._inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=k,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
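The token encoding used by `init_tokens`, `range_table` and `token2arch` above is easier to follow with a worked decode. The sketch below copies the candidate tables and per-block strides from the class and reproduces the logic of `_modify_bottle_params` in plain Python, without PaddlePaddle. The helper names `decode_tokens` and `clamp_to_output_stride` are hypothetical and exist only for this illustration.

```python
# Candidate tables copied from MobileNetV2SpaceSeg.__init__ above.
HEAD_NUM = [3, 4, 8, 12, 16, 24, 32]
FILTER_NUMS = [
    [3, 4, 8, 12, 16, 24, 32, 48],
    [8, 12, 16, 24, 32, 48, 64, 80],
    [16, 24, 32, 48, 64, 80, 96, 128],
    [24, 32, 48, 64, 80, 96, 128, 144, 160, 192],
    [32, 48, 64, 80, 96, 128, 144, 160, 192, 224],
    [64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512],
]
K_SIZE = [3, 5]
MULTIPLY = [1, 2, 3, 4, 6]
REPEAT = [1, 2, 3, 4, 5, 6]
# Per-block strides as hard-coded in token2arch (head block first).
STRIDES = [1, 2, 2, 2, 1, 2, 1]


def decode_tokens(tokens):
    """Map a 25-element token list to (t, c, n, s, k) tuples (hypothetical helper)."""
    params = [(1, HEAD_NUM[tokens[0]], 1, STRIDES[0], 3)]
    for block in range(6):
        t_idx, c_idx, n_idx, k_idx = tokens[1 + 4 * block:5 + 4 * block]
        params.append((MULTIPLY[t_idx], FILTER_NUMS[block][c_idx],
                       REPEAT[n_idx], STRIDES[block + 1], K_SIZE[k_idx]))
    return params


def clamp_to_output_stride(params, output_stride):
    """Mirror _modify_bottle_params: use stride 1 once the running stride exceeds the target."""
    stride = 2  # the stem convolution already downsamples by 2
    clamped = []
    for t, c, n, s, k in params:
        stride *= s
        if stride > output_stride:
            s = 1
        clamped.append((t, c, n, s, k))
    return clamped


# The initial tokens from init_tokens(), i.e. the original MobileNetV2 layout.
init_tokens = [4,
               4, 5, 1, 0,
               4, 4, 2, 0,
               4, 4, 3, 0,
               4, 5, 2, 0,
               4, 7, 2, 0,
               4, 9, 0, 0]
params = decode_tokens(init_tokens)
for p in clamp_to_output_stride(params, output_stride=16):
    print(p)
```

Each token is only an index into a small table, so a search controller can propose architectures by emitting integers within the bounds given by `range_table`.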
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import struct
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
import deeplab
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def seg_model(image, class_num, arch):
model_name = cfg.MODEL.MODEL_NAME
if model_name == 'deeplabv3p':
logits = deeplab.deeplabv3p_nas(image, class_num, arch)
else:
raise Exception(
"unknown model name, only deeplabv3p is supported"
)
return logits
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def export_preprocess(image):
"""Preprocessing pipeline for the exported model."""
image = fluid.layers.transpose(image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
# Resize according to the configured AUG_METHOD
if cfg.AUG.AUG_METHOD == 'unpadding':
h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
image = fluid.layers.resize_bilinear(
image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
size = cfg.AUG.INF_RESIZE_VALUE
value = fluid.layers.reduce_max(origin_shape)
scale = float(size) / value.astype('float32')
image = fluid.layers.resize_bilinear(
image, scale=scale, align_corners=False, align_mode=0)
# Record the image shape after resizing
valid_shape = fluid.layers.shape(image)[-2:]
# Pad to EVAL_CROP_SIZE
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
pad_target = fluid.layers.assign(
np.array([height, width]).astype('float32'))
up = fluid.layers.assign(np.array([0]).astype('float32'))
down = pad_target[0] - valid_shape[0]
left = up
right = pad_target[1] - valid_shape[1]
paddings = fluid.layers.concat([up, down, left, right])
paddings = fluid.layers.cast(paddings, 'int32')
image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
# normalize
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image / 255 - mean) / std
# Reshape so that downstream layers can read the feature-map shape via image.shape
image = fluid.layers.reshape(
image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
return image, valid_shape, origin_shape
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN, arch=None):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
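# When exporting the model, add image normalization as preprocessing to reduce the image handling needed at deployment
# At deployment time, only a batch_size dimension has to be added to the input image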
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(
name='image',
shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image, valid_shape, origin_shape = export_preprocess(
origin_image)
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
# use PyReader for training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
loss_type = cfg.SOLVER.LOSS
if not isinstance(loss_type, list):
loss_type = [loss_type] if isinstance(loss_type, str) else list(loss_type)
# dice_loss and bce_loss only apply to binary segmentation
if class_num > 2 and (("dice_loss" in loss_type) or
("bce_loss" in loss_type)):
raise Exception(
"dice loss and bce loss are only applicable to binary classification"
)
# For binary segmentation with dice_loss or bce_loss, set the number of output logit channels to 1
if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
class_num = 1
if "softmax_loss" in loss_type:
raise Exception(
"softmax loss can not combine with dice loss or bce loss"
)
logits = seg_model(image, class_num, arch)
# Compute the loss according to the selected loss functions
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
avg_loss_list.append(
multi_softmax_with_loss(logits, label, mask, class_num, weight))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception(
"SOLVER.LOSS: {} is set incorrectly. It should "
"include at least one of (softmax_loss, bce_loss, dice_loss),"
" for example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
.format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print(
"Warning: the loss {} you set is invalid and will not be included in the computed loss."
.format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
#get pred result in original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
# In binary segmentation with dice_loss or bce_loss the logit has a single channel; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
# Keep only the valid (unpadded) region
logit = fluid.layers.slice(
logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
logit = fluid.layers.resize_bilinear(
logit,
out_shape=origin_shape,
align_corners=False,
align_mode=0)
logit = fluid.layers.argmax(logit, axis=1)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import argparse
import pprint
import random
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from model_builder import build_model
from model_builder import ModelPhase
from model_builder import parse_shape_from_file
from eval_nas import evaluate
from vis import visualize
from utils import dist_utils
from mobilenetv2_search_space import MobileNetV2SpaceSeg
from paddleslim.nas.search_space.search_space_factory import SearchSpaceFactory
from paddleslim.analysis import flops
from paddleslim.nas.sa_nas import SANAS
from paddleslim.nas import search_space
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detailed information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluate the model on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--enable_ce',
dest='enable_ce',
help='If set True, enable continuous evaluation job. '
'This flag is only used for internal test.',
action='store_true')
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
Temporary workaround for compatibility when saving variables on Windows.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save variables whose type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_vars(
exe,
ckpt_dir,
program,
vars=list(filter(fluid.io.is_persistable, program.list_vars())))
return ckpt_dir
def load_checkpoint(exe, program):
"""
Load a checkpoint from the resume model directory to resume training
"""
print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
cfg.TRAIN.RESUME_MODEL_DIR))
fluid.io.load_persistables(
exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
model_path = cfg.TRAIN.RESUME_MODEL_DIR
# Check whether the path ends with a path separator
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
# If the resume model path ends with a number, restore the epoch state
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def update_best_model(ckpt_dir):
best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
if os.path.exists(best_model_dir):
shutil.rmtree(best_model_dir)
shutil.copytree(ckpt_dir, best_model_dir)
def print_info(*msg):
if cfg.TRAINER_ID == 0:
print(*msg)
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
if args.enable_ce:
startup_prog.random_seed = 1000
train_prog.random_seed = 1000
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
# If the sync batch norm strategy is used, drop the last batch when the number of samples
# in batch_data is less than cfg.BATCH_SIZE, to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
# places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# place = places[0]
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# Get number of GPU
dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
print_info("#Device count: {}".format(dev_count))
# Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
# In multi-GPU training mode, batch data is allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
config_info = {'input_size': 769, 'output_size': 1, 'block_num': 7}
config = ([(cfg.SLIM.NAS_SPACE_NAME, config_info)])
factory = SearchSpaceFactory()
space = factory.get_search_space(config)
port = cfg.SLIM.NAS_PORT
server_address = (cfg.SLIM.NAS_ADDRESS, port)
sa_nas = SANAS(config, server_addr=server_address, search_steps=cfg.SLIM.NAS_SEARCH_STEPS,
is_server=cfg.SLIM.NAS_IS_SERVER)
for step in range(cfg.SLIM.NAS_SEARCH_STEPS):
arch = sa_nas.next_archs()[0]
start_prog = fluid.Program()
train_prog = fluid.Program()
py_reader, avg_loss, lr, pred, grts, masks = build_model(
train_prog, start_prog, arch=arch, phase=ModelPhase.TRAIN)
cur_flops = flops(train_prog)
print('current step:', step, 'flops:', cur_flops)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
exe.run(start_prog)
exec_strategy = fluid.ExecutionStrategy()
# Clear temporary variables every 100 iterations
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if cfg.NUM_TRAINERS > 1 and args.use_gpu:
dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
exec_strategy.num_threads = 1
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print_info("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print_info(
"Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=avg_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME_MODEL_DIR:
begin_epoch = load_checkpoint(exe, train_prog)
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
load_vars = []
load_fail_vars = []
def var_shape_matched(var, shape):
"""
Check whether the persistable variable's shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
return var_shape == shape
return False
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
else:
load_fail_vars.append(x)
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
for var in load_vars:
print_info("Parameter[{}] loaded successfully!".format(var.name))
for var in load_fail_vars:
print_info(
"Parameter[{}] doesn't exist or its shape does not match the current network,"
" skipping it.".format(var.name))
print_info("{}/{} pretrained parameters loaded successfully!".format(
len(load_vars),
len(load_vars) + len(load_fail_vars)))
else:
print_info(
'Pretrained model dir {} does not exist, training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
fetch_list = [avg_loss.name, lr.name]
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
if args.use_mpio:
print_info("Use multiprocess reader")
else:
print_info("Use multi-thread reader")
best_miou = 0.0
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
avg_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, speed,
calculate_eta(all_step - global_step, speed)))
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if epoch > cfg.SLIM.NAS_START_EVAL_EPOCH:
ckpt_dir = save_checkpoint(exe, train_prog, '{}_tmp'.format(port))
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
arch=arch,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if best_miou < mean_iou:
print('search step {}, epoch {} best iou {}'.format(step, epoch, mean_iou))
best_miou = mean_iou
sa_nas.reward(float(best_miou))
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
if args.enable_ce:
random.seed(0)
np.random.seed(0)
cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
cfg.check_and_infer()
print_info(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
if fluid.core.is_compiled_with_cuda() != True and args.use_gpu == True:
print(
"You can not set use_gpu = True in the model because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
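# PaddleSeg Pruning Tutorial
Before reading this tutorial, make sure you have gone through the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a basic understanding of PaddleSeg.
This document describes how to use the convolution channel pruning API of [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to prune the number of channels in the convolution layers of models in the segmentation library.
In the segmentation library, pruning can be run directly with the `PaddleSeg/slim/prune/train_prune.py` script, which calls PaddleSlim's [paddleslim.prune.Pruner](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#Pruner) API.
Unless otherwise stated, all operations in this tutorial are run from the `PaddleSeg/` directory.
## 1. Preparing the data and pretrained model
Run the following command to download the Cityscapes dataset: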
```
python dataset/download_cityscapes.py
```
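Refer to the [pretrained model list](../../docs/model_zoo.md) to obtain the pretrained model you need.
## 2. Determining the parameters to prune
We reduce the number of channels in a convolution layer by pruning its parameters. Before pruning, we need to determine the names of the convolution parameters to prune.
Use the following code to list all parameters of the current model: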
```python
# List all Parameters of the model
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
print(x.name, x.shape)
```
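The screening step can be partly automated. Convolution weights in PaddlePaddle are 4-D tensors laid out as `[out_channels, in_channels, kernel_h, kernel_w]`, so filtering on the number of dimensions is a reasonable first pass. The sketch below works on plain `(name, shape)` pairs such as those printed by the loop above; the helper name `select_conv_weights` and the example shapes are hypothetical.

```python
def select_conv_weights(named_shapes):
    """Return names of parameters whose shape is 4-D (convolution-like weights)."""
    return [name for name, shape in named_shapes if len(shape) == 4]


# Hypothetical (name, shape) pairs; real values come from train_prog.list_vars().
named_shapes = [
    ("learning_to_downsample/weights", (32, 3, 3, 3)),
    ("learning_to_downsample/BatchNorm/gamma", (32,)),
    ("learning_to_downsample/dsconv1/pointwise/weights", (48, 32, 1, 1)),
    ("classifier/biases", (19,)),
]

conv_params = select_conv_weights(named_shapes)
# Comma-separated, in the form expected by SLIM.PRUNE_PARAMS
print(",".join(conv_params))
```

Depthwise convolution weights are also 4-D, so the filtered list still needs a manual review before it is passed to the pruning script.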
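By inspecting the parameter names and shapes, pick out all convolution-layer parameters and decide which of them to prune.
## 3. Launching the pruning job
When launching a pruning job with `train_prune.py`, use the `SLIM.PRUNE_PARAMS` option to specify the list of parameter names to prune, separated by commas, and the `SLIM.PRUNE_RATIOS` option to specify the ratio pruned from each parameter.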
```shell
export CUDA_VISIBLE_DEVICES=0
python -u ./slim/prune/train_prune.py --log_steps 10 --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
SLIM.PRUNE_PARAMS 'learning_to_downsample/weights,learning_to_downsample/dsconv1/pointwise/weights,learning_to_downsample/dsconv2/pointwise/weights' \
SLIM.PRUNE_RATIOS '[0.1,0.1,0.1]'
```
Here we select three parameters and prune each of them by a ratio of 0.1.
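To make the two options concrete, the sketch below pairs each parameter name with its ratio and estimates how many output channels would remain. It is a plain-Python illustration only: the helper names, the example channel counts (32, 48, 64) and the rounding rule `int(round(channels * ratio))` are assumptions, not taken from `train_prune.py` or PaddleSlim.

```python
import ast


def pair_prune_options(params_opt, ratios_opt):
    """Pair comma-separated parameter names with a list literal of ratios."""
    names = [n.strip() for n in params_opt.split(",") if n.strip()]
    ratios = ast.literal_eval(ratios_opt)
    if len(names) != len(ratios):
        raise ValueError("expected one ratio per parameter name")
    return dict(zip(names, ratios))


def remaining_channels(out_channels, ratio):
    """Channels left after pruning; the rounding rule is an assumption."""
    return out_channels - int(round(out_channels * ratio))


plan = pair_prune_options(
    "learning_to_downsample/weights,"
    "learning_to_downsample/dsconv1/pointwise/weights,"
    "learning_to_downsample/dsconv2/pointwise/weights",
    "[0.1,0.1,0.1]")

# Hypothetical output-channel counts for the three layers.
example_channels = [32, 48, 64]
kept = [remaining_channels(c, r)
        for c, r in zip(example_channels, plan.values())]
print(plan)
print(kept)
```

Checking that the two options have the same length before launching a long training run catches the most common configuration mistake early.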
## 4. Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0
python -u ./slim/prune/eval_prune.py --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
TEST.TEST_MODEL your_trained_model
```
## 5. Models
| Model | Dataset | Download | Pruning method | FLOPs | mIoU on val |
|---|---|---|---|---|---|
| Fast-SCNN/bn | Cityscapes | [fast_scnn_cityscapes.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | None | 7.21G | 0.6964 |
| Fast-SCNN/bn | Cityscapes | [fast_scnn_cityscapes-uniform-51.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape-uniform-51.tar) | uniform | 3.54G | 0.6990 |
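To put the two rows side by side, the short calculation below derives the relative FLOPs reduction and the mIoU change from the values in the table. It is plain arithmetic on the reported numbers, not a new measurement; the variable names are illustrative.

```python
# Values copied from the table above.
baseline_gflops, pruned_gflops = 7.21, 3.54
baseline_miou, pruned_miou = 0.6964, 0.6990

# Fraction of FLOPs removed by pruning, relative to the baseline model.
flops_reduction = (baseline_gflops - pruned_gflops) / baseline_gflops
# Absolute change in mIoU on the validation set.
miou_delta = pruned_miou - baseline_miou

print("FLOPs reduction: {:.1%}".format(flops_reduction))
print("mIoU change: {:+.4f}".format(miou_delta))
```

The reduction comes out at roughly 51%, which presumably is where the `uniform-51` suffix in the file name comes from, while mIoU on the validation set changes by +0.0026.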
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from models.model_builder import build_model
from models.model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
from paddleslim.prune.io import *
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if not os.path.exists(ckpt_dir):
raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
load_model(exe, test_prog, ckpt_dir)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import argparse
import pprint
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
from models.model_builder import parse_shape_from_file
from eval_prune import evaluate
from vis import visualize
from utils import dist_utils
from paddleslim.prune import Pruner
from paddleslim.prune.io import *
from paddleslim.analysis import flops
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
Temporary workaround for variable-saving compatibility on Windows.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_prune_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_model(exe, program, ckpt_dir)
return ckpt_dir
def load_checkpoint(exe, program):
"""
Load checkpoint from the resume model directory to resume training
"""
print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
cfg.TRAIN.RESUME_MODEL_DIR))
fluid.io.load_persistables(
exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
model_path = cfg.TRAIN.RESUME_MODEL_DIR
# Check whether the path ends with a path separator
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
# If resume model path is end of digit, restore epoch status
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def print_info(*msg):
if cfg.TRAINER_ID == 0:
print(*msg)
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
# When using the sync batch norm strategy, drop the last batch if the number of
# samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
# places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# place = places[0]
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# Get number of GPU
dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
print_info("#Device count: {}".format(dev_count))
# Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
# In multi-GPU training mode, batch data is allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
py_reader, avg_loss, lr, pred, grts, masks = build_model(
train_prog, startup_prog, phase=ModelPhase.TRAIN)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
exe.run(startup_prog)
exec_strategy = fluid.ExecutionStrategy()
# Clear temporary variables every 100 iteration
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if cfg.NUM_TRAINERS > 1 and args.use_gpu:
dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
exec_strategy.num_threads = 1
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print_info("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print_info("Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
pruned_params = cfg.SLIM.PRUNE_PARAMS.strip().split(',')
pruned_ratios = cfg.SLIM.PRUNE_RATIOS
if isinstance(pruned_ratios, float):
pruned_ratios = [pruned_ratios] * len(pruned_params)
elif isinstance(pruned_ratios, (list, tuple)):
pruned_ratios = list(pruned_ratios)
else:
raise ValueError('expect SLIM.PRUNE_RATIOS type is float, list, tuple, '
'but received {}'.format(type(pruned_ratios)))
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME_MODEL_DIR:
begin_epoch = load_checkpoint(exe, train_prog)
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
load_vars = []
load_fail_vars = []
def var_shape_matched(var, shape):
"""
Check whether the persistable variable shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
return var_shape == shape
return False
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
else:
load_fail_vars.append(x)
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
for var in load_vars:
print_info("Parameter[{}] loaded successfully!".format(var.name))
for var in load_fail_vars:
print_info("Parameter[{}] does not exist or its shape does not match the current network,"
" skipping it.".format(var.name))
print_info("{}/{} pretrained parameters loaded successfully!".format(
len(load_vars),
len(load_vars) + len(load_fail_vars)))
else:
print_info('Pretrained model dir {} not exists, training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
fetch_list = [avg_loss.name, lr.name]
if args.debug:
# Fetch more variable info and use streaming confusion matrix to
# calculate IoU results if in debug mode
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
fetch_list.extend([pred.name, grts.name, masks.name])
cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
if args.use_tb:
if not args.tb_log_dir:
print_info("Please specify the log directory by --tb_log_dir.")
exit(1)
from tb_paddle import SummaryWriter
log_writer = SummaryWriter(args.tb_log_dir)
pruner = Pruner()
train_prog = pruner.prune(
train_prog,
fluid.global_scope(),
params=pruned_params,
ratios=pruned_ratios,
place=place,
only_graph=False)[0]
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=avg_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
if args.use_mpio:
print_info("Use multiprocess reader")
else:
print_info("Use multi-thread reader")
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
if args.debug:
# Print category IoU and accuracy to check whether the
# training process behaves as expected
loss, lr, pred, grts, masks = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
cm.calculate(pred, grts, masks)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
speed = args.log_steps / timer.elapsed_time()
avg_loss /= args.log_steps
category_acc, mean_acc = cm.accuracy()
category_iou, mean_iou = cm.mean_iou()
print_info((
"epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, mean_acc,
mean_iou, speed,
calculate_eta(all_step - global_step, speed)))
print_info("Category IoU: ", category_iou)
print_info("Category Acc: ", category_acc)
if args.use_tb:
log_writer.add_scalar('Train/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Train/mean_acc', mean_acc,
global_step)
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/step/sec', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
cm.zero_matrix()
timer.restart()
else:
# If not in debug mode, avoid unnecessary logging and computation
loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
avg_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, speed,
calculate_eta(all_step - global_step, speed)))
if args.use_tb:
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/speed', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0 and cfg.TRAINER_ID == 0:
ckpt_dir = save_prune_checkpoint(exe, train_prog, epoch)
if args.do_eval:
print("Evaluation start")
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if args.use_tb:
log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
global_step)
# Use Tensorboard to visualize results
if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
visualize(
cfg=cfg,
use_gpu=args.use_gpu,
vis_file_list=cfg.DATASET.VIS_FILE_LIST,
vis_dir="visual",
ckpt_dir=ckpt_dir,
log_writer=log_writer)
# save final model
if cfg.TRAINER_ID == 0:
save_prune_checkpoint(exe, train_prog, 'final')
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
cfg.check_and_infer()
print_info(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
if fluid.core.is_compiled_with_cuda() != True and args.use_gpu == True:
print(
"You can not set use_gpu = True in the model because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
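>Before running this example, install PaddlePaddle 1.6 or later and PaddleSlim.
# Segmentation Model Quantization Example
## Overview
This example uses the [quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress a segmentation model.
Before reading this example, we recommend getting familiar with the following:
- [Regular training of segmentation models](../../docs/usage.md)
- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
## Installing PaddleSlim
Install PaddleSlim by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
## Training
### Dataset
Follow the segmentation library's tutorial to download the dataset and place it in the expected location.
### Downloading the trained segmentation model
Run the following commands in the root directory of the segmentation library: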
```bash
mkdir pretrain
cd pretrain
wget https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz
tar xf mobilenet_cityscapes.tgz
```
### Defining the quantization configuration
    config = {
        'weight_quantize_type': 'channel_wise_abs_max',
        'activation_quantize_type': 'moving_average_abs_max',
        'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
        'not_quant_pattern': ['last_conv']
    }
For how to configure these options and what they mean, see the [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/).
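To give some intuition for the `channel_wise_abs_max` setting, the sketch below computes one scale per output channel as the maximum absolute weight in that channel and compares it with a single scale for the whole layer. This is a simplified illustration in plain Python, not PaddleSlim's implementation; the function name and the example weights are hypothetical.

```python
def channel_wise_abs_max(weights):
    """weights: list of output channels, each a flat list of floats.

    Returns one scale per channel: the largest absolute value in it.
    """
    return [max(abs(v) for v in channel) for channel in weights]


# Two hypothetical output channels with very different value ranges.
weights = [
    [0.5, -1.0, 0.25],
    [0.02, 0.01, -0.04],
]

scales = channel_wise_abs_max(weights)
# A single per-layer scale would be dominated by the largest channel.
layer_scale = max(scales)
print(scales, layer_scale)
```

With a single layer-wide scale, the second channel would use only a small part of the quantized range; per-channel scales avoid that, which is the usual motivation for channel-wise weight quantization.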
### Inserting quantize and dequantize ops
Use the [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware) to insert quantize and dequantize ops into the Program.
```
compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
```
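### Disabling some training strategies
Because quantization modifies the Program, training strategies that also modify the Program need to be disabled. Using ``sync_batch_norm`` together with multi-GPU quantization training causes errors, so it must be turned off.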
```
build_strategy.fuse_all_reduce_ops = False
build_strategy.sync_batch_norm = False
```
### Starting training
step1: Select the GPU
```
export CUDA_VISIBLE_DEVICES=0
```
step2: Add the ``pdseg`` folder to the Python path
Run the following command in the root directory of the segmentation library:
```
export PYTHONPATH=$PYTHONPATH:./pdseg
```
step3: Start training
Run the following command in the root directory of the segmentation library to start training.
```
python -u ./slim/quantization/train_quant.py --log_steps 10 --not_quant_pattern last_conv --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio --do_eval \
TRAIN.PRETRAINED_MODEL_DIR "./pretrain/mobilenet_cityscapes/" \
TRAIN.MODEL_SAVE_DIR "./snapshots/mobilenetv2_quant" \
MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
MODEL.DEEPLAB.ENABLE_DECODER False \
TRAIN.SYNC_BATCH_NORM False \
SOLVER.LR 0.0001 \
TRAIN.SNAPSHOT_EPOCH 1 \
SOLVER.NUM_EPOCHS 30 \
BATCH_SIZE 16
```
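### Model structure during training
The [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) documentation describes two interfaces, ``paddleslim.quant.quant_aware`` and ``paddleslim.quant.convert``.
``paddleslim.quant.quant_aware`` inserts consecutive quantize and dequantize ops before each input of operators such as conv2d, depthwise_conv2d, and mul, and changes some inputs of the corresponding backward operators. An example is shown below: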
<p align="center">
<img src="./images/TransformPass.png" height=400 width=520 hspace='10'/> <br />
<strong>Figure 1: Result after applying paddleslim.quant.quant_aware</strong>
</p>
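The effect of a quantize op followed directly by a dequantize op can be illustrated with a small round trip in plain Python. This is a minimal sketch assuming symmetric 8-bit abs-max quantization; it is not the code PaddleSlim inserts, and `fake_quantize` is a hypothetical helper name.

```python
def fake_quantize(values, scale, bits=8):
    """Quantize to signed integers, then immediately dequantize to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    out = []
    for v in values:
        q = round(v / scale * qmax)      # quantize
        q = max(-qmax, min(qmax, q))     # clip to the representable range
        out.append(q * scale / qmax)     # dequantize
    return out


# Hypothetical activations; the scale is their largest absolute value.
acts = [0.1, -0.33, 0.9]
scale = max(abs(v) for v in acts)
approx = fake_quantize(acts, scale)

# The rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(acts, approx))
half_step = scale / (2 * 127)
print(approx, max_err)
```

During quantization-aware training the network sees these slightly perturbed values, so it can adapt to the rounding error before the model is converted.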
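### Evaluating while training
The test accuracy obtained when evaluating during training in the script is based on the network structure shown in Figure 1.
## Evaluation
### Final model for evaluation
``paddleslim.quant.convert`` is mainly used to change the order of quantize and dequantize ops in the Program, that is, to turn a layout like the one in Figure 1 into the layout in Figure 2. In addition, ``paddleslim.quant.convert`` changes the parameters of operators such as `conv2d`, `depthwise_conv2d`, and `mul` to values within the quantized int8_t range (while the data type remains float32), as shown in Figure 2: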
<p align="center">
<img src="./images/FreezePass.png" height=400 width=420 hspace='10'/> <br />
<strong>Figure 2: Result after paddleslim.quant.convert</strong>
</p>
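So the final quantized model is obtained only after calling ``paddleslim.quant.convert``. This model can be loaded for inference with Paddle-Lite; see the tutorial [How Paddle-Lite loads and runs quantized models](https://github.com/PaddlePaddle/Paddle-Lite/wiki/model_quantization).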
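What "values within the int8_t range while the data type remains float32" means can be sketched as follows. The snippet is an illustration only, assuming symmetric 8-bit abs-max scaling; `to_int8_range_floats` is a hypothetical helper, not a PaddleSlim function, and Python floats stand in for float32 here.

```python
def to_int8_range_floats(channel, bits=8):
    """Map weights to integer-valued floats in [-127, 127] plus their scale."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to 1.0 so an all-zero channel does not divide by zero.
    scale = max(abs(v) for v in channel) or 1.0
    stored = []
    for v in channel:
        q = round(v / scale * qmax)
        q = max(-qmax, min(qmax, q))
        stored.append(float(q))  # integer value, floating-point storage
    return stored, scale


# Hypothetical weights of one output channel.
stored, scale = to_int8_range_floats([0.4, -1.0, 0.25])
print(stored, scale)
```

Each stored value is a whole number that would fit in an int8, yet it is kept as a float; an inference engine can later cast the weights to true int8 and use the scale to recover real magnitudes.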
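### Evaluation script
Use the script [slim/quantization/eval_quant.py](./eval_quant.py) for evaluation.
- Define the configuration. Use the same quantization configuration as in the training script to obtain the same model as during quantization training.
- Use ``paddleslim.quant.quant_aware`` to insert quantize and dequantize ops.
- Use ``paddleslim.quant.convert`` to change the op order and obtain the final quantized model for evaluation.
Evaluation command:
Run in the root directory of the segmentation library: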
```
python -u ./slim/quantization/eval_quant.py --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --not_quant_pattern last_conv --use_mpio --convert \
TEST.TEST_MODEL "./snapshots/mobilenetv2_quant/best_model" \
MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
MODEL.DEEPLAB.ENABLE_DECODER False \
TRAIN.SYNC_BATCH_NORM False \
BATCH_SIZE 16
```
## Quantization results
## FAQ
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from models.model_builder import build_model
from models.model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
from paddleslim.quant import quant_aware, convert
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--convert',
dest='convert',
help='Convert or not',
action='store_true',
default=False)
parser.add_argument(
"--not_quant_pattern",
nargs='+',
type=str,
help=
"Layers which name_scope contains string in not_quant_pattern will not be quantized"
)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
not_quant_pattern_list = []
if kwargs['not_quant_pattern'] is not None:
not_quant_pattern_list = kwargs['not_quant_pattern']
config = {
'weight_quantize_type': 'channel_wise_abs_max',
'activation_quantize_type': 'moving_average_abs_max',
'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
'not_quant_pattern': not_quant_pattern_list
}
test_prog = quant_aware(test_prog, place, config, for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if not os.path.exists(ckpt_dir):
raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
fluid.io.load_persistables(exe, ckpt_dir, main_program=test_prog)
if kwargs['convert']:
test_prog = convert(test_prog, place, config)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import argparse
import pprint
import random
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
from models.model_builder import parse_shape_from_file
from eval_quant import evaluate
from vis import visualize
from utils import dist_utils
from train import save_vars, save_checkpoint, load_checkpoint, update_best_model, print_info
from paddleslim.quant import quant_aware
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--enable_ce',
dest='enable_ce',
help='If set True, enable continuous evaluation job.'
'This flag is only used for internal test.',
action='store_true')
parser.add_argument(
"--not_quant_pattern",
nargs='+',
type=str,
help=
"Layers which name_scope contains string in not_quant_pattern will not be quantized"
)
return parser.parse_args()
def train_quant(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
if args.enable_ce:
startup_prog.random_seed = 1000
train_prog.random_seed = 1000
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
# When using the sync batch norm strategy, drop the last batch if the number of
# samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
    # Get device environment
    # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
    # place = places[0]
    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
    place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
    places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()

    # Get the number of devices
    dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
    print_info("#Device count: {}".format(dev_count))

    # Make sure BATCH_SIZE is divisible by the number of GPU cards
    assert cfg.BATCH_SIZE % dev_count == 0, (
        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
            cfg.BATCH_SIZE, dev_count))
    # In multi-GPU training mode, batch data is allocated to each GPU evenly
    batch_size_per_dev = cfg.BATCH_SIZE // dev_count
    print_info("batch_size_per_dev: {}".format(batch_size_per_dev))

    py_reader, avg_loss, lr, pred, grts, masks = build_model(
        train_prog, startup_prog, phase=ModelPhase.TRAIN)
    py_reader.decorate_sample_generator(
        data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)

    exe = fluid.Executor(place)
    exe.run(startup_prog)

    exec_strategy = fluid.ExecutionStrategy()
    if args.use_gpu:
        exec_strategy.num_threads = fluid.core.get_cuda_device_count()
    # Clear temporary variables every 100 iterations
    exec_strategy.num_iteration_per_drop_scope = 100
    build_strategy = fluid.BuildStrategy()

    if cfg.NUM_TRAINERS > 1 and args.use_gpu:
        dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
        exec_strategy.num_threads = 1
    # Resume training
    begin_epoch = cfg.SOLVER.BEGIN_EPOCH
    if cfg.TRAIN.RESUME_MODEL_DIR:
        begin_epoch = load_checkpoint(exe, train_prog)
    # Load pretrained model
    elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
        print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
        load_vars = []
        load_fail_vars = []

        def var_shape_matched(var, shape):
            """
            Check whether the shape of a persistable variable matches the current network
            """
            var_exist = os.path.exists(
                os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
            if var_exist:
                var_shape = parse_shape_from_file(
                    os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
                return var_shape == shape
            return False

        for x in train_prog.list_vars():
            if isinstance(x, fluid.framework.Parameter):
                shape = tuple(fluid.global_scope().find_var(
                    x.name).get_tensor().shape())
                if var_shape_matched(x, shape):
                    load_vars.append(x)
                else:
                    load_fail_vars.append(x)

        fluid.io.load_vars(
            exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
        for var in load_vars:
            print_info("Parameter[{}] loaded successfully!".format(var.name))
        for var in load_fail_vars:
            print_info(
                "Parameter[{}] doesn't exist or its shape does not match the current network, skip"
                " loading it.".format(var.name))
        print_info("{}/{} pretrained parameters loaded successfully!".format(
            len(load_vars),
            len(load_vars) + len(load_fail_vars)))
    else:
        print_info(
            'Pretrained model dir {} does not exist, training from scratch...'.
            format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
    fetch_list = [avg_loss.name, lr.name]
    if args.debug:
        # Fetch more variable info and use streaming confusion matrix to
        # calculate IoU results if in debug mode
        np.set_printoptions(
            precision=4, suppress=True, linewidth=160, floatmode="fixed")
        fetch_list.extend([pred.name, grts.name, masks.name])
        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)

    not_quant_pattern = []
    if args.not_quant_pattern:
        not_quant_pattern = args.not_quant_pattern
    # Quantization-aware training config: per-channel abs-max scales for
    # weights, moving-average abs-max scales for activations
    config = {
        'weight_quantize_type': 'channel_wise_abs_max',
        'activation_quantize_type': 'moving_average_abs_max',
        'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
        'not_quant_pattern': not_quant_pattern
    }
    # Build the quantized training program and the matching evaluation program
    compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
    eval_prog = quant_aware(train_prog, place, config, for_test=True)
    build_strategy.fuse_all_reduce_ops = False
    build_strategy.sync_batch_norm = False
    compiled_train_prog = compiled_train_prog.with_data_parallel(
        loss_name=avg_loss.name,
        exec_strategy=exec_strategy,
        build_strategy=build_strategy)
    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))

    global_step = 0
    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and not drop_last:
        all_step += 1
    all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)

    avg_loss = 0.0
    best_mIoU = 0.0

    timer = Timer()
    timer.start()
    if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
        raise ValueError(
            ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
                begin_epoch, cfg.SOLVER.NUM_EPOCHS))

    if args.use_mpio:
        print_info("Use multiprocess reader")
    else:
        print_info("Use multi-thread reader")
    for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
        py_reader.start()
        while True:
            try:
                if args.debug:
                    # Print category IoU and accuracy to check whether the
                    # training process matches expectations
                    loss, lr, pred, grts, masks = exe.run(
                        program=compiled_train_prog,
                        fetch_list=fetch_list,
                        return_numpy=True)
                    cm.calculate(pred, grts, masks)
                    avg_loss += np.mean(np.array(loss))
                    global_step += 1

                    if global_step % args.log_steps == 0:
                        speed = args.log_steps / timer.elapsed_time()
                        avg_loss /= args.log_steps
                        category_acc, mean_acc = cm.accuracy()
                        category_iou, mean_iou = cm.mean_iou()

                        print_info((
                            "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
                        ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
                                 mean_iou, speed,
                                 calculate_eta(all_step - global_step, speed)))
                        print_info("Category IoU: ", category_iou)
                        print_info("Category Acc: ", category_acc)
                        sys.stdout.flush()
                        avg_loss = 0.0
                        cm.zero_matrix()
                        timer.restart()
                else:
                    # If not in debug mode, avoid unnecessary logging and calculation
                    loss, lr = exe.run(
                        program=compiled_train_prog,
                        fetch_list=fetch_list,
                        return_numpy=True)
                    avg_loss += np.mean(np.array(loss))
                    global_step += 1

                    if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
                        avg_loss /= args.log_steps
                        speed = args.log_steps / timer.elapsed_time()
                        print((
                            "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
                        ).format(epoch, global_step, lr[0], avg_loss, speed,
                                 calculate_eta(all_step - global_step, speed)))
                        sys.stdout.flush()
                        avg_loss = 0.0
                        timer.restart()

            except fluid.core.EOFException:
                py_reader.reset()
                break
            except Exception as e:
                print(e)

        # Save a checkpoint every SNAPSHOT_EPOCH epochs and at the last epoch
        if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
                or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
            ckpt_dir = save_checkpoint(exe, eval_prog, epoch)

            if args.do_eval:
                print("Evaluation start")
                _, mean_iou, _, mean_acc = evaluate(
                    cfg=cfg,
                    ckpt_dir=ckpt_dir,
                    use_gpu=args.use_gpu,
                    use_mpio=args.use_mpio,
                    not_quant_pattern=args.not_quant_pattern,
                    convert=False)

                if mean_iou > best_mIoU:
                    best_mIoU = mean_iou
                    update_best_model(ckpt_dir)
                    print_info("Save best model {} to {}, mIoU = {:.4f}".format(
                        ckpt_dir,
                        os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
                        mean_iou))

    # Save the final model
    if cfg.TRAINER_ID == 0:
        save_checkpoint(exe, eval_prog, 'final')
def main(args):
    if args.cfg_file is not None:
        cfg.update_from_file(args.cfg_file)
    if args.opts:
        cfg.update_from_list(args.opts)
    if args.enable_ce:
        random.seed(0)
        np.random.seed(0)

    cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
    cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))

    cfg.check_and_infer()
    print_info(pprint.pformat(cfg))
    train_quant(cfg)


if __name__ == '__main__':
    args = parse_args()
    # Fail early if GPU execution is requested on a CPU-only PaddlePaddle build
    if args.use_gpu and not fluid.core.is_compiled_with_cuda():
        print(
            "You cannot set use_gpu = True in the model because you are using paddlepaddle-cpu."
        )
        print(
            "Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
        )
        sys.exit(1)
    main(args)
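# Fast-SCNN Model Training Tutorial
* This tutorial shows how to train on a custom dataset starting from the ***`Fast_scnn_cityscapes`*** pretrained model provided by PaddleSeg.
* Before reading this tutorial, make sure you have gone through the [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections so that you have a basic understanding of PaddleSeg.
* All commands in this tutorial are run from the PaddleSeg root directory.
## 1. Prepare the Training Data
We have prepared a dataset in advance; download it with the following command: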
```shell
python dataset/download_pet.py
```
## 2. Download the Pretrained Model
```shell
python pretrained_model/download_model.py fast_scnn_cityscapes
```
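## 3. Prepare the Configuration
Next we need to settle the relevant configuration. For the purposes of this tutorial, it falls into three parts:
* Dataset
    * Root directory of the training set
    * Training set file list
    * Test set file list
    * Validation set file list
* Pretrained model
    * Name of the pretrained model
    * Backbone network of the pretrained model
    * Normalization type of the pretrained model
    * Path to the pretrained model
* Other
    * Learning rate
    * Batch size
    * ...
Of the three, the pretrained model configuration matters most: if the model or BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which in turn slows convergence. The pretrained-model settings correspond to the model downloaded in step 2.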
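
To make the loading behavior concrete, here is a minimal sketch of shape-based parameter matching, written in plain Python rather than against the Paddle API. The function `split_loadable` and the parameter names are hypothetical; the 19-class and 3-class shapes mirror the Cityscapes pretrained model and the pet dataset used in this tutorial.

```python
def split_loadable(model_shapes, pretrained_shapes):
    """Partition parameter names by whether a same-shaped pretrained tensor exists.

    Both arguments map a parameter name to its shape tuple (hypothetical inputs).
    """
    loaded, skipped = [], []
    for name, shape in model_shapes.items():
        if pretrained_shapes.get(name) == tuple(shape):
            loaded.append(name)   # same name and same shape: safe to load
        else:
            skipped.append(name)  # missing or mismatched: stays randomly initialized
    return loaded, skipped


# Hypothetical parameters: the classifier differs because the class count differs.
model_shapes = {"conv1.w": (32, 3, 3, 3), "classifier.w": (3, 128, 1, 1)}
pretrained_shapes = {"conv1.w": (32, 3, 3, 3), "classifier.w": (19, 128, 1, 1)}

loaded, skipped = split_loadable(model_shapes, pretrained_shapes)
print("{}/{} pretrained parameters loaded".format(len(loaded), len(model_shapes)))
```

A parameter that is skipped this way trains from its random initialization, which is why a wrong model name can silently cost convergence speed.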
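The dataset configuration depends on where the data is stored; in this tutorial the data lives in `dataset/mini_pet`.
The remaining settings are tuned to the dataset and the machine environment. In the end we save a YAML configuration file with the following content at **configs/fast_scnn_pet.yaml**: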
```yaml
# Dataset configuration
DATASET:
    DATA_DIR: "./dataset/mini_pet/"
    NUM_CLASSES: 3
    TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
    TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
    VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
    VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
# Pretrained model configuration
MODEL:
    MODEL_NAME: "fast_scnn"
    DEFAULT_NORM_TYPE: "bn"
# Other configuration
TRAIN_CROP_SIZE: (512, 512)
EVAL_CROP_SIZE: (512, 512)
AUG:
    AUG_METHOD: "unpadding"
    FIX_RESIZE_SIZE: (512, 512)
BATCH_SIZE: 4
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/fast_scnn_cityscape/"
    MODEL_SAVE_DIR: "./saved_model/fast_scnn_pet/"
    SNAPSHOT_EPOCH: 10
TEST:
    TEST_MODEL: "./saved_model/fast_scnn_pet/final"
SOLVER:
    NUM_EPOCHS: 100
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
```
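
As a rough illustration of what `LR_POLICY: "poly"` with `LR: 0.005` means, the sketch below implements the commonly used polynomial decay formula. The function `poly_lr`, the exponent `power=0.9`, and the total step count are assumptions made for illustration; the actual schedule is defined by the PaddleSeg solver, not by this snippet.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay: base_lr * (1 - step / total_steps) ** power.

    power=0.9 is an assumed, commonly used value, not read from the config.
    """
    return base_lr * (1.0 - step / total_steps) ** power


base_lr = 0.005       # SOLVER.LR from the config above
total_steps = 1000    # hypothetical number of training steps

for step in (0, 250, 500, 750, 1000):
    print("step={:4d} lr={:.6f}".format(step, poly_lr(base_lr, step, total_steps)))
```

The learning rate starts at the configured value and decays smoothly to zero at the final step.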
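## 4. Validate the Configuration and Data
Before training and evaluation, we validate the configuration and data once to make sure both are correct. Start the validation with the following command: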
```shell
python pdseg/check.py --cfg ./configs/fast_scnn_pet.yaml
```
## 5. Start Training
Once validation passes, start training with the following command:
```shell
python pdseg/train.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
```
## 6. Run Evaluation
After training finishes, start evaluation with the following command:
```shell
python pdseg/eval.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
```
## 7. Inference Time Comparison of Real-Time Segmentation Models
| Model | Eval size | Inference time | mIoU on Cityscapes val |
|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | (1024, 2048) | 16.14 ms | 0.698 |
| ICNet/bn | (1024, 2048) | 8.76 ms | 0.6831 |
| Fast-SCNN/bn | (1024, 2048) | 6.28 ms | 0.6964 |

The measurements above were taken on a V100 GPU.
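
For readers who prefer throughput figures, the latencies in the table can be converted to frames per second. This is only a back-of-the-envelope sketch: it assumes each reported time is the per-image latency at batch size 1, which the table does not state explicitly, and `to_fps` is a helper introduced here for illustration.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    "DeepLabv3+/MobileNetv2/bn": 16.14,
    "ICNet/bn": 8.76,
    "Fast-SCNN/bn": 6.28,
}


def to_fps(ms_per_image):
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_image


for name, ms in latency_ms.items():
    print("{:<28s} {:6.2f} ms  {:6.1f} FPS".format(name, ms, to_fps(ms)))
```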