diff --git a/README.md b/README.md
index ab654768f09d13a8e130bc0284d21912a0e6248d..1421a5cbb30cc5c861ca0646af1eb58b475433f2 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,7 @@ pip install -r requirements.txt
 * [ICNet model tutorial](./turtorial/finetune_icnet.md)
 * [PSPNet model tutorial](./turtorial/finetune_pspnet.md)
 * [HRNet model tutorial](./turtorial/finetune_hrnet.md)
+* [Fast-SCNN model tutorial](./turtorial/finetune_fast_scnn.md)
 
 ### Inference and deployment
 
@@ -109,7 +110,7 @@ pip install -r requirements.txt
 * [How to handle class imbalance in binary segmentation](./docs/loss_select.md)
 * [Domain-specific segmentation models](./contrib)
 * [Multi-process and mixed-precision training](./docs/multiple_gpus_train_and_mixed_precision_train.md)
-
+* Segmentation model compression with PaddleSlim ([quantization](./slim/quantization/README.md), [distillation](./slim/distillation/README.md), [pruning](./slim/prune/README.md), [architecture search](./slim/nas/README.md))
 ## Online demos
 
 We provide hands-on tutorials on the AI Studio platform:
diff --git a/configs/cityscape_fast_scnn.yaml b/configs/cityscape_fast_scnn.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d9e996d64fc208186777e86289ea7329f8240a3b
--- /dev/null
+++ b/configs/cityscape_fast_scnn.yaml
@@ -0,0 +1,53 @@
+EVAL_CROP_SIZE: (2048, 1024) # (width, height), for unpadding, rangescaling and stepscaling
+TRAIN_CROP_SIZE: (1024, 1024) # (width, height), for unpadding, rangescaling and stepscaling
+AUG:
+    AUG_METHOD: "stepscaling" # choose from unpadding, rangescaling and stepscaling
+    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
+    INF_RESIZE_VALUE: 500 # for rangescaling
+    MAX_RESIZE_VALUE: 600 # for rangescaling
+    MIN_RESIZE_VALUE: 400 # for rangescaling
+    MAX_SCALE_FACTOR: 2.0 # for stepscaling
+    MIN_SCALE_FACTOR: 0.5 # for stepscaling
+    SCALE_STEP_SIZE: 0.25 # for stepscaling
+    MIRROR: True
+    FLIP: False
+    FLIP_RATIO: 0.2
+    RICH_CROP:
+        ENABLE: True
+        ASPECT_RATIO: 0.0
+        BLUR: False
+        BLUR_RATIO: 0.1
+        MAX_ROTATION: 0
+        MIN_AREA_RATIO: 0.0
+        BRIGHTNESS_JITTER_RATIO: 0.4
+        CONTRAST_JITTER_RATIO: 0.4
+        SATURATION_JITTER_RATIO: 0.4
+BATCH_SIZE: 12
+MEAN: [0.5, 0.5, 0.5]
+STD: [0.5, 0.5, 0.5]
+DATASET:
+    DATA_DIR: "./dataset/cityscapes/"
+    IMAGE_TYPE: "rgb" # choose rgb or rgba
+    NUM_CLASSES: 19
+    TEST_FILE_LIST: "dataset/cityscapes/val.list"
+    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
+    VAL_FILE_LIST: "dataset/cityscapes/val.list"
+    IGNORE_INDEX: 255
+FREEZE:
+    MODEL_FILENAME: "model"
+    PARAMS_FILENAME: "params"
+MODEL:
+    DEFAULT_NORM_TYPE: "bn"
+    MODEL_NAME: "fast_scnn"
+
+TEST:
+    TEST_MODEL: "snapshots/cityscape_fast_scnn/final/"
+TRAIN:
+    MODEL_SAVE_DIR: "snapshots/cityscape_fast_scnn/"
+    SNAPSHOT_EPOCH: 10
+SOLVER:
+    LR: 0.001
+    LR_POLICY: "poly"
+    OPTIMIZER: "sgd"
+    NUM_EPOCHS: 100
+
diff --git a/configs/fast_scnn_pet.yaml b/configs/fast_scnn_pet.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..02fdef8ea9a15c3c132d80b69cf3e4f6d0876c1f
--- /dev/null
+++ b/configs/fast_scnn_pet.yaml
@@ -0,0 +1,43 @@
+TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding, rangescaling and stepscaling
+EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding, rangescaling and stepscaling
+AUG:
+    AUG_METHOD: "unpadding" # choose from unpadding, rangescaling and stepscaling
+    FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
+
+    INF_RESIZE_VALUE: 500 # for rangescaling
+    MAX_RESIZE_VALUE: 600 # for rangescaling
+    MIN_RESIZE_VALUE: 400 # for rangescaling
+
+    MAX_SCALE_FACTOR: 1.25 # for stepscaling
+    MIN_SCALE_FACTOR: 0.75 # for stepscaling
+    SCALE_STEP_SIZE: 0.25 # for stepscaling
+    MIRROR: True
+BATCH_SIZE: 4
+DATASET:
+    DATA_DIR: "./dataset/mini_pet/"
+    IMAGE_TYPE: "rgb" # choose rgb or rgba
+    NUM_CLASSES: 3
+    TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+    TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
+    VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
+    VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+    IGNORE_INDEX: 255
+    SEPARATOR: " "
+FREEZE:
+    MODEL_FILENAME: "__model__"
+    PARAMS_FILENAME: "__params__"
+MODEL:
+    MODEL_NAME: "fast_scnn"
+    DEFAULT_NORM_TYPE: "bn"
+
+TRAIN:
+    PRETRAINED_MODEL_DIR: "./pretrained_model/fast_scnn_cityscape/"
+    MODEL_SAVE_DIR: "./saved_model/fast_scnn_pet/"
+    SNAPSHOT_EPOCH: 10
+TEST:
+    TEST_MODEL: "./saved_model/fast_scnn_pet/final"
+SOLVER:
+    NUM_EPOCHS: 100
+    LR: 0.005
+    LR_POLICY: "poly"
+    OPTIMIZER: "sgd"
diff --git a/docs/model_zoo.md b/docs/model_zoo.md
index 2b18260e3290561dcc7aa729ea307f23e45c26b0..8cd89fa41d6b7fc88759cf1250d88ec067755a6c 100644
--- a/docs/model_zoo.md
+++ b/docs/model_zoo.md
@@ -63,3 +63,6 @@ The train split of Cityscapes is used for training, the val split for testing
 | PSPNet/bn | Cityscapes |[pspnet50_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet50_cityscapes.tgz) |16|false| 0.7013 |
 | PSPNet/bn | Cityscapes |[pspnet101_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz) |16|false| 0.7734 |
 | HRNet_W18/bn | Cityscapes |[hrnet_w18_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz) | 4 | false | 0.7936 |
+| Fast-SCNN/bn | Cityscapes |[fast_scnn_cityscapes.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | 32 | false | 0.6964 |
+
+Test environment: Python 3.7.3, V100 GPU, cuDNN 7.6.2.
diff --git a/docs/multiple_gpus_train_and_mixed_precision_train.md b/docs/multiple_gpus_train_and_mixed_precision_train.md
index 7826d88171bec71cba7ae2db9327ce3dfd47efd9..206a9409d0326ee6d4cd7c07569e7698f7d9c469 100644
--- a/docs/multiple_gpus_train_and_mixed_precision_train.md
+++ b/docs/multiple_gpus_train_and_mixed_precision_train.md
@@ -4,7 +4,7 @@
 * PaddlePaddle >= 1.6.1
 * NVIDIA NCCL >= 2.4.7
 
-For environment setup and data/pretrained-model preparation, see the [installation guide](./installation.md) and the [PaddleSeg usage guide](./usage.md)
+For environment setup and data/pretrained-model preparation, see the [PaddleSeg usage guide](./usage.md)
 
 ### Multi-process training example
 
diff --git a/pdseg/__init__.py b/pdseg/__init__.py
index 5a1851ecb5fc0575deb449110d69da3087282719..e1cb8ed082023155b95e6b6778b797a571b20ca8 100644
--- a/pdseg/__init__.py
+++ b/pdseg/__init__.py
@@ -14,4 +14,4 @@
 # limitations under the License.
 import models
 import utils
-import tools
\ No newline at end of file
+from . import tools
\ No newline at end of file
diff --git a/pdseg/loss.py b/pdseg/loss.py
index 66f04f4ad412b115fef04b637ea5a544fa0c2da4..14f1b3794b6c8a15f4da5cf2a838ab7339eeffc4 100644
--- a/pdseg/loss.py
+++ b/pdseg/loss.py
@@ -71,6 +71,7 @@ def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2, weight=None
     ignore_mask.stop_gradient = True
     return avg_loss
 
+# TODO: decide how to apply the ignore index and ignore mask here
 def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
     if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
@@ -93,6 +94,7 @@
     ignore_mask.stop_gradient = True
     return fluid.layers.reduce_mean(dice_score)
 
+
 def bce_loss(logit, label, ignore_mask=None):
     if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
         raise Exception("bce loss is only applicable to binary classification")
@@ -112,16 +114,18 @@ def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2, weig
     if isinstance(logits, tuple):
         avg_loss = 0
         for i, logit in enumerate(logits):
-            logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
-            logit_mask = (logit_label.astype('int32') !=
+            if label.shape[2] != logit.shape[2] or label.shape[3] != logit.shape[3]:
+                label = fluid.layers.resize_nearest(label, logit.shape[2:])
+            logit_mask = (label.astype('int32') !=
                           cfg.DATASET.IGNORE_INDEX).astype('int32')
-            loss = softmax_with_loss(logit, logit_label, logit_mask,
+            loss = softmax_with_loss(logit, label, logit_mask,
                                      num_classes)
             avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
     else:
         avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes, weight=weight)
     return avg_loss
 
+
 def multi_dice_loss(logits, label, ignore_mask=None):
     if isinstance(logits, tuple):
         avg_loss = 0
@@ -135,6 +139,7 @@
     avg_loss = dice_loss(logits, label, ignore_mask)
     return avg_loss
 
+
 def multi_bce_loss(logits, label, ignore_mask=None):
     if isinstance(logits, tuple):
         avg_loss = 0
diff --git a/pdseg/models/libs/model_libs.py b/pdseg/models/libs/model_libs.py
index 19afe54224f259cbd98c189d6bc7196138ed8863..84494a9dd892105c799119c7a467b584c23f4241 100644
--- a/pdseg/models/libs/model_libs.py
+++ b/pdseg/models/libs/model_libs.py
@@ -164,3 +164,37 @@ def separate_conv(input, channel, stride, filter, dilation=1, act=None):
     input = bn(input)
     if act: input = act(input)
     return input
+
+
+def conv_bn_layer(input,
+                  filter_size,
+                  num_filters,
+                  stride,
+                  padding,
+                  channels=None,
+                  num_groups=1,
+                  if_act=True,
+                  name=None,
+                  use_cudnn=True):
+    # conv2d without bias, followed by batch_norm, optionally finished with relu6
+    conv = fluid.layers.conv2d(
+        input=input,
+        num_filters=num_filters,
+        filter_size=filter_size,
+        stride=stride,
+        padding=padding,
+        groups=num_groups,
+        act=None,
+        use_cudnn=use_cudnn,
+        param_attr=fluid.ParamAttr(name=name + '_weights'),
+        bias_attr=False)
+    bn_name = name + '_bn'
+    bn = fluid.layers.batch_norm(
+        input=conv,
+        param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
+        bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
+        moving_mean_name=bn_name + '_mean',
+        moving_variance_name=bn_name + '_variance')
+    if if_act:
+        return fluid.layers.relu6(bn)
+    else:
+        return bn
\ No newline at end of file
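To make the new helper concrete, here is a hypothetical shape walk-through; `conv_bn_layer` is the function added above, and the input variable `x` and the layer name `'demo'` are only for illustration:

```python
import paddle.fluid as fluid
from models.libs.model_libs import conv_bn_layer

# NCHW input; fluid.layers.data prepends the batch dimension
x = fluid.layers.data(name='x', shape=[3, 64, 64], dtype='float32')
# conv (stride 2, no bias) -> batch_norm -> relu6
y = conv_bn_layer(x, filter_size=3, num_filters=16, stride=2,
                  padding=1, if_act=True, name='demo')
# y has shape [-1, 16, 32, 32]: stride 2 halves each spatial dimension
```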
diff --git a/pdseg/models/model_builder.py b/pdseg/models/model_builder.py
index 65483b336b59440589f5c2fa27fd8ae456df176a..3ff7e1eacad3f649820d11b16f793d80e94b806b 100644
--- a/pdseg/models/model_builder.py
+++ b/pdseg/models/model_builder.py
@@ -24,7 +24,7 @@ from utils.config import cfg
 from loss import multi_softmax_with_loss
 from loss import multi_dice_loss
 from loss import multi_bce_loss
-from models.modeling import deeplab, unet, icnet, pspnet, hrnet
+from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
 
 
 class ModelPhase(object):
@@ -81,6 +81,8 @@ def seg_model(image, class_num):
         logits = pspnet.pspnet(image, class_num)
     elif model_name == 'hrnet':
         logits = hrnet.hrnet(image, class_num)
+    elif model_name == 'fast_scnn':
+        logits = fast_scnn.fast_scnn(image, class_num)
     else:
         raise Exception(
             "unknown model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
         )
diff --git a/pdseg/models/modeling/deeplab.py b/pdseg/models/modeling/deeplab.py
index e7ed9604b2227bb498c2eb0b863804fbe0159333..186e2406d90d291de43133550875072d790a805f 100644
--- a/pdseg/models/modeling/deeplab.py
+++ b/pdseg/models/modeling/deeplab.py
@@ -27,6 +27,7 @@ from models.libs.model_libs import separate_conv
 from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
 from models.backbone.xception import Xception as xception_backbone
 
+
 def encoder(input):
     # Encoder: ASPP architecture; global pooling + 1x1 conv + three parallel
     # atrous convolutions at different rates, concatenated and fused by a 1x1 conv
     # ASPP_WITH_SEP_CONV: True by default, use depthwise separable convs; otherwise plain convs
@@ -47,8 +48,7 @@ def encoder(input):
     with scope('encoder'):
         channel = 256
         with scope("image_pool"):
-            image_avg = fluid.layers.reduce_mean(
-                input, [2, 3], keep_dim=True)
+            image_avg = fluid.layers.reduce_mean(input, [2, 3], keep_dim=True)
             image_avg = bn_relu(
                 conv(
                     image_avg,
@@ -250,14 +250,15 @@ def deeplabv3p(img, num_classes):
             regularization_coeff=0.0),
         initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
     with scope('logit'):
-        logit = conv(
-            data,
-            num_classes,
-            1,
-            stride=1,
-            padding=0,
-            bias_attr=True,
-            param_attr=param_attr)
+        with fluid.name_scope('last_conv'):
+            logit = conv(
+                data,
+                num_classes,
+                1,
+                stride=1,
+                padding=0,
+                bias_attr=True,
+                param_attr=param_attr)
         logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
 
     return logit
diff --git a/pdseg/models/modeling/fast_scnn.py b/pdseg/models/modeling/fast_scnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1ecdffea6625992e0c7e9e635e67ee79b7b4522
--- /dev/null
+++ b/pdseg/models/modeling/fast_scnn.py
@@ -0,0 +1,263 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
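+#
+# Fast-SCNN (Poudel et al., 2019, arXiv:1902.04502) has four stages: a
+# learning-to-downsample head, a global feature extractor built from
+# MobileNetV2-style inverted residual blocks plus pyramid pooling, a feature
+# fusion module, and a lightweight classifier. The functions below follow
+# that structure.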
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle.fluid as fluid +from models.libs.model_libs import scope +from models.libs.model_libs import bn, bn_relu, relu, conv_bn_layer +from models.libs.model_libs import conv, avg_pool +from models.libs.model_libs import separate_conv +from utils.config import cfg + + +def learning_to_downsample(x, dw_channels1=32, dw_channels2=48, out_channels=64): + x = relu(bn(conv(x, dw_channels1, 3, 2))) + with scope('dsconv1'): + x = separate_conv(x, dw_channels2, stride=2, filter=3, act=fluid.layers.relu) + with scope('dsconv2'): + x = separate_conv(x, out_channels, stride=2, filter=3, act=fluid.layers.relu) + return x + + +def shortcut(input, data_residual): + return fluid.layers.elementwise_add(input, data_residual) + + +def dropout2d(input, prob, is_train=False): + if not is_train: + return input + channels = input.shape[1] + keep_prob = 1.0 - prob + random_tensor = keep_prob + fluid.layers.uniform_random_batch_size_like(input, [-1, channels, 1, 1], min=0., max=1.) + binary_tensor = fluid.layers.floor(random_tensor) + output = input / keep_prob * binary_tensor + return output + + +def inverted_residual_unit(input, + num_in_filter, + num_filters, + ifshortcut, + stride, + filter_size, + padding, + expansion_factor, + name=None): + num_expfilter = int(round(num_in_filter * expansion_factor)) + + channel_expand = conv_bn_layer( + input=input, + num_filters=num_expfilter, + filter_size=1, + stride=1, + padding=0, + num_groups=1, + if_act=True, + name=name + '_expand') + + bottleneck_conv = conv_bn_layer( + input=channel_expand, + num_filters=num_expfilter, + filter_size=filter_size, + stride=stride, + padding=padding, + num_groups=num_expfilter, + if_act=True, + name=name + '_dwise', + use_cudnn=False) + + depthwise_output = bottleneck_conv + + linear_out = conv_bn_layer( + input=bottleneck_conv, + num_filters=num_filters, + filter_size=1, + stride=1, + padding=0, + num_groups=1, + if_act=False, + name=name + '_linear') + + if ifshortcut: + out = shortcut(input=input, data_residual=linear_out) + return out, depthwise_output + else: + return linear_out, depthwise_output + + +def inverted_blocks(input, in_c, t, c, n, s, name=None): + first_block, depthwise_output = inverted_residual_unit( + input=input, + num_in_filter=in_c, + num_filters=c, + ifshortcut=False, + stride=s, + filter_size=3, + padding=1, + expansion_factor=t, + name=name + '_1') + + last_residual_block = first_block + last_c = c + + for i in range(1, n): + last_residual_block, depthwise_output = inverted_residual_unit( + input=last_residual_block, + num_in_filter=last_c, + num_filters=c, + ifshortcut=True, + stride=1, + filter_size=3, + padding=1, + expansion_factor=t, + name=name + '_' + str(i + 1)) + return last_residual_block, depthwise_output + + +def psp_module(input, out_features): + + cat_layers = [] + sizes = (1, 2, 3, 6) + for size in sizes: + psp_name = "psp" + str(size) + with scope(psp_name): + pool = fluid.layers.adaptive_pool2d(input, + pool_size=[size, size], + pool_type='avg', + name=psp_name + '_adapool') + data = conv(pool, out_features, + filter_size=1, + bias_attr=False, + name=psp_name + '_conv') + data_bn = bn(data, act='relu') + interp = fluid.layers.resize_bilinear(data_bn, + out_shape=input.shape[2:], + name=psp_name + '_interp', align_mode=0) + cat_layers.append(interp) + cat_layers = [input] + cat_layers + out = fluid.layers.concat(cat_layers, axis=1, name='psp_cat') + + return out + + 
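+# Note on channel counts: psp_module concatenates its input with four pooled
+# branches of out_features channels each, so the call below with a 128-channel
+# input and out_features = block_channels[2] // 4 = 32 yields
+# 128 + 4 * 32 = 256 channels before the final 1x1 conv restores out_channels.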
+class FeatureFusionModule:
+    """Feature fusion module"""
+
+    def __init__(self, higher_in_channels, lower_in_channels, out_channels, scale_factor=4):
+        self.higher_in_channels = higher_in_channels
+        self.lower_in_channels = lower_in_channels
+        self.out_channels = out_channels
+        self.scale_factor = scale_factor
+
+    def net(self, higher_res_feature, lower_res_feature):
+        h, w = higher_res_feature.shape[2:]
+        lower_res_feature = fluid.layers.resize_bilinear(lower_res_feature, [h, w], align_mode=0)
+
+        with scope('dwconv'):
+            lower_res_feature = relu(bn(conv(lower_res_feature, self.out_channels, 1)))
+        with scope('conv_lower_res'):
+            lower_res_feature = bn(conv(lower_res_feature, self.out_channels, 1, bias_attr=True))
+        with scope('conv_higher_res'):
+            higher_res_feature = bn(conv(higher_res_feature, self.out_channels, 1, bias_attr=True))
+        out = higher_res_feature + lower_res_feature
+
+        return relu(out)
+
+
+class GlobalFeatureExtractor():
+    """Global feature extractor module"""
+
+    def __init__(self, in_channels=64, block_channels=(64, 96, 128), out_channels=128,
+                 t=6, num_blocks=(3, 3, 3)):
+        self.in_channels = in_channels
+        self.block_channels = block_channels
+        self.out_channels = out_channels
+        self.t = t
+        self.num_blocks = num_blocks
+
+    def net(self, x):
+        x, _ = inverted_blocks(x, self.in_channels, self.t, self.block_channels[0],
+                               self.num_blocks[0], 2, 'inverted_block_1')
+        x, _ = inverted_blocks(x, self.block_channels[0], self.t, self.block_channels[1],
+                               self.num_blocks[1], 2, 'inverted_block_2')
+        x, _ = inverted_blocks(x, self.block_channels[1], self.t, self.block_channels[2],
+                               self.num_blocks[2], 1, 'inverted_block_3')
+        x = psp_module(x, self.block_channels[2] // 4)
+        with scope('out'):
+            x = relu(bn(conv(x, self.out_channels, 1)))
+        return x
+
+
+class Classifier:
+    """Classifier"""
+
+    def __init__(self, dw_channels, num_classes, stride=1):
+        self.dw_channels = dw_channels
+        self.num_classes = num_classes
+        self.stride = stride
+
+    def net(self, x):
+        with scope('dsconv1'):
+            x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
+        with scope('dsconv2'):
+            x = separate_conv(x, self.dw_channels, stride=self.stride, filter=3, act=fluid.layers.relu)
+        x = dropout2d(x, 0.1, is_train=cfg.PHASE == 'train')
+        x = conv(x, self.num_classes, 1, bias_attr=True)
+        return x
+
+
+def aux_layer(x, num_classes):
+    x = relu(bn(conv(x, 32, 3, padding=1)))
+    x = dropout2d(x, 0.1, is_train=(cfg.PHASE == 'train'))
+    with scope('logit'):
+        x = conv(x, num_classes, 1, bias_attr=True)
+    return x
+
+
+def fast_scnn(img, num_classes):
+    size = img.shape[2:]
+    classifier = Classifier(128, num_classes)
+
+    global_feature_extractor = GlobalFeatureExtractor(64, [64, 96, 128], 128, 6, [3, 3, 3])
+    feature_fusion = FeatureFusionModule(64, 128, 128)
+
+    with scope('learning_to_downsample'):
+        higher_res_features = learning_to_downsample(img, 32, 48, 64)
+    with scope('global_feature_extractor'):
+        lower_res_feature = global_feature_extractor.net(higher_res_features)
+    with scope('feature_fusion'):
+        x = feature_fusion.net(higher_res_features, lower_res_feature)
+    with scope('classifier'):
+        logit = classifier.net(x)
+        logit = fluid.layers.resize_bilinear(logit, size, align_mode=0)
+
+    if len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 3:
+        with scope('aux_layer_higher'):
+            higher_logit = aux_layer(higher_res_features, num_classes)
+            higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
+        with scope('aux_layer_lower'):
+            lower_logit = aux_layer(lower_res_feature, num_classes)
+            lower_logit = fluid.layers.resize_bilinear(lower_logit, size, align_mode=0)
+        return logit, higher_logit, lower_logit
+    elif len(cfg.MODEL.MULTI_LOSS_WEIGHT) == 2:
+        with scope('aux_layer_higher'):
+            higher_logit = aux_layer(higher_res_features, num_classes)
+            higher_logit = fluid.layers.resize_bilinear(higher_logit, size, align_mode=0)
+        return logit, higher_logit
+
+    return logit
\ No newline at end of file
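How many logits `fast_scnn` returns is controlled by the length of `cfg.MODEL.MULTI_LOSS_WEIGHT`, which `multi_softmax_with_loss` in pdseg/loss.py (above) uses to weight each output. A config fragment that enables both auxiliary heads might look like the following; the 1.0/0.4/0.4 weights are illustrative, not part of this patch:

```yaml
MODEL:
    MODEL_NAME: "fast_scnn"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4, 0.4]  # main logit, higher-res aux head, lower-res aux head
```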
diff --git a/pdseg/reader.py b/pdseg/reader.py
index d3c3659e5064cd8a11e463267a4b046ffdf105ca..7f1fd6fbbe25f1199c9247aa9e42ae7cb682c03d 100644
--- a/pdseg/reader.py
+++ b/pdseg/reader.py
@@ -98,8 +98,8 @@ class SegDataset(object):
             # Re-shuffle file list
             if self.shuffle and cfg.NUM_TRAINERS > 1:
                 np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
-                num_lines = len(self.all_lines) // self.num_trainers
-                self.lines = self.all_lines[num_lines * self.trainer_id: num_lines * (self.trainer_id + 1)]
+                num_lines = len(self.all_lines) // cfg.NUM_TRAINERS
+                self.lines = self.all_lines[num_lines * cfg.TRAINER_ID: num_lines * (cfg.TRAINER_ID + 1)]
                 self.shuffle_seed += 1
             elif self.shuffle:
                 np.random.shuffle(self.lines)
diff --git a/pdseg/utils/config.py b/pdseg/utils/config.py
index 1beff8f055479e5ae6a2cb982ea50d9ea2a900da..c3d84216752838a388fd2cda1946949d77960fb9 100644
--- a/pdseg/utils/config.py
+++ b/pdseg/utils/config.py
@@ -236,3 +236,19 @@ cfg.FREEZE.MODEL_FILENAME = '__model__'
 cfg.FREEZE.PARAMS_FILENAME = '__params__'
 # Directory where the exported inference model is saved
 cfg.FREEZE.SAVE_DIR = 'freeze_model'
+
+########################## paddle-slim ######################################
+cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER = False
+cfg.SLIM.KNOWLEDGE_DISTILL = False
+cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR = ""
+
+cfg.SLIM.NAS_PORT = 23333
+cfg.SLIM.NAS_ADDRESS = ""
+cfg.SLIM.NAS_SEARCH_STEPS = 100
+cfg.SLIM.NAS_START_EVAL_EPOCH = 0
+cfg.SLIM.NAS_IS_SERVER = True
+cfg.SLIM.NAS_SPACE_NAME = ""
+
+cfg.SLIM.PRUNE_PARAMS = ''
+cfg.SLIM.PRUNE_RATIOS = []
+
diff --git a/pretrained_model/download_model.py b/pretrained_model/download_model.py
index 12b01472457bd25e22005141b21bb9d3014bf4fe..28b5ae421425a42e959fa6cf792c3e536e53c964 100644
--- a/pretrained_model/download_model.py
+++ b/pretrained_model/download_model.py
@@ -81,6 +81,8 @@ model_urls = {
     "https://paddleseg.bj.bcebos.com/models/pspnet101_cityscapes.tgz",
     "hrnet_w18_bn_cityscapes":
     "https://paddleseg.bj.bcebos.com/models/hrnet_w18_bn_cityscapes.tgz",
+    "fast_scnn_cityscapes":
+    "https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar",
 }
 
 if __name__ == "__main__":
diff --git a/slim/distillation/README.md b/slim/distillation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2bd772a1001e11efa89324315fa32d44032ade05
--- /dev/null
+++ b/slim/distillation/README.md
@@ -0,0 +1,100 @@
+>Before running this example, install PaddleSlim and Paddle 1.6 or later.
+
+# PaddleSeg distillation tutorial
+
+Before reading this tutorial, make sure you have gone through the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a working understanding of PaddleSeg.
+
+This document describes how to distill models in the segmentation library with [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim).
+
+Unless otherwise noted, all commands in this tutorial are executed under `PaddleSeg/`.
+
+## Overview
+
+This example applies the [distillation strategy](https://paddlepaddle.github.io/PaddleSlim/algo/algo/#3) provided by PaddleSlim to train segmentation models with knowledge distillation.
+Before reading it, we recommend getting familiar with:
+
+- the [PaddleSlim distillation API docs](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
+
+## Installing PaddleSlim
+Install PaddleSlim following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
+
+## Distillation strategy
+
+See the PaddleSlim distillation API docs for details on how to use the distillation API.
+
+Here we distill a Deeplabv3-mobilenet student from a Deeplabv3-xception teacher. First, to get an overview of the `student model` and the `teacher model` and to pin down which tensors to distill, we inspect the names and shapes of the Variables of the two networks:
+
+```python
+# inspect the student model's Variables
+student_vars = []
+for v in fluid.default_main_program().list_vars():
+    try:
+        student_vars.append((v.name, v.shape))
+    except:
+        pass
+print("="*50+"student_model_vars"+"="*50)
+print(student_vars)
+# inspect the teacher model's Variables
+teacher_vars = []
+for v in teacher_program.list_vars():
+    try:
+        teacher_vars.append((v.name, v.shape))
+    except:
+        pass
+print("="*50+"teacher_model_vars"+"="*50)
+print(teacher_vars)
+```
+
+Comparing the two lists shows that the feature maps fed into the `loss` are:
+
+```bash
+# student model
+bilinear_interp_0.tmp_0
+# teacher model
+bilinear_interp_2.tmp_0
+```
+
+Their shapes match pairwise and both sit at the output end of their networks, so we add a distillation loss between each corresponding pair with `l2_loss`. Note that the teacher's Variables are automatically given a `name_prefix` during the merge step, so the `"teacher_"` prefix must be added here as well; see the [distillation API docs](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/#merge) for the merge step.
+
+```python
+distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0')
+```
+
+Following the same steps you can choose other losses for the distillation strategy; PaddleSlim provides `FSP_loss`, `L2_loss` and `softmax_with_cross_entropy_loss`, and any custom loss can be used as well.
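Putting these pieces together, the distillation core of `train_distill.py` (shown in full later in this diff) boils down to roughly the following sketch; `teacher_program`, `place` and `seg_loss` stand for variables built earlier in that script, and the 0.1 weight matches the value the script uses:

```python
import paddle.fluid as fluid
from paddleslim.dist.single_distiller import merge, l2_loss

# map the student's feed variables onto the teacher's inputs
data_name_map = {'image': 'image', 'label': 'label', 'mask': 'mask'}
# copy the (frozen) teacher program into the student's main program;
# teacher variables are renamed with the "teacher_" prefix
merge(teacher_program, fluid.default_main_program(), data_name_map, place)

distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0',
                       'bilinear_interp_0.tmp_0')
total_loss = seg_loss + 0.1 * distill_loss  # optimize segmentation + distillation loss
```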
+## Training
+
+The distillation script `train_distill.py` is adapted from [PaddleSeg/pdseg/train.py](../../pdseg/train.py).
+It defines a teacher model and a student model, and uses the teacher's outputs to guide the student's training.
+
+### Example run
+
+Download the teacher's pretrained model ([deeplabv3p_xception65_bn_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/xception65_bn_cityscapes.tgz)) and the student's pretrained model ([mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz)),
+then set the pretrained model path in the student config file (./slim/distillation/cityscape.yaml):
+```
+TRAIN:
+    PRETRAINED_MODEL_DIR: your_student_pretrained_model_dir
+```
+and in the teacher config file (./slim/distillation/cityscape_teacher.yaml):
+```
+SLIM:
+    KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: your_teacher_pretrained_model_dir
+```
+
+Start training with the following command; an evaluation is run every `cfg.TRAIN.SNAPSHOT_EPOCH` epochs.
+```shell
+CUDA_VISIBLE_DEVICES=0,1
+python -m paddle.distributed.launch ./slim/distillation/train_distill.py \
+--log_steps 10 --cfg ./slim/distillation/cityscape.yaml \
+--teacher_cfg ./slim/distillation/cityscape_teacher.yaml \
+--use_gpu \
+--use_mpio \
+--do_eval
+```
+
+Note: to change parameters, edit them directly in the corresponding config file; command-line overrides are not supported yet.
+
+## Evaluation and prediction
+
+For evaluation and prediction after training, see the [quick start](../../README.md#快速入门) and [basic features](../../README.md#基础功能) chapters of PaddleSeg.
diff --git a/slim/distillation/cityscape.yaml b/slim/distillation/cityscape.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..703a6a2483fcf68f9ea801369ff0675c41ad286c
--- /dev/null
+++ b/slim/distillation/cityscape.yaml
@@ -0,0 +1,59 @@
+EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding, rangescaling and stepscaling
+TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding, rangescaling and stepscaling
+AUG:
+    AUG_METHOD: "stepscaling" # choose from unpadding, rangescaling and stepscaling
+    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
+    INF_RESIZE_VALUE: 500 # for rangescaling
+    MAX_RESIZE_VALUE: 600 # for rangescaling
+    MIN_RESIZE_VALUE: 400 # for rangescaling
+    MAX_SCALE_FACTOR: 2.0 # for stepscaling
+    MIN_SCALE_FACTOR: 0.5 # for stepscaling
+    SCALE_STEP_SIZE: 0.25 # for stepscaling
+    MIRROR: True
+    FLIP: True
+    FLIP_RATIO: 0.2
+    RICH_CROP:
+        ENABLE: False
+        ASPECT_RATIO: 0.33
+        BLUR: True
+        BLUR_RATIO: 0.1
+        MAX_ROTATION: 15
+        MIN_AREA_RATIO: 0.5
+        BRIGHTNESS_JITTER_RATIO: 0.5
+        CONTRAST_JITTER_RATIO: 0.5
+        SATURATION_JITTER_RATIO: 0.5
+BATCH_SIZE: 16
+MEAN: [0.5, 0.5, 0.5]
+STD: [0.5, 0.5, 0.5]
+DATASET:
+    DATA_DIR: "./dataset/cityscapes/"
+    IMAGE_TYPE: "rgb" # choose rgb or rgba
+    NUM_CLASSES: 19
+    TEST_FILE_LIST: "dataset/cityscapes/val.list"
+    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
+    VAL_FILE_LIST: "dataset/cityscapes/val.list"
+    IGNORE_INDEX: 255
+FREEZE:
+    MODEL_FILENAME: "model"
+    PARAMS_FILENAME: "params"
+MODEL:
+    DEFAULT_NORM_TYPE: "bn"
+    MODEL_NAME: "deeplabv3p"
+    DEEPLAB:
+        BACKBONE: "mobilenet"
+        ASPP_WITH_SEP_CONV: True
+        DECODER_USE_SEP_CONV: True
+        ENCODER_WITH_ASPP: False
+        ENABLE_DECODER: False
+TEST:
+    TEST_MODEL: "snapshots/cityscape_v5/final/"
+TRAIN:
+    MODEL_SAVE_DIR: "snapshots/cityscape_mbv2_kd_e100_1/"
+    PRETRAINED_MODEL_DIR: u"pretrained_model/mobilenet_cityscapes"
+    SNAPSHOT_EPOCH: 5
+    SYNC_BATCH_NORM: True
+SOLVER:
+    LR: 0.001
+    LR_POLICY: "poly"
+    OPTIMIZER: "sgd"
+    NUM_EPOCHS: 100
diff --git a/slim/distillation/cityscape_teacher.yaml b/slim/distillation/cityscape_teacher.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..ff7df807bbb782e4d5862f8963104f07fa147bb1
--- /dev/null
+++ b/slim/distillation/cityscape_teacher.yaml
@@ -0,0 +1,65 @@
+EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding, rangescaling and stepscaling
+TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding, rangescaling and stepscaling
+AUG:
+    AUG_METHOD: "stepscaling" # choose from unpadding, rangescaling and stepscaling
+    FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
+    INF_RESIZE_VALUE: 500 # for rangescaling
+    MAX_RESIZE_VALUE: 600 # for rangescaling
+    MIN_RESIZE_VALUE: 400 # for rangescaling
+    MAX_SCALE_FACTOR: 2.0 # for stepscaling
+    MIN_SCALE_FACTOR: 0.5 # for stepscaling
+    SCALE_STEP_SIZE: 0.25 # for stepscaling
+    MIRROR: True
+    FLIP: True
+    FLIP_RATIO: 0.2
+    RICH_CROP:
+        ENABLE: False
+        ASPECT_RATIO: 0.33
+        BLUR: True
+        BLUR_RATIO: 0.1
+        MAX_ROTATION: 15
+        MIN_AREA_RATIO: 0.5
+        BRIGHTNESS_JITTER_RATIO: 0.5
+        CONTRAST_JITTER_RATIO: 0.5
+        SATURATION_JITTER_RATIO: 0.5
+BATCH_SIZE: 16
+MEAN: [0.5, 0.5, 0.5]
+STD: [0.5, 0.5, 0.5]
+DATASET:
+    DATA_DIR: "./dataset/cityscapes/"
+    IMAGE_TYPE: "rgb" # choose rgb or rgba
+    NUM_CLASSES: 19
+    TEST_FILE_LIST: "dataset/cityscapes/val.list"
+    TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
+    VAL_FILE_LIST: "dataset/cityscapes/val.list"
+    IGNORE_INDEX: 255
+FREEZE:
+    MODEL_FILENAME: "model"
+    PARAMS_FILENAME: "params"
+MODEL:
+    DEFAULT_NORM_TYPE: "bn"
+    MODEL_NAME: "deeplabv3p"
+    DEEPLAB:
+        BACKBONE: "xception_65"
+        ASPP_WITH_SEP_CONV: True
+        DECODER_USE_SEP_CONV: True
+        ENCODER_WITH_ASPP: True
+        ENABLE_DECODER: True
+TEST:
+    TEST_MODEL: "snapshots/cityscape_v5/final/"
+TRAIN:
+    MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
+    PRETRAINED_MODEL_DIR: u"pretrain/deeplabv3plus_gn_init"
+    SNAPSHOT_EPOCH: 5
+    SYNC_BATCH_NORM: True
+SOLVER:
+    LR: 0.001
+    LR_POLICY: "poly"
+    OPTIMIZER: "sgd"
+    NUM_EPOCHS: 100
+
+SLIM:
+    KNOWLEDGE_DISTILL_IS_TEACHER: True
+    KNOWLEDGE_DISTILL: True
+    KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: "pretrained_model/xception65_bn_cityscapes"
+
diff --git a/slim/distillation/model_builder.py b/slim/distillation/model_builder.py
new file mode 100644
index 0000000000000000000000000000000000000000..f903b8dd2b635fa10070dcc3da488be66746d539
--- /dev/null
+++ b/slim/distillation/model_builder.py
@@ -0,0 +1,342 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import struct
+
+import paddle.fluid as fluid
+import numpy as np
+from paddle.fluid.proto.framework_pb2 import VarType
+
+import solver
+from utils.config import cfg
+from loss import multi_softmax_with_loss
+from loss import multi_dice_loss
+from loss import multi_bce_loss
+from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
+
+
+class ModelPhase(object):
+    """
+    Standard name for model phase in PaddleSeg
+
+    The following standard keys are defined:
+    * `TRAIN`: training mode.
+    * `EVAL`: testing/evaluation mode.
+    * `PREDICT`: prediction/inference mode.
+    * `VISUAL` : visualization mode
+    """
+
+    TRAIN = 'train'
+    EVAL = 'eval'
+    PREDICT = 'predict'
+    VISUAL = 'visual'
+
+    @staticmethod
+    def is_train(phase):
+        return phase == ModelPhase.TRAIN
+
+    @staticmethod
+    def is_predict(phase):
+        return phase == ModelPhase.PREDICT
+
+    @staticmethod
+    def is_eval(phase):
+        return phase == ModelPhase.EVAL
+
+    @staticmethod
+    def is_visual(phase):
+        return phase == ModelPhase.VISUAL
+
+    @staticmethod
+    def is_valid_phase(phase):
+        """ Check valid phase """
+        if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
+                or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
+            return True
+
+        return False
+
+
+def seg_model(image, class_num):
+    model_name = cfg.MODEL.MODEL_NAME
+    if model_name == 'unet':
+        logits = unet.unet(image, class_num)
+    elif model_name == 'deeplabv3p':
+        logits = deeplab.deeplabv3p(image, class_num)
+    elif model_name == 'icnet':
+        logits = icnet.icnet(image, class_num)
+    elif model_name == 'pspnet':
+        logits = pspnet.pspnet(image, class_num)
+    elif model_name == 'hrnet':
+        logits = hrnet.hrnet(image, class_num)
+    elif model_name == 'fast_scnn':
+        logits = fast_scnn.fast_scnn(image, class_num)
+    else:
+        raise Exception(
+            "unknown model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
+        )
+    return logits
+
+
+def softmax(logit):
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    logit = fluid.layers.softmax(logit)
+    logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+    return logit
+
+
+def sigmoid_to_softmax(logit):
+    """
+    one channel to two channels
+    """
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    logit = fluid.layers.sigmoid(logit)
+    logit_back = 1 - logit
+    logit = fluid.layers.concat([logit_back, logit], axis=-1)
+    logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+    return logit
+
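+# For a single-channel sigmoid output p of shape [N, 1, H, W], the function
+# above returns [N, 2, H, W] holding (1 - p, p), so binary models can reuse
+# the argmax and visualization code paths of multi-class models.
+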
+def export_preprocess(image):
+    """Preprocessing pipeline built into the exported inference model"""
+
+    image = fluid.layers.transpose(image, [0, 3, 1, 2])
+    origin_shape = fluid.layers.shape(image)[-2:]
+
+    # resize according to the configured AUG_METHOD
+    if cfg.AUG.AUG_METHOD == 'unpadding':
+        h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
+        w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
+        image = fluid.layers.resize_bilinear(
+            image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
+    elif cfg.AUG.AUG_METHOD == 'rangescaling':
+        size = cfg.AUG.INF_RESIZE_VALUE
+        value = fluid.layers.reduce_max(origin_shape)
+        scale = float(size) / value.astype('float32')
+        image = fluid.layers.resize_bilinear(
+            image, scale=scale, align_corners=False, align_mode=0)
+
+    # record the image shape after resizing
+    valid_shape = fluid.layers.shape(image)[-2:]
+
+    # pad up to EVAL_CROP_SIZE
+    width = cfg.EVAL_CROP_SIZE[0]
+    height = cfg.EVAL_CROP_SIZE[1]
+    pad_target = fluid.layers.assign(
+        np.array([height, width]).astype('float32'))
+    up = fluid.layers.assign(np.array([0]).astype('float32'))
+    down = pad_target[0] - valid_shape[0]
+    left = up
+    right = pad_target[1] - valid_shape[1]
+    paddings = fluid.layers.concat([up, down, left, right])
+    paddings = fluid.layers.cast(paddings, 'int32')
+    image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
+
+    # normalize
+    mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
+    mean = fluid.layers.assign(mean.astype('float32'))
+    std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
+    std = fluid.layers.assign(std.astype('float32'))
+    image = (image / 255 - mean) / std
+    # reshape so that downstream layers can read static shapes via image.shape
+    image = fluid.layers.reshape(
+        image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
+    return image, valid_shape, origin_shape
+
+
+def build_model(main_prog=None, start_prog=None, phase=ModelPhase.TRAIN, **kwargs):
+
+    if not ModelPhase.is_valid_phase(phase):
+        raise ValueError("ModelPhase {} is not valid!".format(phase))
+    if ModelPhase.is_train(phase):
+        width = cfg.TRAIN_CROP_SIZE[0]
+        height = cfg.TRAIN_CROP_SIZE[1]
+    else:
+        width = cfg.EVAL_CROP_SIZE[0]
+        height = cfg.EVAL_CROP_SIZE[1]
+
+    image_shape = [cfg.DATASET.DATA_DIM, height, width]
+    grt_shape = [1, height, width]
+    class_num = cfg.DATASET.NUM_CLASSES
+
+    # with fluid.program_guard(main_prog, start_prog):
+    #     with fluid.unique_name.guard():
+    # When exporting the model, image normalization is built into the graph so
+    # that deployment-side preprocessing shrinks to adding a batch dimension.
+    if cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
+        image = main_prog.global_block()._clone_variable(kwargs['image'],
+                                                         force_persistable=False)
+        label = main_prog.global_block()._clone_variable(kwargs['label'],
+                                                         force_persistable=False)
+        mask = main_prog.global_block()._clone_variable(kwargs['mask'],
+                                                        force_persistable=False)
+    else:
+        if ModelPhase.is_predict(phase):
+            origin_image = fluid.layers.data(
+                name='image',
+                shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
+                dtype='float32',
+                append_batch_size=False)
+            image, valid_shape, origin_shape = export_preprocess(
+                origin_image)
+
+        else:
+            image = fluid.layers.data(
+                name='image', shape=image_shape, dtype='float32')
+            label = fluid.layers.data(
+                name='label', shape=grt_shape, dtype='int32')
+            mask = fluid.layers.data(
+                name='mask', shape=grt_shape, dtype='int32')
+
+    # use PyReader for training and evaluation
+    if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+        py_reader = None
+        if not cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
+            py_reader = fluid.io.PyReader(
+                feed_list=[image, label, mask],
+                capacity=cfg.DATALOADER.BUF_SIZE,
+                iterable=False,
+                use_double_buffer=True)
+
+    loss_type = cfg.SOLVER.LOSS
+    if not isinstance(loss_type, list):
+        loss_type = list(loss_type)
+
+    # dice_loss and bce_loss apply only to binary segmentation
+    if class_num > 2 and (("dice_loss" in loss_type) or
+                          ("bce_loss" in loss_type)):
+        raise Exception(
+            "dice loss and bce loss are only applicable to binary classification"
+        )
+
+    # for binary segmentation with dice_loss or bce_loss, the final logit has one channel
+    if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
+        class_num = 1
+        if "softmax_loss" in loss_type:
+            raise Exception(
+                "softmax loss can not be combined with dice loss or bce loss"
+            )
+    logits = seg_model(image, class_num)
+
+    # compute the losses selected in SOLVER.LOSS
+    if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+        loss_valid = False
+        avg_loss_list = []
+        valid_loss = []
+        if "softmax_loss" in loss_type:
+            weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
+            avg_loss_list.append(
+                multi_softmax_with_loss(logits, label, mask, class_num, weight))
+            loss_valid = True
+            valid_loss.append("softmax_loss")
+        if "dice_loss" in loss_type:
+            avg_loss_list.append(multi_dice_loss(logits, label, mask))
+            loss_valid = True
+            valid_loss.append("dice_loss")
+        if "bce_loss" in loss_type:
+            avg_loss_list.append(multi_bce_loss(logits, label, mask))
+            loss_valid = True
+            valid_loss.append("bce_loss")
+        if not loss_valid:
+            raise Exception(
+                "SOLVER.LOSS: {} is set wrong. it should "
+                "include one of (softmax_loss, bce_loss, dice_loss) at least"
+                " example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
+                .format(cfg.SOLVER.LOSS))
+
+        invalid_loss = [x for x in loss_type if x not in valid_loss]
+        if len(invalid_loss) > 0:
+            print(
+                "Warning: the loss {} you set is invalid. it will not be included in loss computed."
+                .format(invalid_loss))
+
+        avg_loss = 0
+        for i in range(0, len(avg_loss_list)):
+            avg_loss += avg_loss_list[i]
+
+    # get the prediction at the original size
+    if isinstance(logits, tuple):
+        logit = logits[0]
+    else:
+        logit = logits
+
+    if logit.shape[2:] != label.shape[2:]:
+        logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
+
+    # return image input and logit output for inference graph pruning
+    if ModelPhase.is_predict(phase):
+        # binary case: dice_loss/bce_loss produce a single-channel logit,
+        # convert it to two channels
+        if class_num == 1:
+            logit = sigmoid_to_softmax(logit)
+        else:
+            logit = softmax(logit)
+
+        # keep only the valid (un-padded) region
+        logit = fluid.layers.slice(
+            logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
+
+        logit = fluid.layers.resize_bilinear(
+            logit,
+            out_shape=origin_shape,
+            align_corners=False,
+            align_mode=0)
+        logit = fluid.layers.argmax(logit, axis=1)
+        return origin_image, logit
+
+    if class_num == 1:
+        out = sigmoid_to_softmax(logit)
+        out = fluid.layers.transpose(out, [0, 2, 3, 1])
+    else:
+        out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+    pred = fluid.layers.argmax(out, axis=3)
+    pred = fluid.layers.unsqueeze(pred, axes=[3])
+    if ModelPhase.is_visual(phase):
+        if class_num == 1:
+            logit = sigmoid_to_softmax(logit)
+        else:
+            logit = softmax(logit)
+        return pred, logit
+
+    if ModelPhase.is_eval(phase):
+        return py_reader, avg_loss, pred, label, mask
+
+    if ModelPhase.is_train(phase):
+        decayed_lr = None
+        if not cfg.SLIM.KNOWLEDGE_DISTILL:
+            optimizer = solver.Solver(main_prog, start_prog)
+            decayed_lr = optimizer.optimise(avg_loss)
+        return py_reader, avg_loss, decayed_lr, pred, label, mask, image
+
+
+def to_int(string, dest="I"):
+    return struct.unpack(dest, string)[0]
+
+
+def parse_shape_from_file(filename):
+    with open(filename, "rb") as file:
+        version = file.read(4)
+        lod_level = to_int(file.read(8), dest="Q")
+        for i in range(lod_level):
+            _size = to_int(file.read(8), dest="Q")
+            _ = file.read(_size)
+        version = file.read(4)
+        tensor_desc_size = to_int(file.read(4))
+        tensor_desc = VarType.TensorDesc()
+        tensor_desc.ParseFromString(file.read(tensor_desc_size))
+    return tuple(tensor_desc.dims)
diff --git a/slim/distillation/train_distill.py b/slim/distillation/train_distill.py
new file mode 100644
index 0000000000000000000000000000000000000000..c1e23253ffcde9eea034bd7f67906ca9e534d2e2
--- /dev/null
+++ b/slim/distillation/train_distill.py
@@ -0,0 +1,584 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import random
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from model_builder import build_model
+from model_builder import ModelPhase
+from model_builder import parse_shape_from_file
+from eval import evaluate
+from vis import visualize
+from utils import dist_utils
+
+import solver
+from paddleslim.dist.single_distiller import merge, l2_loss
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg training')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--teacher_cfg',
+        dest='teacher_cfg_file',
+        help='Config file for the teacher model used in distillation',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess I/O or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--log_steps',
+        dest='log_steps',
+        help='Display logging information at every log_steps',
+        default=10,
+        type=int)
+    parser.add_argument(
+        '--debug',
+        dest='debug',
+        help='debug mode, display detailed information during training',
+        action='store_true')
+    parser.add_argument(
+        '--use_tb',
+        dest='use_tb',
+        help='whether to record training data to Tensorboard',
+        action='store_true')
+    parser.add_argument(
+        '--tb_log_dir',
+        dest='tb_log_dir',
+        help='Tensorboard logging directory',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--do_eval',
+        dest='do_eval',
+        help='Evaluate the model on every new checkpoint',
+        action='store_true')
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    parser.add_argument(
+        '--enable_ce',
+        dest='enable_ce',
+        help='If set True, enable continuous evaluation job.'
+        'This flag is only used for internal test.',
+        action='store_true')
+    return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+    """
+    Temporary workaround for saving variables on Windows;
+    to be fixed in PaddlePaddle v1.5.2
+    """
+
+    save_program = fluid.Program()
+    save_block = save_program.global_block()
+
+    for each_var in vars:
+        # NOTE: don't save variables of type RAW
+        if each_var.type == fluid.core.VarDesc.VarType.RAW:
+            continue
+        new_var = save_block.create_var(
+            name=each_var.name,
+            shape=each_var.shape,
+            dtype=each_var.dtype,
+            type=each_var.type,
+            lod_level=each_var.lod_level,
+            persistable=True)
+        file_path = os.path.join(dirname, new_var.name)
+        file_path = os.path.normpath(file_path)
+        save_block.append_op(
+            type='save',
+            inputs={'X': [new_var]},
+            outputs={},
+            attrs={'file_path': file_path})
+
+    executor.run(save_program)
+
+
+def save_checkpoint(exe, program, ckpt_name):
+    """
+    Save a checkpoint for evaluation or resuming training
+    """
+    ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+    print("Save model checkpoint to {}".format(ckpt_dir))
+    if not os.path.isdir(ckpt_dir):
+        os.makedirs(ckpt_dir)
+
+    save_vars(
+        exe,
+        ckpt_dir,
+        program,
+        vars=list(filter(fluid.io.is_persistable, program.list_vars())))
+
+    return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+    """
+    Load a checkpoint from the resume model directory to resume training
+    """
+
+    print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+    if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+        raise ValueError("TRAIN.RESUME_MODEL_DIR {} not exist!".format(
+            cfg.TRAIN.RESUME_MODEL_DIR))
+
+    fluid.io.load_persistables(
+        exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+    model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+    if model_path[-1] == os.sep:
+        model_path = model_path[0:-1]
+    epoch_name = os.path.basename(model_path)
+    # If the resume model is the final model
+    if epoch_name == 'final':
+        begin_epoch = cfg.SOLVER.NUM_EPOCHS
+    # If the resume model path ends in a digit, restore the epoch status
+    elif epoch_name.isdigit():
+        epoch = int(epoch_name)
+        begin_epoch = epoch + 1
+    else:
+        raise ValueError("Resume model path is not valid!")
+    print("Model checkpoint loaded successfully!")
+
+    return begin_epoch
+
+
+def update_best_model(ckpt_dir):
+    best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
+    if os.path.exists(best_model_dir):
+        shutil.rmtree(best_model_dir)
+    shutil.copytree(ckpt_dir, best_model_dir)
+
+
+def print_info(*msg):
+    if cfg.TRAINER_ID == 0:
+        print(*msg)
+
+
+def train(cfg):
+    # startup_prog = fluid.Program()
+    # train_prog = fluid.Program()
+
+    drop_last = True
+
+    dataset = SegDataset(
+        file_list=cfg.DATASET.TRAIN_FILE_LIST,
+        mode=ModelPhase.TRAIN,
+        shuffle=True,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        if args.use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        batch_data = []
+        for b in data_gen:
+            batch_data.append(b)
+            if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+                for item in batch_data:
+                    yield item[0], item[1], item[2]
+                batch_data = []
+        # If the sync batch norm strategy is used, drop the last batch when it
+        # has fewer than cfg.BATCH_SIZE samples to avoid NCCL hang issues
+        if not cfg.TRAIN.SYNC_BATCH_NORM:
+            for item in batch_data:
+                yield item[0], item[1], item[2]
+
+    # Get device environment
+    # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+    # place = places[0]
+    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+    place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+    places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+    # Get the number of GPUs
+    dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+    print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+    assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+            cfg.BATCH_SIZE, dev_count))
+    # In multi-gpu training mode, batch data is allocated evenly to each GPU
+    batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+    print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+    py_reader, loss, lr, pred, grts, masks, image = build_model(phase=ModelPhase.TRAIN)
+    py_reader.decorate_sample_generator(
+        data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+    exe = fluid.Executor(place)
+
+    cfg.update_from_file(args.teacher_cfg_file)
+    # teacher_arch = teacher_cfg.architecture
+    teacher_program = fluid.Program()
+    teacher_startup_program = fluid.Program()
+
+    with fluid.program_guard(teacher_program, teacher_startup_program):
+        with fluid.unique_name.guard():
+            _, teacher_loss, _, _, _, _, _ = build_model(
+                teacher_program, teacher_startup_program, phase=ModelPhase.TRAIN, image=image,
+                label=grts, mask=masks)
+
+    exe.run(teacher_startup_program)
+
+    teacher_program = teacher_program.clone(for_test=True)
+    ckpt_dir = cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR
+    assert ckpt_dir is not None
+    print('load teacher model:', ckpt_dir)
+    fluid.io.load_params(exe, ckpt_dir, main_program=teacher_program)
+
+    # cfg = load_config(FLAGS.config)
+    cfg.update_from_file(args.cfg_file)
+    data_name_map = {
+        'image': 'image',
+        'label': 'label',
+        'mask': 'mask',
+    }
+    merge(teacher_program, fluid.default_main_program(), data_name_map, place)
+    distill_pairs = [['teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_0.tmp_0']]
+
+    def distill(pairs, weight):
+        """
+        Add the distillation loss: an l2 loss between the teacher's and the
+        student's output feature maps, scaled by `weight`
+        """
+        loss = l2_loss(pairs[0][0], pairs[0][1])
+        weighted_loss = loss * weight
+        return weighted_loss
+
+    distill_loss = distill(distill_pairs, 0.1)
+    cfg.update_from_file(args.cfg_file)
+    optimizer = solver.Solver(None, None)
+    all_loss = loss + distill_loss
+    lr = optimizer.optimise(all_loss)
+
+    exe.run(fluid.default_startup_program())
+
+    exec_strategy = fluid.ExecutionStrategy()
+    # Clear temporary variables every 100 iterations
+    if args.use_gpu:
+        exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+    exec_strategy.num_iteration_per_drop_scope = 100
+    build_strategy = fluid.BuildStrategy()
+    build_strategy.fuse_all_reduce_ops = False
+    build_strategy.fuse_all_optimizer_ops = False
+    build_strategy.fuse_elewise_add_act_ops = True
+    if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+        dist_utils.prepare_for_multi_process(exe, build_strategy, fluid.default_main_program())
+        exec_strategy.num_threads = 1
+
+    if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+        if dev_count > 1:
+            # Apply the sync batch norm strategy
+            print_info("Sync BatchNorm strategy is effective.")
+            build_strategy.sync_batch_norm = True
+        else:
+            print_info(
+                "Sync BatchNorm strategy will not be effective if GPU device"
+                " count <= 1")
+    compiled_train_prog = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(
+        loss_name=all_loss.name,
+        exec_strategy=exec_strategy,
+        build_strategy=build_strategy)
+
+    # Resume training
+    begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+    if cfg.TRAIN.RESUME_MODEL_DIR:
+        begin_epoch = load_checkpoint(exe, fluid.default_main_program())
+    # Load pretrained model
+    elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+        print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+        load_vars = []
+        load_fail_vars = []
+
+        def var_shape_matched(var, shape):
+            """
+            Check whether a persistable variable's shape matches the current network
+            """
+            var_exist = os.path.exists(
+                os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+            if var_exist:
+                var_shape = parse_shape_from_file(
+                    os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+                return var_shape == shape
+            return False
+
+        for x in fluid.default_main_program().list_vars():
+            if isinstance(x, fluid.framework.Parameter):
+                shape = tuple(fluid.global_scope().find_var(
+                    x.name).get_tensor().shape())
+                if var_shape_matched(x, shape):
+                    load_vars.append(x)
+                else:
+                    load_fail_vars.append(x)
+
+        fluid.io.load_vars(
+            exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+        for var in load_vars:
+            print_info("Parameter[{}] loaded successfully!".format(var.name))
+        for var in load_fail_vars:
+            print_info(
+                "Parameter[{}] doesn't exist or its shape does not match the current network,"
+                " skip loading it.".format(var.name))
+        print_info("{}/{} pretrained parameters loaded successfully!".format(
+            len(load_vars),
+            len(load_vars) + len(load_fail_vars)))
+    else:
+        print_info(
+            'Pretrained model dir {} not exists, training from scratch...'.
+            format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+    # fetch_list = [avg_loss.name, lr.name]
+    fetch_list = [loss.name, 'teacher_' + teacher_loss.name, distill_loss.name, lr.name]
+
+    if args.debug:
+        # Fetch more variable info and use a streaming confusion matrix to
+        # calculate IoU results in debug mode
+        np.set_printoptions(
+            precision=4, suppress=True, linewidth=160, floatmode="fixed")
+        fetch_list.extend([pred.name, grts.name, masks.name])
+        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+    if args.use_tb:
+        if not args.tb_log_dir:
+            print_info("Please specify the log directory by --tb_log_dir.")
+            exit(1)
+
+        from tb_paddle import SummaryWriter
+        log_writer = SummaryWriter(args.tb_log_dir)
+
+    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+    global_step = 0
+    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+        all_step += 1
+    all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+    avg_loss = 0.0
+    avg_t_loss = 0.0
+    avg_d_loss = 0.0
+    best_mIoU = 0.0
+
+    timer = Timer()
+    timer.start()
+    if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+        raise ValueError(
+            ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+                begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+    if args.use_mpio:
+        print_info("Use multiprocess reader")
+    else:
+        print_info("Use multi-thread reader")
+
+    for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+        py_reader.start()
+        while True:
+            try:
+                if args.debug:
+                    # Print per-category IoU and accuracy to check whether
+                    # training behaves as expected
+                    loss, lr, pred, grts, masks = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    cm.calculate(pred, grts, masks)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0:
+                        speed = args.log_steps / timer.elapsed_time()
+                        avg_loss /= args.log_steps
+                        category_acc, mean_acc = cm.accuracy()
+                        category_iou, mean_iou = cm.mean_iou()
+                        print_info((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+                                 mean_iou, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        print_info("Category IoU: ", category_iou)
+                        print_info("Category Acc: ", category_acc)
+                        if args.use_tb:
+                            log_writer.add_scalar('Train/mean_iou', mean_iou,
+                                                  global_step)
+                            log_writer.add_scalar('Train/mean_acc', mean_acc,
+                                                  global_step)
+                            log_writer.add_scalar('Train/loss', avg_loss,
+                                                  global_step)
+                            log_writer.add_scalar('Train/lr', lr[0],
+                                                  global_step)
+                            log_writer.add_scalar('Train/step/sec', speed,
+                                                  global_step)
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        cm.zero_matrix()
+                        timer.restart()
+                else:
+                    # Outside debug mode, skip the unnecessary logging and IoU computation
+                    loss, t_loss, d_loss, lr = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    avg_loss += np.mean(np.array(loss))
+                    avg_t_loss += np.mean(np.array(t_loss))
+                    avg_d_loss += np.mean(np.array(d_loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+                        avg_loss /= args.log_steps
+                        avg_t_loss /= args.log_steps
+                        avg_d_loss /= args.log_steps
+                        speed = args.log_steps / timer.elapsed_time()
+                        print((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} teacher loss={:.4f} distill loss={:.4f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, avg_t_loss, avg_d_loss, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        if args.use_tb:
+                            log_writer.add_scalar('Train/loss', avg_loss,
+                                                  global_step)
+                            log_writer.add_scalar('Train/lr', lr[0],
+                                                  global_step)
+                            log_writer.add_scalar('Train/speed', speed,
+                                                  global_step)
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        avg_t_loss = 0.0
+                        avg_d_loss = 0.0
+                        timer.restart()
+
+            except fluid.core.EOFException:
+                py_reader.reset()
+                break
+            except Exception as e:
+                print(e)
+
+        if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
+                or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
+            ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
+
+            if args.do_eval:
+                print("Evaluation start")
+                _, mean_iou, _, mean_acc = evaluate(
+                    cfg=cfg,
+                    ckpt_dir=ckpt_dir,
+                    use_gpu=args.use_gpu,
+                    use_mpio=args.use_mpio)
+                if args.use_tb:
+                    log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
+                                          global_step)
+                    log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
+                                          global_step)
+
+                if mean_iou > best_mIoU:
+                    best_mIoU = mean_iou
+                    update_best_model(ckpt_dir)
+                    print_info("Save best model {} to {}, mIoU = {:.4f}".format(
+                        ckpt_dir,
+                        os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
+                        mean_iou))
+
+            # Use Tensorboard to visualize results
+            if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
+                visualize(
+                    cfg=cfg,
+                    use_gpu=args.use_gpu,
+                    vis_file_list=cfg.DATASET.VIS_FILE_LIST,
+                    vis_dir="visual",
+                    ckpt_dir=ckpt_dir,
+                    log_writer=log_writer)
+        if cfg.TRAINER_ID == 0:
+            ckpt_dir = save_checkpoint(exe, fluid.default_main_program(), epoch)
+
+    # save the final model
+    if cfg.TRAINER_ID == 0:
+        save_checkpoint(exe, fluid.default_main_program(), 'final')
+
+
+def main(args):
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    if args.enable_ce:
+        random.seed(0)
+        np.random.seed(0)
+
+    cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+    cfg.check_and_infer()
+    print_info(pprint.pformat(cfg))
+    train(cfg)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using the CPU version of PaddlePaddle."
+        )
+        print(
+            "Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
+        )
+        sys.exit(1)
+    main(args)
diff --git a/slim/nas/README.md b/slim/nas/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..cddfc5a82f07ab0b3f2e2acad6a4c0f7b2ed650c
--- /dev/null
+++ b/slim/nas/README.md
@@ -0,0 +1,63 @@
+>Please install Paddle 1.6 or a later version before running this example.
+
+# PaddleSeg Neural Architecture Search (NAS) Example
+
+Before reading this tutorial, please make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a basic understanding of PaddleSeg.
+
+This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to search network architectures for models in the segmentation library.
+
+Unless otherwise noted, all commands in this tutorial are executed under the `PaddleSeg/` directory.
+
+## Overview
+
+We take the DeepLab + MobileNetV2 model as the NAS example. The experiment is carried out with the help of [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim); for the technical details, please refer to the [NAS strategy tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/4670a79343c191b61a78e416826d122eea52a7ab/docs/zh_cn/tutorials/image_classification_nas_quick_start.ipynb).
+
+
+## Defining the Search Space
+In this experiment we search with SA-NAS (simulated-annealing NAS) over the channel counts and kernel sizes of the network, so we define the following search space:
+- Head channels `head_num`: the range of channel counts in the MobileNetV2 head module;
+- `filter_num1`-`filter_num6` for inverse_res_block 1-6: the ranges of channel counts in the inverse_res_block modules;
+- `repeat`: the number of units in a MobileNetV2 inverse_res_block module;
+- `multiply`: the range of the expansion_factor in a MobileNetV2 inverse_res_block module;
+- Kernel size `k_size`: whether each convolution in MobileNetV2 uses a 3x3 or a 5x5 kernel.
+
+Given the ranges defined above, the search-space tokens have 25 positions in total, varying within ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2]).
+
+
+The initial tokens are: [4, 4, 5, 1, 0, 4, 4, 1, 0, 4, 4, 3, 0, 4, 5, 2, 0, 4, 7, 2, 0, 4, 9, 0, 0].
+
+## Starting the Search
+First install PaddleSlim; see the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2).
+
+Configure the PaddleSeg config file; only the NAS-related options are shown below:
+
+```yaml
+SLIM:
+    NAS_PORT: 23333 # port
+    NAS_ADDRESS: "" # IP address; leave empty on the server, set to the server's IP on a client
+    NAS_SEARCH_STEPS: 100 # number of architectures to search
+    NAS_START_EVAL_EPOCH: -1 # epoch from which each candidate starts to be evaluated
+    NAS_IS_SERVER: True # whether this process is the server
+    NAS_SPACE_NAME: "MobileNetV2SpaceSeg" # search space
+```
+
+## Training and Evaluation
+Run the following command to train and evaluate at the same time:
+```shell
+CUDA_VISIBLE_DEVICES=0 python -u ./slim/nas/train_nas.py --log_steps 10 --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio \
+SLIM.NAS_PORT 23333 \
+SLIM.NAS_ADDRESS "" \
+SLIM.NAS_SEARCH_STEPS 2 \
+SLIM.NAS_START_EVAL_EPOCH -1 \
+SLIM.NAS_IS_SERVER True \
+SLIM.NAS_SPACE_NAME "MobileNetV2SpaceSeg"
+```
+
+
+## FAQ
+- Error: `socket.error: [Errno 98] Address already in use`.
+
+Solution: the current port is occupied; change `SLIM.NAS_PORT` to a free port.
+
diff --git a/slim/nas/deeplab.py b/slim/nas/deeplab.py
new file mode 100644
index 0000000000000000000000000000000000000000..6cbf840927b107a36273e9890f1ba4d076ddb417
--- /dev/null
+++ b/slim/nas/deeplab.py
@@ -0,0 +1,225 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
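+
+# NOTE: unlike the fixed deeplabv3p model in pdseg/models, the backbone here
+# is supplied by SA-NAS: nas_backbone() below calls the sampled `arch`
+# function, so every search step assembles the encoder and decoder on a
+# different candidate network.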
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import contextlib
+import paddle
+import paddle.fluid as fluid
+from utils.config import cfg
+from models.libs.model_libs import scope, name_scope
+from models.libs.model_libs import bn, bn_relu, relu
+from models.libs.model_libs import conv
+from models.libs.model_libs import separate_conv
+from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
+from models.backbone.xception import Xception as xception_backbone
+
+def encoder(input):
+    # Encoder configuration, using the ASPP architecture: image pooling and a
+    # 1x1 conv run in parallel with three dilated convs at different rates,
+    # followed by concat and another 1x1 conv
+    # ASPP_WITH_SEP_CONV: True by default; use depthwise separable convs,
+    # otherwise use regular convs
+    # OUTPUT_STRIDE: downsampling factor, 8 or 16, determines aspp_ratios
+    # aspp_ratios: dilation rates of the ASPP dilated convolutions
+
+    if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
+        aspp_ratios = [6, 12, 18]
+    elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
+        aspp_ratios = [12, 24, 36]
+    else:
+        raise Exception("deeplab only supports stride 8 or 16")
+
+    param_attr = fluid.ParamAttr(
+        name=name_scope + 'weights',
+        regularizer=None,
+        initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+    with scope('encoder'):
+        channel = 256
+        with scope("image_pool"):
+            image_avg = fluid.layers.reduce_mean(
+                input, [2, 3], keep_dim=True)
+            image_avg = bn_relu(
+                conv(
+                    image_avg,
+                    channel,
+                    1,
+                    1,
+                    groups=1,
+                    padding=0,
+                    param_attr=param_attr))
+            image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
+
+        with scope("aspp0"):
+            aspp0 = bn_relu(
+                conv(
+                    input,
+                    channel,
+                    1,
+                    1,
+                    groups=1,
+                    padding=0,
+                    param_attr=param_attr))
+        with scope("aspp1"):
+            if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+                aspp1 = separate_conv(
+                    input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
+            else:
+                aspp1 = bn_relu(
+                    conv(
+                        input,
+                        channel,
+                        stride=1,
+                        filter_size=3,
+                        dilation=aspp_ratios[0],
+                        padding=aspp_ratios[0],
+                        param_attr=param_attr))
+        with scope("aspp2"):
+            if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+                aspp2 = separate_conv(
+                    input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
+            else:
+                aspp2 = bn_relu(
+                    conv(
+                        input,
+                        channel,
+                        stride=1,
+                        filter_size=3,
+                        dilation=aspp_ratios[1],
+                        padding=aspp_ratios[1],
+                        param_attr=param_attr))
+        with scope("aspp3"):
+            if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
+                aspp3 = separate_conv(
+                    input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
+            else:
+                aspp3 = bn_relu(
+                    conv(
+                        input,
+                        channel,
+                        stride=1,
+                        filter_size=3,
+                        dilation=aspp_ratios[2],
+                        padding=aspp_ratios[2],
+                        param_attr=param_attr))
+        with scope("concat"):
+            data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
+                                       axis=1)
+            data = bn_relu(
+                conv(
+                    data,
+                    channel,
+                    1,
+                    1,
+                    groups=1,
+                    padding=0,
+                    param_attr=param_attr))
+            data = fluid.layers.dropout(data, 0.9)
+    return data
+
+
+def decoder(encode_data, decode_shortcut):
+    # Decoder configuration
+    # encode_data: output of the encoder
+    # decode_shortcut: branch taken from the backbone, resized and then
+    # concatenated with encode_data
+    # DECODER_USE_SEP_CONV: True by default; apply two separable convs after
+    # the concat, otherwise use regular convs
+    param_attr = fluid.ParamAttr(
+        name=name_scope + 'weights',
+        regularizer=None,
+        initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+    with scope('decoder'):
+        with scope('concat'):
+            decode_shortcut = bn_relu(
+                conv(
+                    decode_shortcut,
+                    48,
+                    1,
+                    1,
+                    groups=1,
+                    padding=0,
+                    param_attr=param_attr))
+
+            encode_data = fluid.layers.resize_bilinear(
+                encode_data, decode_shortcut.shape[2:])
+            encode_data = fluid.layers.concat([encode_data, decode_shortcut],
+                                              axis=1)
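+        # Refine the fused low-level/high-level features with two stacked 3x3
+        # convs; DECODER_USE_SEP_CONV only switches between separable and
+        # regular convs, the 256-channel width is the same in both branches.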
+        if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
+            with scope("separable_conv1"):
+                encode_data = separate_conv(
+                    encode_data, 256, 1, 3, dilation=1, act=relu)
+            with scope("separable_conv2"):
+                encode_data = separate_conv(
+                    encode_data, 256, 1, 3, dilation=1, act=relu)
+        else:
+            with scope("decoder_conv1"):
+                encode_data = bn_relu(
+                    conv(
+                        encode_data,
+                        256,
+                        stride=1,
+                        filter_size=3,
+                        dilation=1,
+                        padding=1,
+                        param_attr=param_attr))
+            with scope("decoder_conv2"):
+                encode_data = bn_relu(
+                    conv(
+                        encode_data,
+                        256,
+                        stride=1,
+                        filter_size=3,
+                        dilation=1,
+                        padding=1,
+                        param_attr=param_attr))
+        return encode_data
+
+
+def nas_backbone(input, arch):
+    # scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
+    # output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
+    # model = mobilenet_backbone(scale=scale, output_stride=output_stride)
+    end_points = 8
+    decode_point = 3
+    data, decode_shortcuts = arch(
+        input, end_points=end_points, return_block=decode_point,
+        output_stride=16)
+    decode_shortcut = decode_shortcuts[decode_point]
+    return data, decode_shortcut
+
+
+def deeplabv3p_nas(img, num_classes, arch=None):
+    data, decode_shortcut = nas_backbone(img, arch)
+    # Encoder/decoder settings
+    cfg.MODEL.DEFAULT_EPSILON = 1e-5
+    if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
+        data = encoder(data)
+    if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
+        data = decoder(data, decode_shortcut)
+
+    # Set the output channels of the final conv layer according to the number
+    # of classes, then resize to the original image size
+    param_attr = fluid.ParamAttr(
+        name=name_scope + 'weights',
+        regularizer=fluid.regularizer.L2DecayRegularizer(
+            regularization_coeff=0.0),
+        initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
+    with scope('logit'):
+        logit = conv(
+            data,
+            num_classes,
+            1,
+            stride=1,
+            padding=0,
+            bias_attr=True,
+            param_attr=param_attr)
+        logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
+
+    return logit
diff --git a/slim/nas/eval_nas.py b/slim/nas/eval_nas.py
new file mode 100644
index 0000000000000000000000000000000000000000..08f75f5d8ee8d6afbcf9b038e4f8dcf0237a5b56
--- /dev/null
+++ b/slim/nas/eval_nas.py
@@ -0,0 +1,185 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
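+
+# NAS evaluation entry: evaluate() receives the sampled `arch` callable via
+# **kwargs and rebuilds the eval program around that candidate, so one data
+# reader and confusion-matrix pipeline can score every searched network.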
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from model_builder import build_model
+from model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+
+from mobilenetv2_search_space import MobileNetV2SpaceSeg
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess IO or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+    np.set_printoptions(precision=5, suppress=True)
+
+    startup_prog = fluid.Program()
+    test_prog = fluid.Program()
+    dataset = SegDataset(
+        file_list=cfg.DATASET.VAL_FILE_LIST,
+        mode=ModelPhase.EVAL,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+        if use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        for b in data_gen:
+            yield b[0], b[1], b[2]
+
+    py_reader, avg_loss, pred, grts, masks = build_model(
+        test_prog, startup_prog, phase=ModelPhase.EVAL, arch=kwargs['arch'])
+
+    py_reader.decorate_sample_generator(
+        data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+    # Get device environment
+    places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+    place = places[0]
+    dev_count = len(places)
+    print("#Device count: {}".format(dev_count))
+
+    exe = fluid.Executor(place)
+    exe.run(startup_prog)
+
+    test_prog = test_prog.clone(for_test=True)
+
+    ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+    if not os.path.exists(ckpt_dir):
+        raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+    if ckpt_dir is not None:
+        print('load test model:', ckpt_dir)
+        fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
+
+    # Use a streaming confusion matrix to calculate mean IoU
+    np.set_printoptions(
+        precision=4, suppress=True, linewidth=160, floatmode="fixed")
+    conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+    fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+    num_images = 0
+    step = 0
+    all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+    timer = Timer()
+    timer.start()
+    py_reader.start()
+    while True:
+        try:
+            step += 1
+            loss, pred, grts, masks = exe.run(
+                test_prog, fetch_list=fetch_list, return_numpy=True)
+
+            loss = np.mean(np.array(loss))
+
+            num_images += pred.shape[0]
+            conf_mat.calculate(pred, grts, masks)
+            _, iou = conf_mat.mean_iou()
+            _, acc = conf_mat.accuracy()
+
+            speed = 1.0 / timer.elapsed_time()
+
+            print(
+                "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+                .format(step, loss, acc, iou, speed,
+                        calculate_eta(all_step - step, speed)))
+            timer.restart()
+            sys.stdout.flush()
+        except fluid.core.EOFException:
+            break
+
+    category_iou, avg_iou = conf_mat.mean_iou()
+    category_acc, avg_acc = conf_mat.accuracy()
+    print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+        num_images, avg_acc, avg_iou))
+    print("[EVAL]Category IoU:", category_iou)
+    print("[EVAL]Category Acc:", category_acc)
+    print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+    return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+    args = parse_args()
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    cfg.check_and_infer()
+    print(pprint.pformat(cfg))
+    evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/slim/nas/mobilenetv2_search_space.py b/slim/nas/mobilenetv2_search_space.py
new file mode 100644
index 0000000000000000000000000000000000000000..2703e161f02e9659040b827fff8d345db5bf5946
--- /dev/null
+++ b/slim/nas/mobilenetv2_search_space.py
@@ -0,0 +1,323 @@
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
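+
+# Token layout used by this space: token[0] selects the head channel count,
+# and each of the six searched bottleneck stages consumes four tokens
+# [expansion_factor, filter_num, repeat, kernel_size], i.e. 1 + 6 * 4 = 25
+# tokens in total, matching the ranges listed in slim/nas/README.md.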
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
+from paddleslim.nas.search_space.base_layer import conv_bn_layer
+from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE
+from paddleslim.nas.search_space.utils import check_points
+
+__all__ = ["MobileNetV2SpaceSeg"]
+
+
+@SEARCHSPACE.register
+class MobileNetV2SpaceSeg(SearchSpaceBase):
+    def __init__(self, input_size, output_size, block_num, block_mask=None):
+        super(MobileNetV2SpaceSeg, self).__init__(input_size, output_size,
+                                                  block_num, block_mask)
+        # self.head_num is the channel choice list for the first convolution
+        self.head_num = np.array([3, 4, 8, 12, 16, 24, 32])  #7
+        # self.filter_num1 ~ self.filter_num6 are the channel choice lists for
+        # the following convolutions
+        self.filter_num1 = np.array([3, 4, 8, 12, 16, 24, 32, 48])  #8
+        self.filter_num2 = np.array([8, 12, 16, 24, 32, 48, 64, 80])  #8
+        self.filter_num3 = np.array([16, 24, 32, 48, 64, 80, 96, 128])  #8
+        self.filter_num4 = np.array(
+            [24, 32, 48, 64, 80, 96, 128, 144, 160, 192])  #10
+        self.filter_num5 = np.array(
+            [32, 48, 64, 80, 96, 128, 144, 160, 192, 224])  #10
+        self.filter_num6 = np.array(
+            [64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512])  #12
+        # self.k_size is the kernel-size choice list
+        self.k_size = np.array([3, 5])  #2
+        # self.multiply is the expansion_factor choice list for each
+        # _inverted_residual_unit
+        self.multiply = np.array([1, 2, 3, 4, 6])  #5
+        # self.repeat is the choice list for the number of
+        # _inverted_residual_unit repeats in each _invresi_blocks
+        self.repeat = np.array([1, 2, 3, 4, 5, 6])  #6
+
+    def init_tokens(self):
+        """
+        The initial tokens.
+        The first entry is the index of the first layer's channel count in
+        self.head_num; each following line holds the indices of
+        [expansion_factor, filter_num, repeat_num, kernel_size] for one block.
+        """
+        # original MobileNetV2
+        # yapf: disable
+        init_token_base = [4, # 1, 16, 1
+                4, 5, 1, 0, # 6, 24, 2
+                4, 4, 2, 0, # 6, 32, 3
+                4, 4, 3, 0, # 6, 64, 4
+                4, 5, 2, 0, # 6, 96, 3
+                4, 7, 2, 0, # 6, 160, 3
+                4, 9, 0, 0] # 6, 320, 1
+        # yapf: enable
+
+        return init_token_base
+
+    def range_table(self):
+        """
+        Get the range table of the current search space; it constrains the
+        range of each token.
+        """
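+        # With the choice lists defined in __init__ this evaluates to
+        # [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2];
+        # token i is then sampled from range(range_table()[i]).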
+ """ + # head_num + 6 * [multiple(expansion_factor), filter_num, repeat, kernel_size] + # yapf: disable + range_table_base = [len(self.head_num), + len(self.multiply), len(self.filter_num1), len(self.repeat), len(self.k_size), + len(self.multiply), len(self.filter_num2), len(self.repeat), len(self.k_size), + len(self.multiply), len(self.filter_num3), len(self.repeat), len(self.k_size), + len(self.multiply), len(self.filter_num4), len(self.repeat), len(self.k_size), + len(self.multiply), len(self.filter_num5), len(self.repeat), len(self.k_size), + len(self.multiply), len(self.filter_num6), len(self.repeat), len(self.k_size)] + # yapf: enable + return range_table_base + + def token2arch(self, tokens=None): + """ + return net_arch function + """ + + if tokens is None: + tokens = self.init_tokens() + + self.bottleneck_params_list = [] + self.bottleneck_params_list.append( + (1, self.head_num[tokens[0]], 1, 1, 3)) + self.bottleneck_params_list.append( + (self.multiply[tokens[1]], self.filter_num1[tokens[2]], + self.repeat[tokens[3]], 2, self.k_size[tokens[4]])) + self.bottleneck_params_list.append( + (self.multiply[tokens[5]], self.filter_num2[tokens[6]], + self.repeat[tokens[7]], 2, self.k_size[tokens[8]])) + self.bottleneck_params_list.append( + (self.multiply[tokens[9]], self.filter_num3[tokens[10]], + self.repeat[tokens[11]], 2, self.k_size[tokens[12]])) + self.bottleneck_params_list.append( + (self.multiply[tokens[13]], self.filter_num4[tokens[14]], + self.repeat[tokens[15]], 1, self.k_size[tokens[16]])) + self.bottleneck_params_list.append( + (self.multiply[tokens[17]], self.filter_num5[tokens[18]], + self.repeat[tokens[19]], 2, self.k_size[tokens[20]])) + self.bottleneck_params_list.append( + (self.multiply[tokens[21]], self.filter_num6[tokens[22]], + self.repeat[tokens[23]], 1, self.k_size[tokens[24]])) + + def _modify_bottle_params(output_stride=None): + if output_stride is not None and output_stride % 2 != 0: + raise Exception("output stride must to be even number") + if output_stride is None: + return + else: + stride = 2 + for i, layer_setting in enumerate(self.bottleneck_params_list): + t, c, n, s, ks = layer_setting + stride = stride * s + if stride > output_stride: + s = 1 + self.bottleneck_params_list[i] = (t, c, n, s, ks) + + def net_arch(input, + scale=1.0, + return_block=None, + end_points=None, + output_stride=None): + self.scale = scale + _modify_bottle_params(output_stride) + + decode_ends = dict() + + def check_points(count, points): + if points is None: + return False + else: + if isinstance(points, list): + return (True if count in points else False) + else: + return (True if count == points else False) + + #conv1 + # all padding is 'SAME' in the conv2d, can compute the actual padding automatic. 
+ input = conv_bn_layer( + input, + num_filters=int(32 * self.scale), + filter_size=3, + stride=2, + padding='SAME', + act='relu6', + name='mobilenetv2_conv1') + layer_count = 1 + + depthwise_output = None + # bottleneck sequences + in_c = int(32 * self.scale) + for i, layer_setting in enumerate(self.bottleneck_params_list): + t, c, n, s, k = layer_setting + layer_count += 1 + ### return_block and end_points means block num + if check_points((layer_count - 1), return_block): + decode_ends[layer_count - 1] = depthwise_output + + if check_points((layer_count - 1), end_points): + return input, decode_ends + input, depthwise_output = self._invresi_blocks( + input=input, + in_c=in_c, + t=t, + c=int(c * self.scale), + n=n, + s=s, + k=int(k), + name='mobilenetv2_conv' + str(i)) + in_c = int(c * self.scale) + + ### return_block and end_points means block num + if check_points(layer_count, return_block): + decode_ends[layer_count] = depthwise_output + + if check_points(layer_count, end_points): + return input, decode_ends + # last conv + input = conv_bn_layer( + input=input, + num_filters=int(1280 * self.scale) + if self.scale > 1.0 else 1280, + filter_size=1, + stride=1, + padding='SAME', + act='relu6', + name='mobilenetv2_conv' + str(i + 1)) + + input = fluid.layers.pool2d( + input=input, + pool_type='avg', + global_pooling=True, + name='mobilenetv2_last_pool') + + return input + + return net_arch + + def _shortcut(self, input, data_residual): + """Build shortcut layer. + Args: + input(Variable): input. + data_residual(Variable): residual layer. + Returns: + Variable, layer output. + """ + return fluid.layers.elementwise_add(input, data_residual) + + def _inverted_residual_unit(self, + input, + num_in_filter, + num_filters, + ifshortcut, + stride, + filter_size, + expansion_factor, + reduction_ratio=4, + name=None): + """Build inverted residual unit. + Args: + input(Variable), input. + num_in_filter(int), number of in filters. + num_filters(int), number of filters. + ifshortcut(bool), whether using shortcut. + stride(int), stride. + filter_size(int), filter size. + padding(str|int|list), padding. + expansion_factor(float), expansion factor. + name(str), name. + Returns: + Variable, layers output. + """ + num_expfilter = int(round(num_in_filter * expansion_factor)) + channel_expand = conv_bn_layer( + input=input, + num_filters=num_expfilter, + filter_size=1, + stride=1, + padding='SAME', + num_groups=1, + act='relu6', + name=name + '_expand') + + bottleneck_conv = conv_bn_layer( + input=channel_expand, + num_filters=num_expfilter, + filter_size=filter_size, + stride=stride, + padding='SAME', + num_groups=num_expfilter, + act='relu6', + name=name + '_dwise', + use_cudnn=False) + + depthwise_output = bottleneck_conv + + linear_out = conv_bn_layer( + input=bottleneck_conv, + num_filters=num_filters, + filter_size=1, + stride=1, + padding='SAME', + num_groups=1, + act=None, + name=name + '_linear') + out = linear_out + if ifshortcut: + out = self._shortcut(input=input, data_residual=out) + return out, depthwise_output + + def _invresi_blocks(self, input, in_c, t, c, n, s, k, name=None): + """Build inverted residual blocks. + Args: + input: Variable, input. + in_c: int, number of in filters. + t: float, expansion factor. + c: int, number of filters. + n: int, number of layers. + s: int, stride. + k: int, filter size. + name: str, name. + Returns: + Variable, layers output. 
+ """ + first_block, depthwise_output = self._inverted_residual_unit( + input=input, + num_in_filter=in_c, + num_filters=c, + ifshortcut=False, + stride=s, + filter_size=k, + expansion_factor=t, + name=name + '_1') + + last_residual_block = first_block + last_c = c + + for i in range(1, n): + last_residual_block, depthwise_output = self._inverted_residual_unit( + input=last_residual_block, + num_in_filter=last_c, + num_filters=c, + ifshortcut=True, + stride=1, + filter_size=k, + expansion_factor=t, + name=name + '_' + str(i + 1)) + return last_residual_block, depthwise_output diff --git a/slim/nas/model_builder.py b/slim/nas/model_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..3dfbacb0cd41a14bb81c6f6c82b81479fb1c30c8 --- /dev/null +++ b/slim/nas/model_builder.py @@ -0,0 +1,316 @@ +# coding: utf8 +# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import struct + +import paddle.fluid as fluid +import numpy as np +from paddle.fluid.proto.framework_pb2 import VarType + +import solver +from utils.config import cfg +from loss import multi_softmax_with_loss +from loss import multi_dice_loss +from loss import multi_bce_loss +import deeplab + + +class ModelPhase(object): + """ + Standard name for model phase in PaddleSeg + + The following standard keys are defined: + * `TRAIN`: training mode. + * `EVAL`: testing/evaluation mode. + * `PREDICT`: prediction/inference mode. 
+    * `VISUAL`: visualization mode
+    """
+
+    TRAIN = 'train'
+    EVAL = 'eval'
+    PREDICT = 'predict'
+    VISUAL = 'visual'
+
+    @staticmethod
+    def is_train(phase):
+        return phase == ModelPhase.TRAIN
+
+    @staticmethod
+    def is_predict(phase):
+        return phase == ModelPhase.PREDICT
+
+    @staticmethod
+    def is_eval(phase):
+        return phase == ModelPhase.EVAL
+
+    @staticmethod
+    def is_visual(phase):
+        return phase == ModelPhase.VISUAL
+
+    @staticmethod
+    def is_valid_phase(phase):
+        """ Check valid phase """
+        if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
+                or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
+            return True
+
+        return False
+
+
+def seg_model(image, class_num, arch):
+    model_name = cfg.MODEL.MODEL_NAME
+    if model_name == 'deeplabv3p':
+        logits = deeplab.deeplabv3p_nas(image, class_num, arch)
+    else:
+        raise Exception(
+            "unknown model name; only deeplabv3p is supported"
+        )
+    return logits
+
+
+def softmax(logit):
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    logit = fluid.layers.softmax(logit)
+    logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+    return logit
+
+
+def sigmoid_to_softmax(logit):
+    """
+    Convert a one-channel logit into a two-channel one
+    """
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    logit = fluid.layers.sigmoid(logit)
+    logit_back = 1 - logit
+    logit = fluid.layers.concat([logit_back, logit], axis=-1)
+    logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
+    return logit
+
+
+def export_preprocess(image):
+    """Preprocessing pipeline of the exported model"""
+
+    image = fluid.layers.transpose(image, [0, 3, 1, 2])
+    origin_shape = fluid.layers.shape(image)[-2:]
+
+    # Resize according to the chosen AUG_METHOD
+    if cfg.AUG.AUG_METHOD == 'unpadding':
+        h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
+        w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
+        image = fluid.layers.resize_bilinear(
+            image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
+    elif cfg.AUG.AUG_METHOD == 'rangescaling':
+        size = cfg.AUG.INF_RESIZE_VALUE
+        value = fluid.layers.reduce_max(origin_shape)
+        scale = float(size) / value.astype('float32')
+        image = fluid.layers.resize_bilinear(
+            image, scale=scale, align_corners=False, align_mode=0)
+
+    # Record the image shape after resizing
+    valid_shape = fluid.layers.shape(image)[-2:]
+
+    # Pad up to EVAL_CROP_SIZE
+    width = cfg.EVAL_CROP_SIZE[0]
+    height = cfg.EVAL_CROP_SIZE[1]
+    pad_target = fluid.layers.assign(
+        np.array([height, width]).astype('float32'))
+    up = fluid.layers.assign(np.array([0]).astype('float32'))
+    down = pad_target[0] - valid_shape[0]
+    left = up
+    right = pad_target[1] - valid_shape[1]
+    paddings = fluid.layers.concat([up, down, left, right])
+    paddings = fluid.layers.cast(paddings, 'int32')
+    image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
+
+    # normalize
+    mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
+    mean = fluid.layers.assign(mean.astype('float32'))
+    std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
+    std = fluid.layers.assign(std.astype('float32'))
+    image = (image / 255 - mean) / std
+    # Reshape so that the following network can obtain the feature-map shape
+    # via image.shape
+    image = fluid.layers.reshape(
+        image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
+    return image, valid_shape, origin_shape
+
+
+def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN, arch=None):
+    if not ModelPhase.is_valid_phase(phase):
+        raise ValueError("ModelPhase {} is not valid!".format(phase))
+    if ModelPhase.is_train(phase):
+        width = cfg.TRAIN_CROP_SIZE[0]
+        height = cfg.TRAIN_CROP_SIZE[1]
+    else:
+        width = cfg.EVAL_CROP_SIZE[0]
+        height = cfg.EVAL_CROP_SIZE[1]
+
+    image_shape = [cfg.DATASET.DATA_DIM, height, width]
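+    # Shape convention: the image is laid out CHW (the batch dimension is
+    # added by the reader), while label and mask are single-channel maps of
+    # the same spatial size as the crop.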
+    grt_shape = [1, height, width]
+    class_num = cfg.DATASET.NUM_CLASSES
+
+    with fluid.program_guard(main_prog, start_prog):
+        with fluid.unique_name.guard():
+            # When exporting the model, add image normalization to the graph
+            # itself to reduce the preprocessing needed at deployment time;
+            # at deployment the input image only needs a batch_size dimension
+            if ModelPhase.is_predict(phase):
+                origin_image = fluid.layers.data(
+                    name='image',
+                    shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
+                    dtype='float32',
+                    append_batch_size=False)
+                image, valid_shape, origin_shape = export_preprocess(
+                    origin_image)
+
+            else:
+                image = fluid.layers.data(
+                    name='image', shape=image_shape, dtype='float32')
+            label = fluid.layers.data(
+                name='label', shape=grt_shape, dtype='int32')
+            mask = fluid.layers.data(
+                name='mask', shape=grt_shape, dtype='int32')
+
+            # use PyReader for training and evaluation
+            if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+                py_reader = fluid.io.PyReader(
+                    feed_list=[image, label, mask],
+                    capacity=cfg.DATALOADER.BUF_SIZE,
+                    iterable=False,
+                    use_double_buffer=True)
+
+            loss_type = cfg.SOLVER.LOSS
+            if not isinstance(loss_type, list):
+                # wrap a single loss name in a list (list() would split a
+                # string into characters)
+                loss_type = [loss_type]
+
+            # dice_loss and bce_loss only apply to binary segmentation
+            if class_num > 2 and (("dice_loss" in loss_type) or
+                                  ("bce_loss" in loss_type)):
+                raise Exception(
+                    "dice loss and bce loss are only applicable to binary classification"
+                )
+
+            # For binary segmentation, when dice_loss or bce_loss is selected,
+            # the final logit output is set to a single channel
+            if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
+                class_num = 1
+                if "softmax_loss" in loss_type:
+                    raise Exception(
+                        "softmax loss can not be combined with dice loss or bce loss"
+                    )
+            logits = seg_model(image, class_num, arch)
+
+            # Compute the losses according to the selected loss functions
+            if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
+                loss_valid = False
+                avg_loss_list = []
+                valid_loss = []
+                if "softmax_loss" in loss_type:
+                    weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
+                    avg_loss_list.append(
+                        multi_softmax_with_loss(logits, label, mask, class_num,
+                                                weight))
+                    loss_valid = True
+                    valid_loss.append("softmax_loss")
+                if "dice_loss" in loss_type:
+                    avg_loss_list.append(multi_dice_loss(logits, label, mask))
+                    loss_valid = True
+                    valid_loss.append("dice_loss")
+                if "bce_loss" in loss_type:
+                    avg_loss_list.append(multi_bce_loss(logits, label, mask))
+                    loss_valid = True
+                    valid_loss.append("bce_loss")
+                if not loss_valid:
+                    raise Exception(
+                        "SOLVER.LOSS: {} is set incorrectly. It should "
+                        "include at least one of (softmax_loss, bce_loss, dice_loss),"
+                        " e.g. ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
+                        .format(cfg.SOLVER.LOSS))
+
+                invalid_loss = [x for x in loss_type if x not in valid_loss]
+                if len(invalid_loss) > 0:
+                    print(
+                        "Warning: the loss {} you set is invalid; it will not be included in the computed loss."
+                        .format(invalid_loss))
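+                # All enabled losses are summed below with equal weight; any
+                # relative weighting would have to come from the individual
+                # loss implementations themselves.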
+
+                avg_loss = 0
+                for i in range(0, len(avg_loss_list)):
+                    avg_loss += avg_loss_list[i]
+
+            # Get the prediction at the original size
+            if isinstance(logits, tuple):
+                logit = logits[0]
+            else:
+                logit = logits
+
+            if logit.shape[2:] != label.shape[2:]:
+                logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
+
+            # Return the image input and logit output for inference graph pruning
+            if ModelPhase.is_predict(phase):
+                # In binary segmentation, dice_loss/bce_loss produce a
+                # single-channel logit; convert it to two channels
+                if class_num == 1:
+                    logit = sigmoid_to_softmax(logit)
+                else:
+                    logit = softmax(logit)
+
+                # Take the valid region
+                logit = fluid.layers.slice(
+                    logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
+
+                logit = fluid.layers.resize_bilinear(
+                    logit,
+                    out_shape=origin_shape,
+                    align_corners=False,
+                    align_mode=0)
+                logit = fluid.layers.argmax(logit, axis=1)
+                return origin_image, logit
+
+            if class_num == 1:
+                out = sigmoid_to_softmax(logit)
+                out = fluid.layers.transpose(out, [0, 2, 3, 1])
+            else:
+                out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+
+            pred = fluid.layers.argmax(out, axis=3)
+            pred = fluid.layers.unsqueeze(pred, axes=[3])
+            if ModelPhase.is_visual(phase):
+                if class_num == 1:
+                    logit = sigmoid_to_softmax(logit)
+                else:
+                    logit = softmax(logit)
+                return pred, logit
+
+            if ModelPhase.is_eval(phase):
+                return py_reader, avg_loss, pred, label, mask
+
+            if ModelPhase.is_train(phase):
+                optimizer = solver.Solver(main_prog, start_prog)
+                decayed_lr = optimizer.optimise(avg_loss)
+                return py_reader, avg_loss, decayed_lr, pred, label, mask
+
+
+def to_int(string, dest="I"):
+    return struct.unpack(dest, string)[0]
+
+
+def parse_shape_from_file(filename):
+    with open(filename, "rb") as file:
+        version = file.read(4)
+        lod_level = to_int(file.read(8), dest="Q")
+        for i in range(lod_level):
+            _size = to_int(file.read(8), dest="Q")
+            _ = file.read(_size)
+        version = file.read(4)
+        tensor_desc_size = to_int(file.read(4))
+        tensor_desc = VarType.TensorDesc()
+        tensor_desc.ParseFromString(file.read(tensor_desc_size))
+    return tuple(tensor_desc.dims)
diff --git a/slim/nas/train_nas.py b/slim/nas/train_nas.py
new file mode 100644
index 0000000000000000000000000000000000000000..7822657fa264d053360199d5691098ae85fcd12c
--- /dev/null
+++ b/slim/nas/train_nas.py
@@ -0,0 +1,456 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
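+
+# SA-NAS flow implemented in train() below: a SANAS controller (one process
+# acts as the server) proposes token vectors; each next_archs() call yields a
+# candidate network builder, the candidate is trained and evaluated, and the
+# best mIoU is sent back through sa_nas.reward() to guide the next proposal.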
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import random
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from model_builder import build_model
+from model_builder import ModelPhase
+from model_builder import parse_shape_from_file
+from eval_nas import evaluate
+from vis import visualize
+from utils import dist_utils
+
+from mobilenetv2_search_space import MobileNetV2SpaceSeg
+from paddleslim.nas.search_space.search_space_factory import SearchSpaceFactory
+from paddleslim.analysis import flops
+from paddleslim.nas.sa_nas import SANAS
+from paddleslim.nas import search_space
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg training')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess I/O or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--log_steps',
+        dest='log_steps',
+        help='Display logging information at every log_steps',
+        default=10,
+        type=int)
+    parser.add_argument(
+        '--debug',
+        dest='debug',
+        help='debug mode, display detail information of training',
+        action='store_true')
+    parser.add_argument(
+        '--use_tb',
+        dest='use_tb',
+        help='whether to record the data during training to Tensorboard',
+        action='store_true')
+    parser.add_argument(
+        '--tb_log_dir',
+        dest='tb_log_dir',
+        help='Tensorboard logging directory',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--do_eval',
+        dest='do_eval',
+        help='Evaluation models result on every new checkpoint',
+        action='store_true')
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    parser.add_argument(
+        '--enable_ce',
+        dest='enable_ce',
+        help='If set True, enable continuous evaluation job. '
+        'This flag is only used for internal test.',
+        action='store_true')
+    return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+    """
+    Temporary workaround for Windows save-variables compatibility.
+    Will be fixed in PaddlePaddle v1.5.2
+    """
+
+    save_program = fluid.Program()
+    save_block = save_program.global_block()
+
+    for each_var in vars:
+        # NOTE: don't save variables whose type is RAW
+        if each_var.type == fluid.core.VarDesc.VarType.RAW:
+            continue
+        new_var = save_block.create_var(
+            name=each_var.name,
+            shape=each_var.shape,
+            dtype=each_var.dtype,
+            type=each_var.type,
+            lod_level=each_var.lod_level,
+            persistable=True)
+        file_path = os.path.join(dirname, new_var.name)
+        file_path = os.path.normpath(file_path)
+        save_block.append_op(
+            type='save',
+            inputs={'X': [new_var]},
+            outputs={},
+            attrs={'file_path': file_path})
+
+    executor.run(save_program)
+
+
+def save_checkpoint(exe, program, ckpt_name):
+    """
+    Save a checkpoint for evaluation or resuming training
+    """
+    ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+    print("Save model checkpoint to {}".format(ckpt_dir))
+    if not os.path.isdir(ckpt_dir):
+        os.makedirs(ckpt_dir)
+
+    save_vars(
+        exe,
+        ckpt_dir,
+        program,
+        vars=list(filter(fluid.io.is_persistable, program.list_vars())))
+
+    return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+    """
+    Load a checkpoint from the pretrained model directory to resume training
+    """
+
+    print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+    if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+        raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
+            cfg.TRAIN.RESUME_MODEL_DIR))
+
+    fluid.io.load_persistables(
+        exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+    model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+    if model_path[-1] == os.sep:
+        model_path = model_path[0:-1]
+    epoch_name = os.path.basename(model_path)
+    # If the resume model is the final model
+    if epoch_name == 'final':
+        begin_epoch = cfg.SOLVER.NUM_EPOCHS
+    # If the resume model path ends with digits, restore the epoch status
+    elif epoch_name.isdigit():
+        epoch = int(epoch_name)
+        begin_epoch = epoch + 1
+    else:
+        raise ValueError("Resume model path is not valid!")
+    print("Model checkpoint loaded successfully!")
+
+    return begin_epoch
+
+
+def update_best_model(ckpt_dir):
+    best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
+    if os.path.exists(best_model_dir):
+        shutil.rmtree(best_model_dir)
+    shutil.copytree(ckpt_dir, best_model_dir)
+
+
+def print_info(*msg):
+    if cfg.TRAINER_ID == 0:
+        print(*msg)
+
+
+def train(cfg):
+    startup_prog = fluid.Program()
+    train_prog = fluid.Program()
+    if args.enable_ce:
+        startup_prog.random_seed = 1000
+        train_prog.random_seed = 1000
+    drop_last = True
+
+    dataset = SegDataset(
+        file_list=cfg.DATASET.TRAIN_FILE_LIST,
+        mode=ModelPhase.TRAIN,
+        shuffle=True,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        if args.use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        batch_data = []
+        for b in data_gen:
+            batch_data.append(b)
+            if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+                for item in batch_data:
+                    yield item[0], item[1], item[2]
+                batch_data = []
+        # If the sync batch norm strategy is used, drop the last batch when
+        # the number of samples in batch_data is less than cfg.BATCH_SIZE
+        # to avoid NCCL hang issues
+        if not cfg.TRAIN.SYNC_BATCH_NORM:
+            for item in batch_data:
+                yield item[0], item[1], item[2]
+
+    # Get device environment
+    # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+    # place = places[0]
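+    # FLAGS_selected_gpus is set per process by the distributed launcher, so
+    # in multi-GPU training each trainer binds to its own device here.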
+    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+    place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+    places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+    # Get the number of GPUs
+    dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+    print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+    assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+            cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data will be allocated to each GPU evenly
+    batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+    print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+    config_info = {'input_size': 769, 'output_size': 1, 'block_num': 7}
+    config = ([(cfg.SLIM.NAS_SPACE_NAME, config_info)])
+    factory = SearchSpaceFactory()
+    space = factory.get_search_space(config)
+
+    port = cfg.SLIM.NAS_PORT
+    server_address = (cfg.SLIM.NAS_ADDRESS, port)
+    sa_nas = SANAS(config, server_addr=server_address,
+                   search_steps=cfg.SLIM.NAS_SEARCH_STEPS,
+                   is_server=cfg.SLIM.NAS_IS_SERVER)
+    for step in range(cfg.SLIM.NAS_SEARCH_STEPS):
+        arch = sa_nas.next_archs()[0]
+
+        start_prog = fluid.Program()
+        train_prog = fluid.Program()
+
+        py_reader, avg_loss, lr, pred, grts, masks = build_model(
+            train_prog, start_prog, arch=arch, phase=ModelPhase.TRAIN)
+
+        cur_flops = flops(train_prog)
+        print('current step:', step, 'flops:', cur_flops)
+
+        py_reader.decorate_sample_generator(
+            data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+        exe = fluid.Executor(place)
+        exe.run(start_prog)
+
+        exec_strategy = fluid.ExecutionStrategy()
+        # Clear temporary variables every 100 iterations
+        if args.use_gpu:
+            exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+            exec_strategy.num_iteration_per_drop_scope = 100
+        build_strategy = fluid.BuildStrategy()
+
+        if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+            dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+            exec_strategy.num_threads = 1
+
+        if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+            if dev_count > 1:
+                # Apply the sync batch norm strategy
+                print_info("Sync BatchNorm strategy is effective.")
+                build_strategy.sync_batch_norm = True
+            else:
+                print_info(
+                    "Sync BatchNorm strategy will not be effective if GPU device"
+                    " count <= 1")
+        compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
+            loss_name=avg_loss.name,
+            exec_strategy=exec_strategy,
+            build_strategy=build_strategy)
+
+        # Resume training
+        begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+        if cfg.TRAIN.RESUME_MODEL_DIR:
+            begin_epoch = load_checkpoint(exe, train_prog)
+        # Load pretrained model
+        elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+            print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+            load_vars = []
+            load_fail_vars = []
+
+            def var_shape_matched(var, shape):
+                """
+                Check whether a persistable variable's shape matches the current network
+                """
+                var_exist = os.path.exists(
+                    os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+                if var_exist:
+                    var_shape = parse_shape_from_file(
+                        os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+                    return var_shape == shape
+                return False
+
+            for x in train_prog.list_vars():
+                if isinstance(x, fluid.framework.Parameter):
+                    shape = tuple(fluid.global_scope().find_var(
+                        x.name).get_tensor().shape())
+                    if var_shape_matched(x, shape):
+                        load_vars.append(x)
+                    else:
+                        load_fail_vars.append(x)
+
+            fluid.io.load_vars(
+                exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+            for var in load_vars:
+                print_info("Parameter[{}] loaded successfully!".format(var.name))
+            for var in load_fail_vars:
+                print_info(
+                    "Parameter[{}] doesn't exist or its shape does not match the"
+                    " current network; skip loading it.".format(var.name))
+            print_info("{}/{} pretrained parameters loaded successfully!".format(
+                len(load_vars),
+                len(load_vars) + len(load_fail_vars)))
+        else:
+            print_info(
+                'Pretrained model dir {} does not exist, training from scratch...'.
+                format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+        fetch_list = [avg_loss.name, lr.name]
+
+        global_step = 0
+        all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+        if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and not drop_last:
+            all_step += 1
+        all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+        avg_loss = 0.0
+        timer = Timer()
+        timer.start()
+        if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+            raise ValueError(
+                ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+                    begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+        if args.use_mpio:
+            print_info("Use multiprocess reader")
+        else:
+            print_info("Use multi-thread reader")
+
+        best_miou = 0.0
+        for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+            py_reader.start()
+            while True:
+                try:
+                    loss, lr = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+                        avg_loss /= args.log_steps
+                        speed = args.log_steps / timer.elapsed_time()
+                        print((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        timer.restart()
+
+                except fluid.core.EOFException:
+                    py_reader.reset()
+                    break
+                except Exception as e:
+                    print(e)
+            if epoch > cfg.SLIM.NAS_START_EVAL_EPOCH:
+                ckpt_dir = save_checkpoint(exe, train_prog, '{}_tmp'.format(port))
+                _, mean_iou, _, mean_acc = evaluate(
+                    cfg=cfg,
+                    arch=arch,
+                    ckpt_dir=ckpt_dir,
+                    use_gpu=args.use_gpu,
+                    use_mpio=args.use_mpio)
+                if best_miou < mean_iou:
+                    print('search step {}, epoch {} best iou {}'.format(
+                        step, epoch, mean_iou))
+                    best_miou = mean_iou
+
+        sa_nas.reward(float(best_miou))
+
+
+def main(args):
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    if args.enable_ce:
+        random.seed(0)
+        np.random.seed(0)
+
+    cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+    cfg.check_and_infer()
+    print_info(pprint.pformat(cfg))
+    train(cfg)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using the CPU version of PaddlePaddle."
+        )
+        print(
+            "Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
+        )
+        sys.exit(1)
+    main(args)
diff --git a/slim/prune/README.md b/slim/prune/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b6a45238938567a845b44ff768db6982bfeab55c
--- /dev/null
+++ b/slim/prune/README.md
@@ -0,0 +1,58 @@
+# PaddleSeg Pruning Tutorial
+
+Before reading this tutorial, please make sure you have read the [PaddleSeg usage guide](../../docs/usage.md) and related chapters, so that you have a basic understanding of PaddleSeg.
+
+This document describes how to use the convolution-channel pruning interface of [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to prune the channel counts of the convolution layers of models in the segmentation library.
+
+In the segmentation library, pruning can be run directly with the `PaddleSeg/slim/prune/train_prune.py` script, which calls PaddleSlim's [paddleslim.prune.Pruner](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#Pruner) interface.
+
+Unless otherwise noted, all commands in this tutorial are executed under the `PaddleSeg/` directory.
+
+## 1. Preparing Data and the Pretrained Model
+Run the following command to download the cityscapes dataset:
+```
+python dataset/download_cityscapes.py
+```
+Refer to the [pretrained model list](../../docs/model_zoo.md) to obtain the required pretrained model.
+
+## 2. Choosing the Parameters to Prune
+
+We shrink the channel counts of convolution layers by pruning their parameters, so before pruning we need to determine the names of the convolution parameters to prune.
+List all parameters of the current model with the following snippet:
+
+```python
+# List all Parameters of the model
+for x in train_prog.list_vars():
+    if isinstance(x, fluid.framework.Parameter):
+        print(x.name, x.shape)
+```
+
+By inspecting the parameter names and shapes, pick out the convolution-layer parameters and decide which ones to prune.
+
+## 3. Launching the Pruning Job
+
+When launching a pruning job with `train_prune.py`, pass the comma-separated list of parameter names to prune via the `SLIM.PRUNE_PARAMS` option, and the ratio pruned from each parameter via the `SLIM.PRUNE_RATIOS` option.
+
+```shell
+CUDA_VISIBLE_DEVICES=0
+python -u ./slim/prune/train_prune.py --log_steps 10 --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
+SLIM.PRUNE_PARAMS 'learning_to_downsample/weights,learning_to_downsample/dsconv1/pointwise/weights,learning_to_downsample/dsconv2/pointwise/weights' \
+SLIM.PRUNE_RATIOS '[0.1,0.1,0.1]'
+```
+Here we select three parameters and prune each of them by a ratio of 0.1.
+
+## 4. Evaluation
+
+```shell
+CUDA_VISIBLE_DEVICES=0
+python -u ./slim/prune/eval_prune.py --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
+TEST.TEST_MODEL your_trained_model
+```
+
+## 5. Models
+
+| Model | Dataset | Download | Pruning method | FLOPs | mIoU on val |
+|---|---|---|---|---|---|
+| Fast-SCNN/bn | Cityscapes |[fast_scnn_cityscapes.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape.tar) | None | 7.21g | 0.6964 |
+| Fast-SCNN/bn | Cityscapes |[fast_scnn_cityscapes-uniform-51.tar](https://paddleseg.bj.bcebos.com/models/fast_scnn_cityscape-uniform-51.tar) | uniform | 3.54g | 0.6990 |
diff --git a/slim/prune/eval_prune.py b/slim/prune/eval_prune.py
new file mode 100644
index 0000000000000000000000000000000000000000..940adce015b9e703535755cec06f57d75acbb051
--- /dev/null
+++ b/slim/prune/eval_prune.py
@@ -0,0 +1,185 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
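+
+# Pruned-model evaluation: parameters are restored with PaddleSlim's
+# load_model (imported from paddleslim.prune.io below) instead of
+# fluid.io.load_params, so the shrunken parameter shapes saved during pruning
+# stay consistent with the pruned graph.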
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+
+from paddleslim.prune.io import *
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess IO or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+    np.set_printoptions(precision=5, suppress=True)
+
+    startup_prog = fluid.Program()
+    test_prog = fluid.Program()
+    dataset = SegDataset(
+        file_list=cfg.DATASET.VAL_FILE_LIST,
+        mode=ModelPhase.EVAL,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+        if use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        for b in data_gen:
+            yield b[0], b[1], b[2]
+
+    py_reader, avg_loss, pred, grts, masks = build_model(
+        test_prog, startup_prog, phase=ModelPhase.EVAL)
+
+    py_reader.decorate_sample_generator(
+        data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+    # Get device environment
+    places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+    place = places[0]
+    dev_count = len(places)
+    print("#Device count: {}".format(dev_count))
+
+    exe = fluid.Executor(place)
+    exe.run(startup_prog)
+
+    test_prog = test_prog.clone(for_test=True)
+
+    ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+    if not os.path.exists(ckpt_dir):
+        raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+    if ckpt_dir is not None:
+        print('load test model:', ckpt_dir)
+        load_model(exe, test_prog, ckpt_dir)
+
+    # Use a streaming confusion matrix to calculate mean IoU
+    np.set_printoptions(
+        precision=4, suppress=True, linewidth=160, floatmode="fixed")
+    conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+    fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+    num_images = 0
+    step = 0
+    all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+    timer = Timer()
+    timer.start()
+    py_reader.start()
+    while True:
+        try:
+            step += 1
+            loss, pred, grts, masks = exe.run(
+                test_prog, fetch_list=fetch_list, return_numpy=True)
+
+            loss = np.mean(np.array(loss))
+
+            num_images += pred.shape[0]
+            conf_mat.calculate(pred, grts, masks)
+            _, iou = conf_mat.mean_iou()
+            _, acc = conf_mat.accuracy()
+
+            speed = 1.0 / timer.elapsed_time()
+
+            print(
+                "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+                .format(step, loss, acc, iou, speed,
+                        calculate_eta(all_step - step, speed)))
+            timer.restart()
+            sys.stdout.flush()
+        except fluid.core.EOFException:
+            break
+
+    category_iou, avg_iou = conf_mat.mean_iou()
+    category_acc, avg_acc = conf_mat.accuracy()
+    print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+        num_images, avg_acc, avg_iou))
+    print("[EVAL]Category IoU:", category_iou)
+    print("[EVAL]Category Acc:", category_acc)
+    print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+    return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+    args = parse_args()
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    cfg.check_and_infer()
+    print(pprint.pformat(cfg))
+    evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/slim/prune/train_prune.py b/slim/prune/train_prune.py
new file mode 100644
index 0000000000000000000000000000000000000000..364130f7ad2ea2e2ef733cd6391deb8e77fdf893
--- /dev/null
+++ b/slim/prune/train_prune.py
@@ -0,0 +1,505 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
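+
+# Pruning flow implemented in train() below: the training program is built
+# first, then the channels of the parameters named in SLIM.PRUNE_PARAMS are
+# cut by SLIM.PRUNE_RATIOS. As a rough sketch (assuming the PaddleSlim 1.x
+# Pruner API), the core call looks like:
+#
+#     pruner = Pruner()
+#     pruned_prog = pruner.prune(
+#         train_prog, fluid.global_scope(),
+#         params=pruned_params, ratios=pruned_ratios, place=place)[0]
+#
+# after which training continues on pruned_prog and the FLOPs reduction can
+# be checked with paddleslim.analysis.flops.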
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+# GPU memory garbage collection optimization flags
+os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
+
+import sys
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
+sys.path.append(SEG_PATH)
+
+import argparse
+import pprint
+import shutil
+import functools
+
+import paddle
+import numpy as np
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from metrics import ConfusionMatrix
+from reader import SegDataset
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from models.model_builder import parse_shape_from_file
+from eval_prune import evaluate
+from vis import visualize
+from utils import dist_utils
+
+from paddleslim.prune import Pruner
+from paddleslim.prune.io import *
+from paddleslim.analysis import flops
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg training')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess I/O or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--log_steps',
+        dest='log_steps',
+        help='Display logging information at every log_steps',
+        default=10,
+        type=int)
+    parser.add_argument(
+        '--debug',
+        dest='debug',
+        help='debug mode, display detail information of training',
+        action='store_true')
+    parser.add_argument(
+        '--use_tb',
+        dest='use_tb',
+        help='whether to record the data during training to Tensorboard',
+        action='store_true')
+    parser.add_argument(
+        '--tb_log_dir',
+        dest='tb_log_dir',
+        help='Tensorboard logging directory',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--do_eval',
+        dest='do_eval',
+        help='Evaluation models result on every new checkpoint',
+        action='store_true')
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    return parser.parse_args()
+
+
+def save_vars(executor, dirname, program=None, vars=None):
+    """
+    Temporary workaround for Windows save-variables compatibility.
+    Will fix in PaddlePaddle v1.5.2
+    """
+
+    save_program = fluid.Program()
+    save_block = save_program.global_block()
+
+    for each_var in vars:
+        # NOTE: don't save variables whose type is RAW
+        if each_var.type == fluid.core.VarDesc.VarType.RAW:
+            continue
+        new_var = save_block.create_var(
+            name=each_var.name,
+            shape=each_var.shape,
+            dtype=each_var.dtype,
+            type=each_var.type,
+            lod_level=each_var.lod_level,
+            persistable=True)
+        file_path = os.path.join(dirname, new_var.name)
+        file_path = os.path.normpath(file_path)
+        save_block.append_op(
+            type='save',
+            inputs={'X': [new_var]},
+            outputs={},
+            attrs={'file_path': file_path})
+
+    executor.run(save_program)
+
+
+def save_prune_checkpoint(exe, program, ckpt_name):
+    """
+    Save a checkpoint for evaluation or for resuming training
+    """
+    ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
+    print("Save model checkpoint to {}".format(ckpt_dir))
+    if not os.path.isdir(ckpt_dir):
+        os.makedirs(ckpt_dir)
+
+    save_model(exe, program, ckpt_dir)
+
+    return ckpt_dir
+
+
+def load_checkpoint(exe, program):
+    """
+    Load a checkpoint from the pretrained model directory to resume training
+    """
+
+    print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
+    if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
+        raise ValueError("TRAIN.RESUME_MODEL_DIR {} does not exist!".format(
+            cfg.TRAIN.RESUME_MODEL_DIR))
+
+    fluid.io.load_persistables(
+        exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
+
+    model_path = cfg.TRAIN.RESUME_MODEL_DIR
+    # Check whether the path ends with a path separator
+    if model_path[-1] == os.sep:
+        model_path = model_path[0:-1]
+    epoch_name = os.path.basename(model_path)
+    # If the resume model is the final model
+    if epoch_name == 'final':
+        begin_epoch = cfg.SOLVER.NUM_EPOCHS
+    # If the resume model path ends with digits, restore the epoch status
+    elif epoch_name.isdigit():
+        epoch = int(epoch_name)
+        begin_epoch = epoch + 1
+    else:
+        raise ValueError("Resume model path is not valid!")
+    print("Model checkpoint loaded successfully!")
+
+    return begin_epoch
+
+def print_info(*msg):
+    if cfg.TRAINER_ID == 0:
+        print(*msg)
+
+def train(cfg):
+    startup_prog = fluid.Program()
+    train_prog = fluid.Program()
+    drop_last = True
+
+    dataset = SegDataset(
+        file_list=cfg.DATASET.TRAIN_FILE_LIST,
+        mode=ModelPhase.TRAIN,
+        shuffle=True,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        if args.use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        batch_data = []
+        for b in data_gen:
+            batch_data.append(b)
+            if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+                for item in batch_data:
+                    yield item[0], item[1], item[2]
+                batch_data = []
+        # If the sync batch norm strategy is used, drop the last batch when the
+        # number of samples in batch_data is less than cfg.BATCH_SIZE to avoid
+        # NCCL hang issues
+        if not cfg.TRAIN.SYNC_BATCH_NORM:
+            for item in batch_data:
+                yield item[0], item[1], item[2]
+
+    # Get device environment
+    # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+    # place = places[0]
+    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+    place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+    places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+    # Get the number of GPUs
+    dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+    print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+    assert
cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+            cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data is allocated evenly to each GPU
+    batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+    print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+    py_reader, avg_loss, lr, pred, grts, masks = build_model(
+        train_prog, startup_prog, phase=ModelPhase.TRAIN)
+    py_reader.decorate_sample_generator(
+        data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+    exe = fluid.Executor(place)
+    exe.run(startup_prog)
+
+    exec_strategy = fluid.ExecutionStrategy()
+    # Clear temporary variables every 100 iterations
+    if args.use_gpu:
+        exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+        exec_strategy.num_iteration_per_drop_scope = 100
+    build_strategy = fluid.BuildStrategy()
+
+    if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+        dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+        exec_strategy.num_threads = 1
+
+    if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
+        if dev_count > 1:
+            # Apply sync batch norm strategy
+            print_info("Sync BatchNorm strategy is effective.")
+            build_strategy.sync_batch_norm = True
+        else:
+            print_info("Sync BatchNorm strategy will not be effective if GPU device"
+                       " count <= 1")
+
+    pruned_params = cfg.SLIM.PRUNE_PARAMS.strip().split(',')
+    pruned_ratios = cfg.SLIM.PRUNE_RATIOS
+
+    if isinstance(pruned_ratios, float):
+        pruned_ratios = [pruned_ratios] * len(pruned_params)
+    elif isinstance(pruned_ratios, (list, tuple)):
+        pruned_ratios = list(pruned_ratios)
+    else:
+        raise ValueError('expected SLIM.PRUNE_RATIOS to be a float, list or tuple, '
+                         'but received {}'.format(type(pruned_ratios)))
+
+    # Resume training
+    begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+    if cfg.TRAIN.RESUME_MODEL_DIR:
+        begin_epoch = load_checkpoint(exe, train_prog)
+    # Load pretrained model
+    elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+        print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+        load_vars = []
+        load_fail_vars = []
+
+        def var_shape_matched(var, shape):
+            """
+            Check whether the persistable variable shape matches the current network
+            """
+            var_exist = os.path.exists(
+                os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+            if var_exist:
+                var_shape = parse_shape_from_file(
+                    os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+                return var_shape == shape
+            return False
+
+        for x in train_prog.list_vars():
+            if isinstance(x, fluid.framework.Parameter):
+                shape = tuple(fluid.global_scope().find_var(
+                    x.name).get_tensor().shape())
+                if var_shape_matched(x, shape):
+                    load_vars.append(x)
+                else:
+                    load_fail_vars.append(x)
+
+        fluid.io.load_vars(
+            exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+        for var in load_vars:
+            print_info("Parameter[{}] loaded successfully!".format(var.name))
+        for var in load_fail_vars:
+            print_info("Parameter[{}] doesn't exist or its shape does not match the"
+                       " current network; skipping it.".format(var.name))
+        print_info("{}/{} pretrained parameters loaded successfully!".format(
+            len(load_vars),
+            len(load_vars) + len(load_fail_vars)))
+    else:
+        print_info('Pretrained model dir {} does not exist; training from scratch...'.
+            format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+    fetch_list = [avg_loss.name, lr.name]
+    if args.debug:
+        # Fetch more variable info and use streaming confusion matrix to
+        # calculate IoU results if in debug mode
+        np.set_printoptions(
+            precision=4, suppress=True, linewidth=160, floatmode="fixed")
+        fetch_list.extend([pred.name, grts.name, masks.name])
+        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+    if args.use_tb:
+        if not args.tb_log_dir:
+            print_info("Please specify the log directory by --tb_log_dir.")
+            exit(1)
+
+        from tb_paddle import SummaryWriter
+        log_writer = SummaryWriter(args.tb_log_dir)
+
+    pruner = Pruner()
+    train_prog = pruner.prune(
+        train_prog,
+        fluid.global_scope(),
+        params=pruned_params,
+        ratios=pruned_ratios,
+        place=place,
+        only_graph=False)[0]
+
+    compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
+        loss_name=avg_loss.name,
+        exec_strategy=exec_strategy,
+        build_strategy=build_strategy)
+
+    global_step = 0
+    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+        all_step += 1
+    all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+    avg_loss = 0.0
+    timer = Timer()
+    timer.start()
+    if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+        raise ValueError(
+            ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+                begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+    if args.use_mpio:
+        print_info("Use multiprocess reader")
+    else:
+        print_info("Use multi-thread reader")
+
+    for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+        py_reader.start()
+        while True:
+            try:
+                if args.debug:
+                    # Print category IoU and accuracy to check whether the
+                    # training process meets expectations
+                    loss, lr, pred, grts, masks = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    cm.calculate(pred, grts, masks)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0:
+                        speed = args.log_steps / timer.elapsed_time()
+                        avg_loss /= args.log_steps
+                        category_acc, mean_acc = cm.accuracy()
+                        category_iou, mean_iou = cm.mean_iou()
+
+                        print_info((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+                                 mean_iou, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        print_info("Category IoU: ", category_iou)
+                        print_info("Category Acc: ", category_acc)
+                        if args.use_tb:
+                            log_writer.add_scalar('Train/mean_iou', mean_iou,
+                                                  global_step)
+                            log_writer.add_scalar('Train/mean_acc', mean_acc,
+                                                  global_step)
+                            log_writer.add_scalar('Train/loss', avg_loss,
+                                                  global_step)
+                            log_writer.add_scalar('Train/lr', lr[0],
+                                                  global_step)
+                            log_writer.add_scalar('Train/step/sec', speed,
+                                                  global_step)
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        cm.zero_matrix()
+                        timer.restart()
+                else:
+                    # If not in debug mode, avoid unnecessary logging and computation
+                    loss, lr = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+                        avg_loss /= args.log_steps
+                        speed = args.log_steps / timer.elapsed_time()
+                        print((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        if args.use_tb:
+                            log_writer.add_scalar('Train/loss', avg_loss,
+                                                  global_step)
+
log_writer.add_scalar('Train/lr', lr[0],
+                                                  global_step)
+                            log_writer.add_scalar('Train/speed', speed,
+                                                  global_step)
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        timer.restart()
+
+            except fluid.core.EOFException:
+                py_reader.reset()
+                break
+            except Exception as e:
+                print(e)
+
+        if epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0 and cfg.TRAINER_ID == 0:
+
+            ckpt_dir = save_prune_checkpoint(exe, train_prog, epoch)
+
+            if args.do_eval:
+                print("Evaluation start")
+                _, mean_iou, _, mean_acc = evaluate(
+                    cfg=cfg,
+                    ckpt_dir=ckpt_dir,
+                    use_gpu=args.use_gpu,
+                    use_mpio=args.use_mpio)
+                if args.use_tb:
+                    log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
+                                          global_step)
+                    log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
+                                          global_step)
+
+            # Use Tensorboard to visualize results
+            if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
+                visualize(
+                    cfg=cfg,
+                    use_gpu=args.use_gpu,
+                    vis_file_list=cfg.DATASET.VIS_FILE_LIST,
+                    vis_dir="visual",
+                    ckpt_dir=ckpt_dir,
+                    log_writer=log_writer)
+
+    # save final model
+    if cfg.TRAINER_ID == 0:
+        save_prune_checkpoint(exe, train_prog, 'final')
+
+def main(args):
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts is not None:
+        cfg.update_from_list(args.opts)
+
+    cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+    cfg.check_and_infer()
+    print_info(pprint.pformat(cfg))
+    train(cfg)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using a CPU-only build of PaddlePaddle."
+        )
+        print(
+            "Please either: 1. install paddlepaddle-gpu to run your model on GPU, or 2. set use_gpu=False to run it on CPU."
+        )
+        sys.exit(1)
+    main(args)
diff --git a/slim/quantization/README.md b/slim/quantization/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9af04033b3a9af84d4b1fdf081f156be6f8dc0c2
--- /dev/null
+++ b/slim/quantization/README.md
@@ -0,0 +1,142 @@
+>Before running this example, please install Paddle 1.6 or later and PaddleSlim
+
+# Quantization Compression Example for Segmentation Models
+
+## Overview
+
+This example compresses a segmentation model using the [quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim.
+Before reading this example, it is recommended that you first go through:
+
+- [The standard way to train a segmentation model](../../docs/usage.md)
+- [The PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/)
+
+
+## Install PaddleSlim
+PaddleSlim can be installed by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
+
+
+## Training
+
+
+### Dataset
+Download the dataset and put it in the expected location, as described in the segmentation library's tutorials.
+
+### Download a trained segmentation model
+
+Run the following commands from the root directory of the segmentation library:
+```bash
+mkdir pretrain
+cd pretrain
+wget https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz
+tar xf mobilenet_cityscapes.tgz
+```
+
+### Define the quantization configuration
+```
+config = {
+        'weight_quantize_type': 'channel_wise_abs_max',
+        'activation_quantize_type': 'moving_average_abs_max',
+        'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+        'not_quant_pattern': ['last_conv']
+    }
+```
+
+For the available options and their meanings, see the [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/).
+
+### Insert quantize/dequantize OPs
+Use the [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware) to insert quantize and dequantize OPs into the Program.
+```
+compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
+```
+
+### Disable some training strategies
+
+Because quantization modifies the Program, training strategies that also rewrite the Program have to be disabled. ``sync_batch_norm`` fails when used together with multi-GPU quantization training, so it needs to be turned off as well.
+```
+build_strategy.fuse_all_reduce_ops = False
+build_strategy.sync_batch_norm = False
+```
+
+### Start training
+
+
+step1: select the GPU card
+```
+export CUDA_VISIBLE_DEVICES=0
+```
+step2: add the ``pdseg`` folder to the system path
+
+Run the following command from the root directory of the segmentation library
+```
+export PYTHONPATH=$PYTHONPATH:./pdseg
+```
+
+step3: start training
+
+
+Run the following command from the root directory of the segmentation library to start training.
+```
+python -u ./slim/quantization/train_quant.py --log_steps 10 --not_quant_pattern last_conv --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --use_mpio --do_eval \
+TRAIN.PRETRAINED_MODEL_DIR "./pretrain/mobilenet_cityscapes/" \
+TRAIN.MODEL_SAVE_DIR "./snapshots/mobilenetv2_quant" \
+MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
+MODEL.DEEPLAB.ENABLE_DECODER False \
+TRAIN.SYNC_BATCH_NORM False \
+SOLVER.LR 0.0001 \
+TRAIN.SNAPSHOT_EPOCH 1 \
+SOLVER.NUM_EPOCHS 30 \
+BATCH_SIZE 16
+```
+
+
+### Model structure during training
+The [PaddleSlim quantization API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) documentation describes the two interfaces ``paddleslim.quant.quant_aware`` and ``paddleslim.quant.convert``.
+``paddleslim.quant.quant_aware`` inserts a consecutive quantize OP and dequantize OP before each input of operators such as conv2d, depthwise_conv2d and mul in the network, and changes certain inputs of the corresponding backward operators, as illustrated below:
+
+*Figure 1: the network after applying paddleslim.quant.quant_aware*
+
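+For orientation, the snippet below condenses how [train_quant.py](./train_quant.py) wires these pieces together. It is a minimal sketch, not the full script: `train_prog` and `avg_loss` are assumed to come from the usual PaddleSeg `build_model` setup shown in the training scripts.
+
+```
+import paddle.fluid as fluid
+from paddleslim.quant import quant_aware
+
+config = {
+    'weight_quantize_type': 'channel_wise_abs_max',
+    'activation_quantize_type': 'moving_average_abs_max',
+    'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+    'not_quant_pattern': ['last_conv']
+}
+
+place = fluid.CUDAPlace(0)
+# Insert quantize/dequantize OPs into the training and evaluation programs
+compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
+eval_prog = quant_aware(train_prog, place, config, for_test=True)
+
+# Program-rewriting strategies must stay disabled (see above)
+build_strategy = fluid.BuildStrategy()
+build_strategy.fuse_all_reduce_ops = False
+build_strategy.sync_batch_norm = False
+exec_strategy = fluid.ExecutionStrategy()
+compiled_train_prog = compiled_train_prog.with_data_parallel(
+    loss_name=avg_loss.name,
+    exec_strategy=exec_strategy,
+    build_strategy=build_strategy)
+```
+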
+
+### Evaluating during training
+
+The test accuracy reported while training with the script is measured on the network structure shown in Figure 1.
+
+## Evaluation
+
+### The final evaluation model
+
+``paddleslim.quant.convert`` mainly reorders the quantize and dequantize OPs in the Program, turning the layout of Figure 1 into the one shown in Figure 2. Besides that, ``paddleslim.quant.convert`` also rewrites the parameters of `conv2d`, `depthwise_conv2d`, `mul` and similar operators to quantized values within the int8_t range (while keeping float32 as the data type), as illustrated in Figure 2:
+
+*Figure 2: the network after paddleslim.quant.convert*
+
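+In code, the evaluation side boils down to the following order of operations, mirroring [eval_quant.py](./eval_quant.py) (`test_prog`, `exe`, `place`, `config` and `ckpt_dir` are as in that script):
+
+```
+import paddle.fluid as fluid
+from paddleslim.quant import quant_aware, convert
+
+# Rebuild the same quantized structure that was used during training
+test_prog = quant_aware(test_prog, place, config, for_test=True)
+# Load the weights saved by the quantization-aware training run
+fluid.io.load_persistables(exe, ckpt_dir, main_program=test_prog)
+# Reorder the quantize/dequantize OPs and rewrite the weights: final model
+test_prog = convert(test_prog, place, config)
+```
+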
+
+So the final quantized model is only obtained after calling ``paddleslim.quant.convert``. That model can be loaded and run with PaddleLite; see the tutorial [how Paddle-Lite loads and runs a quantized model](https://github.com/PaddlePaddle/Paddle-Lite/wiki/model_quantization).
+
+### Evaluation script
+Use the script [slim/quantization/eval_quant.py](./eval_quant.py) for evaluation.
+
+- Define the configuration. Use the same quantization configuration as in the training script so that the same model is built as during quantization-aware training.
+- Use ``paddleslim.quant.quant_aware`` to insert the quantize and dequantize OPs.
+- Use ``paddleslim.quant.convert`` to reorder the OPs and obtain the final quantized model for evaluation.
+
+Evaluation command:
+
+Run from the root directory of the segmentation library
+```
+python -u ./slim/quantization/eval_quant.py --cfg configs/deeplabv3p_mobilenetv2_cityscapes.yaml --use_gpu --not_quant_pattern last_conv --use_mpio --convert \
+TEST.TEST_MODEL "./snapshots/mobilenetv2_quant/best_model" \
+MODEL.DEEPLAB.ENCODER_WITH_ASPP False \
+MODEL.DEEPLAB.ENABLE_DECODER False \
+TRAIN.SYNC_BATCH_NORM False \
+BATCH_SIZE 16
+```
+
+
+
+## Quantization results
+
+
+
+## FAQ
diff --git a/slim/quantization/eval_quant.py b/slim/quantization/eval_quant.py
new file mode 100644
index 0000000000000000000000000000000000000000..f40021df10ac5cabee789ca4de04b7489b37f182
--- /dev/null
+++ b/slim/quantization/eval_quant.py
@@ -0,0 +1,203 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+import sys
+import time
+import argparse
+import functools
+import pprint
+import cv2
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+
+from utils.config import cfg
+from utils.timer import Timer, calculate_eta
+from models.model_builder import build_model
+from models.model_builder import ModelPhase
+from reader import SegDataset
+from metrics import ConfusionMatrix
+from paddleslim.quant import quant_aware, convert
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
+    parser.add_argument(
+        '--cfg',
+        dest='cfg_file',
+        help='Config file for training (and optionally testing)',
+        default=None,
+        type=str)
+    parser.add_argument(
+        '--use_gpu',
+        dest='use_gpu',
+        help='Use gpu or cpu',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        '--use_mpio',
+        dest='use_mpio',
+        help='Use multiprocess IO or not',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        'opts',
+        help='See utils/config.py for all options',
+        default=None,
+        nargs=argparse.REMAINDER)
+    parser.add_argument(
+        '--convert',
+        dest='convert',
+        help='Whether to run paddleslim.quant.convert before evaluation',
+        action='store_true',
+        default=False)
+    parser.add_argument(
+        "--not_quant_pattern",
+        nargs='+',
+        type=str,
+        help=
+        "Layers whose name_scope contains a string in not_quant_pattern will not be quantized"
+    )
+
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    return parser.parse_args()
+
+
+def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
+    np.set_printoptions(precision=5, suppress=True)
+
+    startup_prog = fluid.Program()
+    test_prog = fluid.Program()
+    dataset = SegDataset(
+        file_list=cfg.DATASET.VAL_FILE_LIST,
+        mode=ModelPhase.EVAL,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        # TODO: check whether the batch reader is compatible with Windows
+        if use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        for b in data_gen:
+            yield b[0], b[1], b[2]
+
+    py_reader, avg_loss, pred, grts, masks = build_model(
+        test_prog, startup_prog, phase=ModelPhase.EVAL)
+
+    py_reader.decorate_sample_generator(
+        data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
+
+    # Get device environment
+    places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
+    place = places[0]
+    dev_count = len(places)
+    print("#Device count: {}".format(dev_count))
+
+    exe = fluid.Executor(place)
+    exe.run(startup_prog)
+
+    test_prog = test_prog.clone(for_test=True)
+    not_quant_pattern_list = []
+    if kwargs['not_quant_pattern'] is not None:
+        not_quant_pattern_list = kwargs['not_quant_pattern']
+    config = {
+        'weight_quantize_type': 'channel_wise_abs_max',
+        'activation_quantize_type': 'moving_average_abs_max',
+        'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+        'not_quant_pattern': not_quant_pattern_list
+    }
+    test_prog = quant_aware(test_prog, place, config, for_test=True)
+
+    ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
+
+    if not os.path.exists(ckpt_dir):
+        raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
+
+    if ckpt_dir is not None:
+        print('load test model:', ckpt_dir)
+        fluid.io.load_persistables(exe, ckpt_dir, main_program=test_prog)
+    if kwargs['convert']:
+        test_prog = convert(test_prog, place, config)
+    # Use streaming confusion matrix to calculate mean_iou
+    np.set_printoptions(
+        precision=4, suppress=True, linewidth=160, floatmode="fixed")
+    conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+    fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
+    num_images = 0
+    step = 0
+    all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
+    timer = Timer()
+    timer.start()
+    py_reader.start()
+    while True:
+        try:
+            step += 1
+            loss, pred, grts, masks = exe.run(
+                test_prog, fetch_list=fetch_list, return_numpy=True)
+
+            loss = np.mean(np.array(loss))
+
+            num_images += pred.shape[0]
+            conf_mat.calculate(pred, grts, masks)
+            _, iou = conf_mat.mean_iou()
+            _, acc = conf_mat.accuracy()
+
+            speed = 1.0 / timer.elapsed_time()
+
+            print(
+                "[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
+                .format(step, loss, acc, iou, speed,
+                        calculate_eta(all_step - step, speed)))
+            timer.restart()
+            sys.stdout.flush()
+        except fluid.core.EOFException:
+            break
+
+    category_iou, avg_iou = conf_mat.mean_iou()
+    category_acc, avg_acc = conf_mat.accuracy()
+    print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
+        num_images, avg_acc, avg_iou))
+    print("[EVAL]Category IoU:", category_iou)
+    print("[EVAL]Category Acc:", category_acc)
+    print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
+
+    return category_iou, avg_iou, category_acc, avg_acc
+
+
+def main():
+    args = parse_args()
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    cfg.check_and_infer()
+    print(pprint.pformat(cfg))
+    evaluate(cfg, **args.__dict__)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/slim/quantization/images/ConvertToInt8Pass.png b/slim/quantization/images/ConvertToInt8Pass.png
new file mode 100644
index
0000000000000000000000000000000000000000..8b5849819c0bc8e592dc8f864d8945330df85ab1
Binary files /dev/null and b/slim/quantization/images/ConvertToInt8Pass.png differ
diff --git a/slim/quantization/images/FreezePass.png b/slim/quantization/images/FreezePass.png
new file mode 100644
index 0000000000000000000000000000000000000000..acd2b0a890a8af85bec6eecdb22e47ad386a178c
Binary files /dev/null and b/slim/quantization/images/FreezePass.png differ
diff --git a/slim/quantization/images/TransformForMobilePass.png b/slim/quantization/images/TransformForMobilePass.png
new file mode 100644
index 0000000000000000000000000000000000000000..4104cacc67af0be1c7bc152696e2ae544127aace
Binary files /dev/null and b/slim/quantization/images/TransformForMobilePass.png differ
diff --git a/slim/quantization/images/TransformPass.png b/slim/quantization/images/TransformPass.png
new file mode 100644
index 0000000000000000000000000000000000000000..f29ab62753e0e6ddf28d0c1dda7139705fc24b18
Binary files /dev/null and b/slim/quantization/images/TransformPass.png differ
diff --git a/slim/quantization/train_quant.py b/slim/quantization/train_quant.py
new file mode 100644
index 0000000000000000000000000000000000000000..6a29dccdbaeda54b06c11299fb37e979cec6e401
--- /dev/null
+++ b/slim/quantization/train_quant.py
@@ -0,0 +1,388 @@
+# coding: utf8
+# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
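+
+# NOTE: quantization-aware training entry for PaddleSeg. It reuses the
+# checkpoint and logging helpers from the library's train.py and wraps the
+# train/eval programs with paddleslim.quant.quant_aware (see train_quant()
+# below).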
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os + +import sys +import argparse +import pprint +import random +import shutil +import functools + +import paddle +import numpy as np +import paddle.fluid as fluid + +from utils.config import cfg +from utils.timer import Timer, calculate_eta +from metrics import ConfusionMatrix +from reader import SegDataset +from models.model_builder import build_model +from models.model_builder import ModelPhase +from models.model_builder import parse_shape_from_file +from eval_quant import evaluate +from vis import visualize +from utils import dist_utils +from train import save_vars, save_checkpoint, load_checkpoint, update_best_model, print_info + +from paddleslim.quant import quant_aware + + +def parse_args(): + parser = argparse.ArgumentParser(description='PaddleSeg training') + parser.add_argument( + '--cfg', + dest='cfg_file', + help='Config file for training (and optionally testing)', + default=None, + type=str) + parser.add_argument( + '--use_gpu', + dest='use_gpu', + help='Use gpu or cpu', + action='store_true', + default=False) + parser.add_argument( + '--use_mpio', + dest='use_mpio', + help='Use multiprocess I/O or not', + action='store_true', + default=False) + parser.add_argument( + '--log_steps', + dest='log_steps', + help='Display logging information at every log_steps', + default=10, + type=int) + parser.add_argument( + '--debug', + dest='debug', + help='debug mode, display detail information of training', + action='store_true') + parser.add_argument( + '--do_eval', + dest='do_eval', + help='Evaluation models result on every new checkpoint', + action='store_true') + parser.add_argument( + 'opts', + help='See utils/config.py for all options', + default=None, + nargs=argparse.REMAINDER) + parser.add_argument( + '--enable_ce', + dest='enable_ce', + help='If set True, enable continuous evaluation job.' 
'This flag is only used for internal test.',
+        action='store_true')
+    parser.add_argument(
+        "--not_quant_pattern",
+        nargs='+',
+        type=str,
+        help=
+        "Layers whose name_scope contains a string in not_quant_pattern will not be quantized"
+    )
+
+    return parser.parse_args()
+
+
+def train_quant(cfg):
+    startup_prog = fluid.Program()
+    train_prog = fluid.Program()
+    if args.enable_ce:
+        startup_prog.random_seed = 1000
+        train_prog.random_seed = 1000
+    drop_last = True
+
+    dataset = SegDataset(
+        file_list=cfg.DATASET.TRAIN_FILE_LIST,
+        mode=ModelPhase.TRAIN,
+        shuffle=True,
+        data_dir=cfg.DATASET.DATA_DIR)
+
+    def data_generator():
+        if args.use_mpio:
+            data_gen = dataset.multiprocess_generator(
+                num_processes=cfg.DATALOADER.NUM_WORKERS,
+                max_queue_size=cfg.DATALOADER.BUF_SIZE)
+        else:
+            data_gen = dataset.generator()
+
+        batch_data = []
+        for b in data_gen:
+            batch_data.append(b)
+            if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
+                for item in batch_data:
+                    yield item[0], item[1], item[2]
+                batch_data = []
+        # If the sync batch norm strategy is used, drop the last batch when the
+        # number of samples in batch_data is less than cfg.BATCH_SIZE to avoid
+        # NCCL hang issues
+        if not cfg.TRAIN.SYNC_BATCH_NORM:
+            for item in batch_data:
+                yield item[0], item[1], item[2]
+
+    # Get device environment
+    # places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+    # place = places[0]
+    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
+    place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
+    places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
+
+    # Get the number of GPUs
+    dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
+    print_info("#Device count: {}".format(dev_count))
+
+    # Make sure BATCH_SIZE is divisible by the number of GPU cards
+    assert cfg.BATCH_SIZE % dev_count == 0, (
+        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
+            cfg.BATCH_SIZE, dev_count))
+    # In multi-GPU training mode, batch data is allocated evenly to each GPU
+    batch_size_per_dev = cfg.BATCH_SIZE // dev_count
+    print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
+
+    py_reader, avg_loss, lr, pred, grts, masks = build_model(
+        train_prog, startup_prog, phase=ModelPhase.TRAIN)
+    py_reader.decorate_sample_generator(
+        data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
+
+    exe = fluid.Executor(place)
+    exe.run(startup_prog)
+
+    exec_strategy = fluid.ExecutionStrategy()
+    # Clear temporary variables every 100 iterations
+    if args.use_gpu:
+        exec_strategy.num_threads = fluid.core.get_cuda_device_count()
+        exec_strategy.num_iteration_per_drop_scope = 100
+    build_strategy = fluid.BuildStrategy()
+
+    if cfg.NUM_TRAINERS > 1 and args.use_gpu:
+        dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
+        exec_strategy.num_threads = 1
+
+    # Resume training
+    begin_epoch = cfg.SOLVER.BEGIN_EPOCH
+    if cfg.TRAIN.RESUME_MODEL_DIR:
+        begin_epoch = load_checkpoint(exe, train_prog)
+    # Load pretrained model
+    elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
+        print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
+        load_vars = []
+        load_fail_vars = []
+
+        def var_shape_matched(var, shape):
+            """
+            Check whether the persistable variable shape matches the current network
+            """
+            var_exist = os.path.exists(
+                os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+            if var_exist:
+                var_shape = parse_shape_from_file(
+                    os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
+                return var_shape == shape
+            return False
+
+        for x in
train_prog.list_vars():
+            if isinstance(x, fluid.framework.Parameter):
+                shape = tuple(fluid.global_scope().find_var(
+                    x.name).get_tensor().shape())
+                if var_shape_matched(x, shape):
+                    load_vars.append(x)
+                else:
+                    load_fail_vars.append(x)
+
+        fluid.io.load_vars(
+            exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
+        for var in load_vars:
+            print_info("Parameter[{}] loaded successfully!".format(var.name))
+        for var in load_fail_vars:
+            print_info(
+                "Parameter[{}] doesn't exist or its shape does not match the"
+                " current network; skipping it.".format(var.name))
+        print_info("{}/{} pretrained parameters loaded successfully!".format(
+            len(load_vars),
+            len(load_vars) + len(load_fail_vars)))
+    else:
+        print_info(
+            'Pretrained model dir {} does not exist; training from scratch...'.
+            format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
+
+    fetch_list = [avg_loss.name, lr.name]
+    if args.debug:
+        # Fetch more variable info and use streaming confusion matrix to
+        # calculate IoU results if in debug mode
+        np.set_printoptions(
+            precision=4, suppress=True, linewidth=160, floatmode="fixed")
+        fetch_list.extend([pred.name, grts.name, masks.name])
+        cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
+
+    not_quant_pattern = []
+    if args.not_quant_pattern:
+        not_quant_pattern = args.not_quant_pattern
+    config = {
+        'weight_quantize_type': 'channel_wise_abs_max',
+        'activation_quantize_type': 'moving_average_abs_max',
+        'quantize_op_types': ['depthwise_conv2d', 'mul', 'conv2d'],
+        'not_quant_pattern': not_quant_pattern
+    }
+    compiled_train_prog = quant_aware(train_prog, place, config, for_test=False)
+    eval_prog = quant_aware(train_prog, place, config, for_test=True)
+    build_strategy.fuse_all_reduce_ops = False
+    build_strategy.sync_batch_norm = False
+    compiled_train_prog = compiled_train_prog.with_data_parallel(
+        loss_name=avg_loss.name,
+        exec_strategy=exec_strategy,
+        build_strategy=build_strategy)
+
+    # trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    # num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+    global_step = 0
+    all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
+    if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
+        all_step += 1
+    all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
+
+    avg_loss = 0.0
+    best_mIoU = 0.0
+
+    timer = Timer()
+    timer.start()
+    if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
+        raise ValueError(
+            ("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
+                begin_epoch, cfg.SOLVER.NUM_EPOCHS))
+
+    if args.use_mpio:
+        print_info("Use multiprocess reader")
+    else:
+        print_info("Use multi-thread reader")
+
+    for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
+        py_reader.start()
+        while True:
+            try:
+                if args.debug:
+                    # Print category IoU and accuracy to check whether the
+                    # training process meets expectations
+                    loss, lr, pred, grts, masks = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    cm.calculate(pred, grts, masks)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0:
+                        speed = args.log_steps / timer.elapsed_time()
+                        avg_loss /= args.log_steps
+                        category_acc, mean_acc = cm.accuracy()
+                        category_iou, mean_iou = cm.mean_iou()
+
+                        print_info((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, mean_acc,
+                                 mean_iou, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        print_info("Category IoU: ", category_iou)
+
print_info("Category Acc: ", category_acc)
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        cm.zero_matrix()
+                        timer.restart()
+                else:
+                    # If not in debug mode, avoid unnecessary logging and computation
+                    loss, lr = exe.run(
+                        program=compiled_train_prog,
+                        fetch_list=fetch_list,
+                        return_numpy=True)
+                    avg_loss += np.mean(np.array(loss))
+                    global_step += 1
+
+                    if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
+                        avg_loss /= args.log_steps
+                        speed = args.log_steps / timer.elapsed_time()
+                        print((
+                            "epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
+                        ).format(epoch, global_step, lr[0], avg_loss, speed,
+                                 calculate_eta(all_step - global_step, speed)))
+                        sys.stdout.flush()
+                        avg_loss = 0.0
+                        timer.restart()
+
+            except fluid.core.EOFException:
+                py_reader.reset()
+                break
+            except Exception as e:
+                print(e)
+
+        if (epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0
+                or epoch == cfg.SOLVER.NUM_EPOCHS) and cfg.TRAINER_ID == 0:
+            ckpt_dir = save_checkpoint(exe, eval_prog, epoch)
+
+            if args.do_eval:
+                print("Evaluation start")
+                _, mean_iou, _, mean_acc = evaluate(
+                    cfg=cfg,
+                    ckpt_dir=ckpt_dir,
+                    use_gpu=args.use_gpu,
+                    use_mpio=args.use_mpio,
+                    not_quant_pattern=args.not_quant_pattern,
+                    convert=False)
+
+                if mean_iou > best_mIoU:
+                    best_mIoU = mean_iou
+                    update_best_model(ckpt_dir)
+                    print_info("Save best model {} to {}, mIoU = {:.4f}".format(
+                        ckpt_dir,
+                        os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model'),
+                        mean_iou))
+
+    # save final model
+    if cfg.TRAINER_ID == 0:
+        save_checkpoint(exe, eval_prog, 'final')
+
+
+def main(args):
+    if args.cfg_file is not None:
+        cfg.update_from_file(args.cfg_file)
+    if args.opts:
+        cfg.update_from_list(args.opts)
+    if args.enable_ce:
+        random.seed(0)
+        np.random.seed(0)
+
+    cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
+    cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
+
+    cfg.check_and_infer()
+    print_info(pprint.pformat(cfg))
+    train_quant(cfg)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
+        print(
+            "You cannot set use_gpu=True because you are using a CPU-only build of PaddlePaddle."
+        )
+        print(
+            "Please either: 1. install paddlepaddle-gpu to run your model on GPU, or 2. set use_gpu=False to run it on CPU."
+        )
+        sys.exit(1)
+    main(args)
diff --git a/turtorial/finetune_fast_scnn.md b/turtorial/finetune_fast_scnn.md
new file mode 100644
index 0000000000000000000000000000000000000000..0ad8bde68bf8783e72f92d4e8e34482c55c32065
--- /dev/null
+++ b/turtorial/finetune_fast_scnn.md
@@ -0,0 +1,119 @@
+# Fast-SCNN Model Training Tutorial
+
+* This tutorial shows how to train on a custom dataset starting from the ***`Fast_scnn_cityscapes`*** pretrained model provided by PaddleSeg.
+
+* Before reading this tutorial, please make sure you have gone through PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) chapters, so that you have a working understanding of PaddleSeg
+
+* All commands in this tutorial are executed from the PaddleSeg root directory
+
+## 1. Prepare the training data
+
+We have prepared a dataset in advance; download it with the following command
+
+```shell
+python dataset/download_pet.py
+```
+
+## 2. Download the pretrained model
+
+```shell
+python pretrained_model/download_model.py fast_scnn_cityscapes
+```
+
+## 3. Prepare the configuration
+
+Next we need to settle the relevant configuration. For the purposes of this tutorial it breaks down into three parts (each option can also be set from the command line, as shown right after this list):
+
+* Dataset
+  * Root directory of the training set
+  * Training set file list
+  * Test set file list
+  * Evaluation set file list
+* Pretrained model
+  * Name of the pretrained model
+  * Backbone network of the pretrained model
+  * Normalization type of the pretrained model
+  * Path to the pretrained model
+* Others
+  * Learning rate
+  * Batch size
+  * ...
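+
+Besides editing the yaml file, PaddleSeg's entry scripts accept trailing `KEY VALUE` pairs, which are applied through `cfg.update_from_list` (the `opts` argument visible in the training scripts). A hypothetical override of the batch size and learning rate:
+
+```shell
+python pdseg/train.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml \
+BATCH_SIZE 4 \
+SOLVER.LR 0.005
+```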
+
+Of the three parts, the pretrained-model settings matter most: if the model or BACKBONE is configured incorrectly, the pretrained parameters will not be loaded, which in turn slows down convergence. The pretrained-model settings are the ones shown in step 2.
+
+The dataset settings depend on where the data lives; in this tutorial the data is stored under `dataset/mini_pet`
+
+The remaining settings are tuned to the dataset and the machine environment. We end up saving a yaml config file with the following content to **configs/fast_scnn_pet.yaml**
+
+```yaml
+# dataset configuration
+DATASET:
+    DATA_DIR: "./dataset/mini_pet/"
+    NUM_CLASSES: 3
+    TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+    TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
+    VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
+    VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
+
+# pretrained model configuration
+MODEL:
+    MODEL_NAME: "fast_scnn"
+    DEFAULT_NORM_TYPE: "bn"
+
+# other configuration
+TRAIN_CROP_SIZE: (512, 512)
+EVAL_CROP_SIZE: (512, 512)
+AUG:
+    AUG_METHOD: "unpadding"
+    FIX_RESIZE_SIZE: (512, 512)
+BATCH_SIZE: 4
+TRAIN:
+    PRETRAINED_MODEL_DIR: "./pretrained_model/fast_scnn_cityscape/"
+    MODEL_SAVE_DIR: "./saved_model/fast_scnn_pet/"
+    SNAPSHOT_EPOCH: 10
+TEST:
+    TEST_MODEL: "./saved_model/fast_scnn_pet/final"
+SOLVER:
+    NUM_EPOCHS: 100
+    LR: 0.005
+    LR_POLICY: "poly"
+    OPTIMIZER: "sgd"
+```
+
+## 4. Validate the configuration and data
+
+Before training and evaluation, we validate the configuration and the data once more to make sure both are correct. Launch the check with the following command
+
+```shell
+python pdseg/check.py --cfg ./configs/fast_scnn_pet.yaml
+```
+
+
+## 5. Start training
+
+Once the check passes, launch training with the following command
+
+```shell
+python pdseg/train.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
+```
+
+## 6. Run evaluation
+
+After training finishes, launch evaluation with the following command
+
+```shell
+python pdseg/eval.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
+```
+
+
+## 7. Inference time comparison of real-time segmentation models
+
+| Model | eval size | inference time | mIoU on Cityscapes val |
+|---|---|---|---|
+| DeepLabv3+/MobileNetv2/bn | (1024, 2048) | 16.14ms | 0.698 |
+| ICNet/bn | (1024, 2048) | 8.76ms | 0.6831 |
+| Fast-SCNN/bn | (1024, 2048) | 6.28ms | 0.6964 |
+
+The benchmarks above were measured on a V100.
+
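+Since the config above also sets `VIS_FILE_LIST`, you can additionally render the final model's predictions as a quick sanity check. Assuming the repository's `pdseg/vis.py` entry point follows the same command-line convention as the train and eval scripts above, the command would be:
+
+```shell
+python pdseg/vis.py --use_gpu --cfg ./configs/fast_scnn_pet.yaml
+```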