Commit 14473532 authored by LielinJiang, committed by Zeyu Chen

Add paddleslim demo for seg (#154)

* add prune distill nas

* update readme

* update readme

* remove debug code
Parent 0193ce28
......@@ -110,7 +110,7 @@ pip install -r requirements.txt
* [How to handle the class imbalance problem in binary segmentation](./docs/loss_select.md)
* [Using specialized vertical-domain models](./contrib)
* [Multi-process training and mixed-precision training](./docs/multiple_gpus_train_and_mixed_precision_train.md)
* Segmentation model compression with PaddleSlim ([quantization](./slim/quantization/README.md), [distillation](./slim/distillation/README.md), [pruning](./slim/prune/README.md), [NAS](./slim/nas/README.md))
## Online Demo
We provide online tutorials on the AI Studio platform; you are welcome to try them out:
......
......@@ -236,3 +236,19 @@ cfg.FREEZE.MODEL_FILENAME = '__model__'
cfg.FREEZE.PARAMS_FILENAME = '__params__'
# Directory where the exported inference model parameters are saved
cfg.FREEZE.SAVE_DIR = 'freeze_model'
########################## paddle-slim ######################################
cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER = False
cfg.SLIM.KNOWLEDGE_DISTILL = False
cfg.SLIM.KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR = ""
cfg.SLIM.NAS_PORT = 23333
cfg.SLIM.NAS_ADDRESS = ""
cfg.SLIM.NAS_SEARCH_STEPS = 100
cfg.SLIM.NAS_START_EVAL_EPOCH = 0
cfg.SLIM.NAS_IS_SERVER = True
cfg.SLIM.NAS_SPACE_NAME = ""
cfg.SLIM.PRUNE_PARAMS = ''
cfg.SLIM.PRUNE_RATIOS = []
>Before running this example, please install PaddleSlim and Paddle 1.6 or later.
# PaddleSeg Distillation Tutorial
Before reading this tutorial, please make sure you have gone through PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections so that you have a general understanding of PaddleSeg.
This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to distill the models in the segmentation library.
Unless otherwise noted, all commands in this tutorial are executed from the `PaddleSeg/` directory.
## Overview
This example applies the [distillation strategies](https://paddlepaddle.github.io/PaddleSlim/algo/algo/#3) provided by PaddleSlim to train segmentation models with knowledge distillation.
Before reading this example, we recommend that you first review:
- [PaddleSlim distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
## Install PaddleSlim
PaddleSlim can be installed by following the steps in the [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/).
## Distillation strategy
For how to use the distillation APIs, please refer to the PaddleSlim distillation API documentation.
Here we take distilling a Deeplabv3-mobilenet student from a Deeplabv3-xception teacher as an example. First, to get an overall picture of the `student model` and the `teacher model` and to pin down which tensors to distill, we print the names and shapes of the Variables of the two networks with the following code:
```python
# Inspect the Variables of the student model
student_vars = []
for v in fluid.default_main_program().list_vars():
try:
student_vars.append((v.name, v.shape))
except:
pass
print("="*50+"student_model_vars"+"="*50)
print(student_vars)
# Inspect the Variables of the teacher model
teacher_vars = []
for v in teacher_program.list_vars():
try:
teacher_vars.append((v.name, v.shape))
except:
pass
print("="*50+"teacher_model_vars"+"="*50)
print(teacher_vars)
```
By comparing the two lists, we can see that the feature maps fed into the `loss` by the `student model` and the `teacher model` are, respectively:
```bash
# student model
bilinear_interp_1.tmp_0
# teacher model
bilinear_interp_2.tmp_0
```
These feature maps have matching shapes and both sit at the output end of their networks, so we attach a distillation loss to each matching pair with `l2_loss`. Note that the teacher's Variables are automatically given a `name_prefix` during the merge step, so the prefix `"teacher_"` must be added here as well; for the merge step see the [distillation API documentation](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/#merge).
```python
distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0', 'bilinear_interp_1.tmp_0')
```
Following the same procedure, other losses can also be chosen for the distillation strategy; PaddleSlim supports `FSP_loss`, `L2_loss`, `softmax_with_cross_entropy_loss`, as well as any custom loss. A minimal sketch of how the merge step and the distillation loss fit together is shown below.
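The following is a minimal sketch (not the full training script) of how the merge step and the loss above are typically combined with PaddleSlim's single-distiller API. Here `student_program` and `teacher_program` stand for the programs built by `train_distill.py`, with the teacher's pretrained parameters already loaded; the variable names are the ones found above.
```python
import paddle.fluid as fluid
from paddleslim.dist import merge, l2_loss

place = fluid.CUDAPlace(0)
# Map each teacher input name to the corresponding student input name.
data_name_map = {'image': 'image'}

# Merge the teacher program into the student program; during this step the
# teacher's Variables are renamed with the default prefix "teacher_".
merge(teacher_program, student_program, data_name_map, place)

with fluid.program_guard(student_program):
    # Attach the L2 distillation loss between the matching output feature maps.
    distill_loss = l2_loss('teacher_bilinear_interp_2.tmp_0',
                           'bilinear_interp_1.tmp_0',
                           program=student_program)
    # The total loss is then the original segmentation loss plus (a possibly
    # weighted) distill_loss, and the optimizer minimizes this sum.
```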
## Training
The compression script `train_distill.py` is written based on [PaddleSeg/pdseg/train.py](../../pdseg/train.py).
The script defines a teacher_model and a student_model, and uses the output of the teacher_model to guide the training of the student_model.
### Example
Start training with the following command; an evaluation is run every `cfg.TRAIN.SNAPSHOT_EPOCH` epochs.
```shell
CUDA_VISIBLE_DEVICES=0,1
python -m paddle.distributed.launch ./slim/distill/train.py \
--log_steps 10 --cfg ./slim/distill/cityscape_fast_scnn.yaml \
--teacher_cfg ./slim/distill/cityscape_teacher.yaml \
--use_gpu \
--use_mpio \
--do_eval
```
## Evaluation and prediction
For evaluation and prediction after training, please refer to PaddleSeg's [Quick Start](../../README.md#快速入门) and [Basic Features](../../README.md#基础功能) sections.
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
FLIP: True
FLIP_RATIO: 0.2
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 16
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
ENCODER_WITH_ASPP: False
ENABLE_DECODER: False
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_mbv2_kd_e100_1/"
PRETRAINED_MODEL_DIR: u"/workspace/pretrained_models/mobilenet_cityscapes"
SNAPSHOT_EPOCH: 5
SYNC_BATCH_NORM: True
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 100
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
FLIP: True
FLIP_RATIO: 0.2
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 16
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
ENCODER_WITH_ASPP: True
ENABLE_DECODER: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
PRETRAINED_MODEL_DIR: u"pretrain/deeplabv3plus_gn_init"
SNAPSHOT_EPOCH: 5
SYNC_BATCH_NORM: True
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 100
SLIM:
KNOWLEDGE_DISTILL_IS_TEACHER: True
KNOWLEDGE_DISTILL: True
KNOWLEDGE_DISTILL_TEACHER_MODEL_DIR: "/workspace/pretrained_models/xception65_bn_cityscapes"
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import struct
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
from models.modeling import deeplab, unet, icnet, pspnet, hrnet, fast_scnn
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def seg_model(image, class_num):
model_name = cfg.MODEL.MODEL_NAME
if model_name == 'unet':
logits = unet.unet(image, class_num)
elif model_name == 'deeplabv3p':
logits = deeplab.deeplabv3p(image, class_num)
elif model_name == 'icnet':
logits = icnet.icnet(image, class_num)
elif model_name == 'pspnet':
logits = pspnet.pspnet(image, class_num)
elif model_name == 'hrnet':
logits = hrnet.hrnet(image, class_num)
elif model_name == 'fast_scnn':
logits = fast_scnn.fast_scnn(image, class_num)
else:
        raise Exception(
            "unknown model name, only support unet, deeplabv3p, icnet, pspnet, hrnet, fast_scnn"
        )
return logits
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def export_preprocess(image):
    """Preprocessing pipeline for the exported inference model"""
image = fluid.layers.transpose(image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
    # Resize according to the chosen AUG_METHOD
if cfg.AUG.AUG_METHOD == 'unpadding':
h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
image = fluid.layers.resize_bilinear(
image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
size = cfg.AUG.INF_RESIZE_VALUE
value = fluid.layers.reduce_max(origin_shape)
scale = float(size) / value.astype('float32')
image = fluid.layers.resize_bilinear(
image, scale=scale, align_corners=False, align_mode=0)
    # Record the image shape after resizing
valid_shape = fluid.layers.shape(image)[-2:]
    # Pad up to EVAL_CROP_SIZE
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
pad_target = fluid.layers.assign(
np.array([height, width]).astype('float32'))
up = fluid.layers.assign(np.array([0]).astype('float32'))
down = pad_target[0] - valid_shape[0]
left = up
right = pad_target[1] - valid_shape[1]
paddings = fluid.layers.concat([up, down, left, right])
paddings = fluid.layers.cast(paddings, 'int32')
image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
# normalize
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image / 255 - mean) / std
    # Reshape so that downstream layers can obtain the feature map shape via image.shape
image = fluid.layers.reshape(
image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
return image, valid_shape, origin_shape
def build_model(main_prog=None, start_prog=None, phase=ModelPhase.TRAIN, **kwargs):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
#with fluid.program_guard(main_prog, start_prog):
# with fluid.unique_name.guard():
    # When exporting the model, add image normalization preprocessing to simplify image handling at deployment time
    # At inference time, only a batch_size dimension needs to be added to the input image
if cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
print('teacher input:')
image = main_prog.global_block()._clone_variable(kwargs['image'],
force_persistable=False)
label = main_prog.global_block()._clone_variable(kwargs['label'],
force_persistable=False)
mask = main_prog.global_block()._clone_variable(kwargs['mask'],
force_persistable=False)
else:
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(
name='image',
shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image, valid_shape, origin_shape = export_preprocess(
origin_image)
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
    # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = None
if not cfg.SLIM.KNOWLEDGE_DISTILL_IS_TEACHER:
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
loss_type = cfg.SOLVER.LOSS
if not isinstance(loss_type, list):
loss_type = list(loss_type)
    # dice_loss and bce_loss only apply to binary segmentation
if class_num > 2 and (("dice_loss" in loss_type) or
("bce_loss" in loss_type)):
raise Exception(
"dice loss and bce loss is only applicable to binary classfication"
)
    # For binary segmentation, when dice_loss or bce_loss is selected, the final logit has a single output channel
if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
class_num = 1
if "softmax_loss" in loss_type:
raise Exception(
"softmax loss can not combine with dice loss or bce loss"
)
logits = seg_model(image, class_num)
    # Compute the corresponding losses according to the selected loss functions
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
avg_loss_list.append(
multi_softmax_with_loss(logits, label, mask, class_num, weight))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception(
"SOLVER.LOSS: {} is set wrong. it should "
"include one of (softmax_loss, bce_loss, dice_loss) at least"
" example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
.format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print(
"Warning: the loss {} you set is invalid. it will not be included in loss computed."
.format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
#get pred result in original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
        # For binary segmentation with dice_loss or bce_loss, the logit has a single channel; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
        # Crop out the valid region
logit = fluid.layers.slice(
logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
logit = fluid.layers.resize_bilinear(
logit,
out_shape=origin_shape,
align_corners=False,
align_mode=0)
logit = fluid.layers.argmax(logit, axis=1)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
decayed_lr = None
if not cfg.SLIM.KNOWLEDGE_DISTILL:
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
# optimizer = solver.Solver(main_prog, start_prog)
# decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask, image
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
This diff is collapsed.
>Before running this example, please install Paddle 1.6 or later.
# PaddleSeg Neural Architecture Search (NAS) Example
Before reading this tutorial, please make sure you have gone through PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) sections so that you have a general understanding of PaddleSeg.
This document describes how to use [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to search for models in the segmentation library.
Unless otherwise noted, all commands in this tutorial are executed from the `PaddleSeg/` directory.
## Overview
We take the Deeplab+mobilenetv2 model as the NAS example. The example uses [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
to carry out the architecture search experiment; for technical details, please refer to the [NAS tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/docs/tutorials/nas_demo.md).
## Define the search space
In this experiment we use SANAS for the search, searching over the channel numbers and kernel sizes of the network,
so we define the following search space:
- Head channels `head_num`: the range of channel numbers for the MobilenetV2 head module;
- `filter_num1-6` of inverse_res_block1-6: the range of channel numbers in the inverse_res_block modules;
- `repeat` of inverse_res_block: the number of units in each MobilenetV2 inverse_res_block module;
- `multiply` of inverse_res_block: the range of the expansion_factor in each MobilenetV2 inverse_res_block module;
- Kernel size `k_size`: whether the MobilenetV2 convolution kernels are 3x3 or 5x5.
Given the ranges defined above, the search-space tokens have 25 positions, varying within ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 5, 8, 6, 2, 5, 8, 6, 2, 5, 8, 6, 2, 5, 10, 6, 2, 5, 10, 6, 2, 5, 12, 6, 2]).
The initial tokens are: [4, 4, 5, 1, 0, 4, 4, 1, 0, 4, 4, 3, 0, 4, 5, 2, 0, 4, 7, 2, 0, 4, 9, 0, 0]. A short sketch of how these tokens are decoded into a network is shown below.
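As a rough illustration of the token decoding, the sketch below uses the `MobileNetV2SpaceSeg` search space defined in `slim/nas/mobilenetv2_search_space.py` (added in this PR); it assumes that directory is on `PYTHONPATH`, and the `input_size`/`output_size`/`block_num` values follow the ones used in the NAS training script.
```python
from mobilenetv2_search_space import MobileNetV2SpaceSeg

space = MobileNetV2SpaceSeg(input_size=769, output_size=1, block_num=7)

tokens = space.init_tokens()   # the 25-dimensional initial tokens listed above
print(space.range_table())     # number of choices at each token position
net_arch = space.token2arch(tokens)  # function that builds the corresponding MobileNetV2 backbone
```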
## Start the search
First install PaddleSlim; please refer to the [installation guide](https://paddlepaddle.github.io/PaddleSlim/#_2).
Configure the PaddleSeg config file; only the NAS-related options are shown below.
```shell
SLIM:
    NAS_PORT: 23333 # server port
    NAS_ADDRESS: "" # server IP address; leave empty when running as the server, set it to the server IP when running as a client
    NAS_SEARCH_STEPS: 100 # number of architectures to search
    NAS_START_EVAL_EPOCH: -1 # epoch from which each searched model starts to be evaluated
    NAS_IS_SERVER: True # whether this process acts as the server
    NAS_SPACE_NAME: "MobileNetV2SpaceSeg" # search space
```
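Internally, the NAS training script (`slim/nas/train.py`) drives the search with PaddleSlim's SANAS controller roughly as follows. This is a condensed, non-runnable sketch of that loop: `cfg` is PaddleSeg's global configuration object and `best_miou` stands for the best mIoU measured for the current architecture.
```python
from paddleslim.nas.sa_nas import SANAS
from utils.config import cfg  # PaddleSeg config object (pdseg/utils/config.py)

config = [(cfg.SLIM.NAS_SPACE_NAME, {'input_size': 769, 'output_size': 1, 'block_num': 7})]
sa_nas = SANAS(config,
               server_addr=(cfg.SLIM.NAS_ADDRESS, cfg.SLIM.NAS_PORT),
               search_steps=cfg.SLIM.NAS_SEARCH_STEPS,
               is_server=cfg.SLIM.NAS_IS_SERVER)

for step in range(cfg.SLIM.NAS_SEARCH_STEPS):
    arch = sa_nas.next_archs()[0]     # backbone-building function for this step
    # ... build the training program with `arch`, train, and evaluate mIoU ...
    sa_nas.reward(float(best_miou))   # report the reward back to the SA controller
```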
## Training and evaluation
Run the following command to train and evaluate at the same time:
```shell
python -u ./slim/nas/train.py --log_steps 10 --cfg configs/cityscape.yaml --use_gpu --use_mpio \
SLIM.NAS_PORT 23333 \
SLIM.NAS_ADDRESS "" \
SLIM.NAS_SEARCH_STEPS 2 \
SLIM.NAS_START_EVAL_EPOCH -1 \
SLIM.NAS_IS_SERVER True \
SLIM.NAS_SPACE_NAME "MobileNetV2SpaceSeg"
```
## FAQ
- Error: `socket.error: [Errno 98] Address already in use`
  Solution: the port is already in use; change `SLIM.NAS_PORT` to another port.
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
def encoder(input):
    # Encoder: ASPP architecture, i.e. pooling + 1x1 conv + three parallel dilated convolutions at different rates, followed by concat and a 1x1 conv
    # ASPP_WITH_SEP_CONV: True by default, use depthwise separable convolutions; otherwise use ordinary convolutions
    # OUTPUT_STRIDE: downsampling factor, 8 or 16, determines aspp_ratios
    # aspp_ratios: dilation rates of the dilated convolutions in the ASPP module
if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only support stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def decoder(encode_data, decode_shortcut):
    # Decoder
    # encode_data: output of the encoder
    # decode_shortcut: branch taken from the backbone, resized and concatenated with encode_data
    # DECODER_USE_SEP_CONV: True by default, apply two separable convolutions after the concat; otherwise use ordinary convolutions
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut.shape[2:])
encode_data = fluid.layers.concat([encode_data, decode_shortcut],
axis=1)
if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def nas_backbone(input, arch):
# scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
# output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
# model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 8
decode_point = 3
data, decode_shortcuts = arch(
input, end_points=end_points, return_block=decode_point, output_stride=16)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def deeplabv3p_nas(img, num_classes, arch=None):
data, decode_shortcut = nas_backbone(img, arch)
    # Encoder/decoder setup
cfg.MODEL.DEFAULT_EPSILON = 1e-5
if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
data = encoder(data)
if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
data = decoder(data, decode_shortcut)
    # Set the output channels of the final conv layer according to the number of classes, then resize to the original image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
return logit
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from model_builder import build_model
from model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
from mobilenetv2_search_space import MobileNetV2SpaceSeg
def parse_args():
    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
        # TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL, arch=kwargs['arch'])
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if not os.path.exists(ckpt_dir):
raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddleslim.nas.search_space.search_space_base import SearchSpaceBase
from paddleslim.nas.search_space.base_layer import conv_bn_layer
from paddleslim.nas.search_space.search_space_registry import SEARCHSPACE
from paddleslim.nas.search_space.utils import check_points
__all__ = ["MobileNetV2SpaceSeg"]
@SEARCHSPACE.register
class MobileNetV2SpaceSeg(SearchSpaceBase):
def __init__(self, input_size, output_size, block_num, block_mask=None):
super(MobileNetV2SpaceSeg, self).__init__(input_size, output_size,
block_num, block_mask)
# self.head_num means the first convolution channel
self.head_num = np.array([3, 4, 8, 12, 16, 24, 32]) #7
# self.filter_num1 ~ self.filter_num6 means following convlution channel
self.filter_num1 = np.array([3, 4, 8, 12, 16, 24, 32, 48]) #8
self.filter_num2 = np.array([8, 12, 16, 24, 32, 48, 64, 80]) #8
self.filter_num3 = np.array([16, 24, 32, 48, 64, 80, 96, 128]) #8
self.filter_num4 = np.array(
[24, 32, 48, 64, 80, 96, 128, 144, 160, 192]) #10
self.filter_num5 = np.array(
[32, 48, 64, 80, 96, 128, 144, 160, 192, 224]) #10
self.filter_num6 = np.array(
[64, 80, 96, 128, 144, 160, 192, 224, 256, 320, 384, 512]) #12
# self.k_size means kernel size
self.k_size = np.array([3, 5]) #2
# self.multiply means expansion_factor of each _inverted_residual_unit
self.multiply = np.array([1, 2, 3, 4, 6]) #5
# self.repeat means repeat_num _inverted_residual_unit in each _invresi_blocks
self.repeat = np.array([1, 2, 3, 4, 5, 6]) #6
def init_tokens(self):
"""
The initial token.
The first one is the index of the first layers' channel in self.head_num,
each line in the following represent the index of the [expansion_factor, filter_num, repeat_num, kernel_size]
"""
# original MobileNetV2
# yapf: disable
init_token_base = [4, # 1, 16, 1
4, 5, 1, 0, # 6, 24, 2
4, 4, 2, 0, # 6, 32, 3
4, 4, 3, 0, # 6, 64, 4
4, 5, 2, 0, # 6, 96, 3
4, 7, 2, 0, # 6, 160, 3
4, 9, 0, 0] # 6, 320, 1
# yapf: enable
return init_token_base
def range_table(self):
"""
Get range table of current search space, constrains the range of tokens.
"""
# head_num + 6 * [multiple(expansion_factor), filter_num, repeat, kernel_size]
# yapf: disable
range_table_base = [len(self.head_num),
len(self.multiply), len(self.filter_num1), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num2), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num3), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num4), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num5), len(self.repeat), len(self.k_size),
len(self.multiply), len(self.filter_num6), len(self.repeat), len(self.k_size)]
# yapf: enable
return range_table_base
def token2arch(self, tokens=None):
"""
return net_arch function
"""
if tokens is None:
tokens = self.init_tokens()
self.bottleneck_params_list = []
self.bottleneck_params_list.append(
(1, self.head_num[tokens[0]], 1, 1, 3))
self.bottleneck_params_list.append(
(self.multiply[tokens[1]], self.filter_num1[tokens[2]],
self.repeat[tokens[3]], 2, self.k_size[tokens[4]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[5]], self.filter_num2[tokens[6]],
self.repeat[tokens[7]], 2, self.k_size[tokens[8]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[9]], self.filter_num3[tokens[10]],
self.repeat[tokens[11]], 2, self.k_size[tokens[12]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[13]], self.filter_num4[tokens[14]],
self.repeat[tokens[15]], 1, self.k_size[tokens[16]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[17]], self.filter_num5[tokens[18]],
self.repeat[tokens[19]], 2, self.k_size[tokens[20]]))
self.bottleneck_params_list.append(
(self.multiply[tokens[21]], self.filter_num6[tokens[22]],
self.repeat[tokens[23]], 1, self.k_size[tokens[24]]))
def _modify_bottle_params(output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
                raise Exception("output stride must be an even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s, ks = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s, ks)
def net_arch(input,
scale=1.0,
return_block=None,
end_points=None,
output_stride=None):
self.scale = scale
_modify_bottle_params(output_stride)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
# all padding is 'SAME' in the conv2d, can compute the actual padding automatic.
input = conv_bn_layer(
input,
num_filters=int(32 * self.scale),
filter_size=3,
stride=2,
padding='SAME',
act='relu6',
name='mobilenetv2_conv1')
layer_count = 1
depthwise_output = None
# bottleneck sequences
in_c = int(32 * self.scale)
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s, k = layer_setting
layer_count += 1
### return_block and end_points means block num
if check_points((layer_count - 1), return_block):
decode_ends[layer_count - 1] = depthwise_output
if check_points((layer_count - 1), end_points):
return input, decode_ends
input, depthwise_output = self._invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * self.scale),
n=n,
s=s,
k=k,
name='mobilenetv2_conv' + str(i))
in_c = int(c * self.scale)
### return_block and end_points means block num
if check_points(layer_count, return_block):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
# last conv
input = conv_bn_layer(
input=input,
num_filters=int(1280 * self.scale)
if self.scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding='SAME',
act='relu6',
name='mobilenetv2_conv' + str(i + 1))
input = fluid.layers.pool2d(
input=input,
pool_type='avg',
global_pooling=True,
name='mobilenetv2_last_pool')
return input
return net_arch
def _shortcut(self, input, data_residual):
"""Build shortcut layer.
Args:
input(Variable): input.
data_residual(Variable): residual layer.
Returns:
Variable, layer output.
"""
return fluid.layers.elementwise_add(input, data_residual)
def _inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
expansion_factor,
reduction_ratio=4,
name=None):
"""Build inverted residual unit.
Args:
input(Variable), input.
num_in_filter(int), number of in filters.
num_filters(int), number of filters.
ifshortcut(bool), whether using shortcut.
stride(int), stride.
filter_size(int), filter size.
padding(str|int|list), padding.
expansion_factor(float), expansion factor.
name(str), name.
Returns:
Variable, layers output.
"""
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding='SAME',
num_groups=1,
act='relu6',
name=name + '_expand')
bottleneck_conv = conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding='SAME',
num_groups=num_expfilter,
act='relu6',
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding='SAME',
num_groups=1,
act=None,
name=name + '_linear')
out = linear_out
if ifshortcut:
out = self._shortcut(input=input, data_residual=out)
return out, depthwise_output
def _invresi_blocks(self, input, in_c, t, c, n, s, k, name=None):
"""Build inverted residual blocks.
Args:
input: Variable, input.
in_c: int, number of in filters.
t: float, expansion factor.
c: int, number of filters.
n: int, number of layers.
s: int, stride.
k: int, filter size.
name: str, name.
Returns:
Variable, layers output.
"""
first_block, depthwise_output = self._inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=k,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self._inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=k,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import struct
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
from loss import multi_dice_loss
from loss import multi_bce_loss
import deeplab
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def seg_model(image, class_num, arch):
model_name = cfg.MODEL.MODEL_NAME
if model_name == 'deeplabv3p':
logits = deeplab.deeplabv3p_nas(image, class_num, arch)
else:
        raise Exception(
            "unknown model name, only support deeplabv3p"
        )
return logits
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def export_preprocess(image):
    """Preprocessing pipeline for the exported inference model"""
image = fluid.layers.transpose(image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
    # Resize according to the chosen AUG_METHOD
if cfg.AUG.AUG_METHOD == 'unpadding':
h_fix = cfg.AUG.FIX_RESIZE_SIZE[1]
w_fix = cfg.AUG.FIX_RESIZE_SIZE[0]
image = fluid.layers.resize_bilinear(
image, out_shape=[h_fix, w_fix], align_corners=False, align_mode=0)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
size = cfg.AUG.INF_RESIZE_VALUE
value = fluid.layers.reduce_max(origin_shape)
scale = float(size) / value.astype('float32')
image = fluid.layers.resize_bilinear(
image, scale=scale, align_corners=False, align_mode=0)
    # Record the image shape after resizing
valid_shape = fluid.layers.shape(image)[-2:]
    # Pad up to EVAL_CROP_SIZE
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
pad_target = fluid.layers.assign(
np.array([height, width]).astype('float32'))
up = fluid.layers.assign(np.array([0]).astype('float32'))
down = pad_target[0] - valid_shape[0]
left = up
right = pad_target[1] - valid_shape[1]
paddings = fluid.layers.concat([up, down, left, right])
paddings = fluid.layers.cast(paddings, 'int32')
image = fluid.layers.pad2d(image, paddings=paddings, pad_value=127.5)
# normalize
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image / 255 - mean) / std
    # Reshape so that downstream layers can obtain the feature map shape via image.shape
image = fluid.layers.reshape(
image, shape=[-1, cfg.DATASET.DATA_DIM, height, width])
return image, valid_shape, origin_shape
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN, arch=None):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
            # When exporting the model, add image normalization preprocessing to simplify image handling at deployment time
            # At inference time, only a batch_size dimension needs to be added to the input image
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(
name='image',
shape=[-1, -1, -1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image, valid_shape, origin_shape = export_preprocess(
origin_image)
else:
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
            # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
loss_type = cfg.SOLVER.LOSS
if not isinstance(loss_type, list):
loss_type = list(loss_type)
            # dice_loss and bce_loss only apply to binary segmentation
if class_num > 2 and (("dice_loss" in loss_type) or
("bce_loss" in loss_type)):
raise Exception(
"dice loss and bce loss is only applicable to binary classfication"
)
            # For binary segmentation, when dice_loss or bce_loss is selected, the final logit has a single output channel
if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
class_num = 1
if "softmax_loss" in loss_type:
raise Exception(
"softmax loss can not combine with dice loss or bce loss"
)
logits = seg_model(image, class_num, arch)
            # Compute the corresponding losses according to the selected loss functions
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
weight = cfg.SOLVER.CROSS_ENTROPY_WEIGHT
avg_loss_list.append(
multi_softmax_with_loss(logits, label, mask, class_num, weight))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception(
"SOLVER.LOSS: {} is set wrong. it should "
"include one of (softmax_loss, bce_loss, dice_loss) at least"
" example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']"
.format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print(
"Warning: the loss {} you set is invalid. it will not be included in loss computed."
.format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
#get pred result in original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
                # For binary segmentation with dice_loss or bce_loss, the logit has a single channel; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
                # Crop out the valid region
logit = fluid.layers.slice(
logit, axes=[2, 3], starts=[0, 0], ends=valid_shape)
logit = fluid.layers.resize_bilinear(
logit,
out_shape=origin_shape,
align_corners=False,
align_mode=0)
logit = fluid.layers.argmax(logit, axis=1)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import argparse
import pprint
import random
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from model_builder import build_model
from model_builder import ModelPhase
from model_builder import parse_shape_from_file
from eval_nas import evaluate
from vis import visualize
from utils import dist_utils
from mobilenetv2_search_space import MobileNetV2SpaceSeg
from paddleslim.nas.search_space.search_space_factory import SearchSpaceFactory
from paddleslim.analysis import flops
from paddleslim.nas.sa_nas import SANAS
from paddleslim.nas import search_space
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--enable_ce',
dest='enable_ce',
help='If set True, enable continuous evaluation job.'
'This flag is only used for internal test.',
action='store_true')
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
    Temporary resolution for Win save variables compatibility.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_vars(
exe,
ckpt_dir,
program,
vars=list(filter(fluid.io.is_persistable, program.list_vars())))
return ckpt_dir
def load_checkpoint(exe, program):
"""
    Load checkpoint from the pretrained model directory for resuming training
"""
print('Resume model training from:', cfg.TRAIN.RESUME_MODEL_DIR)
if not os.path.exists(cfg.TRAIN.RESUME_MODEL_DIR):
raise ValueError("TRAIN.PRETRAIN_MODEL {} not exist!".format(
cfg.TRAIN.RESUME_MODEL_DIR))
fluid.io.load_persistables(
exe, cfg.TRAIN.RESUME_MODEL_DIR, main_program=program)
model_path = cfg.TRAIN.RESUME_MODEL_DIR
    # Check whether the path ends with a path separator
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
# If resume model path is end of digit, restore epoch status
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def update_best_model(ckpt_dir):
best_model_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, 'best_model')
if os.path.exists(best_model_dir):
shutil.rmtree(best_model_dir)
shutil.copytree(ckpt_dir, best_model_dir)
def print_info(*msg):
if cfg.TRAINER_ID == 0:
print(*msg)
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
if args.enable_ce:
startup_prog.random_seed = 1000
train_prog.random_seed = 1000
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == (cfg.BATCH_SIZE // cfg.NUM_TRAINERS):
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
# If use sync batch norm strategy, drop last batch if number of samples
# in batch_data is less then cfg.BATCH_SIZE to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
# places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# place = places[0]
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
# Get number of GPU
dev_count = cfg.NUM_TRAINERS if cfg.NUM_TRAINERS > 1 else len(places)
print_info("#Device count: {}".format(dev_count))
    # Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
# If use multi-gpu training mode, batch data will allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print_info("batch_size_per_dev: {}".format(batch_size_per_dev))
config_info = {'input_size': 769, 'output_size': 1, 'block_num': 7}
config = ([(cfg.SLIM.NAS_SPACE_NAME, config_info)])
factory = SearchSpaceFactory()
space = factory.get_search_space(config)
port = cfg.SLIM.NAS_PORT
server_address = (cfg.SLIM.NAS_ADDRESS, port)
sa_nas = SANAS(config, server_addr=server_address, search_steps=cfg.SLIM.NAS_SEARCH_STEPS,
is_server=cfg.SLIM.NAS_IS_SERVER)
for step in range(cfg.SLIM.NAS_SEARCH_STEPS):
arch = sa_nas.next_archs()[0]
start_prog = fluid.Program()
train_prog = fluid.Program()
py_reader, avg_loss, lr, pred, grts, masks = build_model(
train_prog, start_prog, arch=arch, phase=ModelPhase.TRAIN)
cur_flops = flops(train_prog)
print('current step:', step, 'flops:', cur_flops)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
exe.run(start_prog)
exec_strategy = fluid.ExecutionStrategy()
# Clear temporary variables every 100 iteration
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if cfg.NUM_TRAINERS > 1 and args.use_gpu:
dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
exec_strategy.num_threads = 1
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print_info("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print_info(
"Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=avg_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME_MODEL_DIR:
begin_epoch = load_checkpoint(exe, train_prog)
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL_DIR):
print_info('Pretrained model dir: ', cfg.TRAIN.PRETRAINED_MODEL_DIR)
load_vars = []
load_fail_vars = []
def var_shape_matched(var, shape):
"""
            Check whether the persistable variable shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL_DIR, var.name))
return var_shape == shape
return False
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
else:
load_fail_vars.append(x)
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL_DIR, vars=load_vars)
for var in load_vars:
print_info("Parameter[{}] loaded sucessfully!".format(var.name))
for var in load_fail_vars:
print_info(
"Parameter[{}] don't exist or shape does not match current network, skip"
" to load it.".format(var.name))
print_info("{}/{} pretrained parameters loaded successfully!".format(
len(load_vars),
len(load_vars) + len(load_fail_vars)))
else:
print_info(
'Pretrained model dir {} does not exist; training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL_DIR))
fetch_list = [avg_loss.name, lr.name]
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and not drop_last:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
if args.use_mpio:
print_info("Use multiprocess reader")
else:
print_info("Use multi-thread reader")
best_miou = 0.0
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0 and cfg.TRAINER_ID == 0:
avg_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, speed,
calculate_eta(all_step - global_step, speed)))
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if epoch > cfg.SLIM.NAS_START_EVAL_EPOCH:
ckpt_dir = save_checkpoint(exe, train_prog, '{}_tmp'.format(port))
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
arch=arch,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if best_miou < mean_iou:
print('search step {}, epoch {} best iou {}'.format(step, epoch, mean_iou))
best_miou = mean_iou
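# Reward the controller with the best validation mIoU achieved by this candidate.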
sa_nas.reward(float(best_miou))
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
if args.enable_ce:
random.seed(0)
np.random.seed(0)
cfg.TRAINER_ID = int(os.getenv("PADDLE_TRAINER_ID", 0))
cfg.NUM_TRAINERS = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
cfg.check_and_infer()
print_info(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
if not fluid.core.is_compiled_with_cuda() and args.use_gpu:
print(
"You cannot set use_gpu=True because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
# PaddleSeg Pruning Tutorial
Before reading this tutorial, please make sure you have gone through PaddleSeg's [Quick Start](../README.md#快速入门) and [Basic Features](../README.md#基础功能) chapters so that you have a basic understanding of PaddleSeg.
This document describes how to use the convolution channel pruning interface of [PaddleSlim](https://paddlepaddle.github.io/PaddleSlim) to prune the number of channels in the convolution layers of models in the segmentation library.
In the segmentation library, pruning can be performed directly with the `PaddleSeg/slim/prune/train_prune.py` script, which calls PaddleSlim's [paddleslim.prune.Pruner](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#Pruner) API.
Unless otherwise stated, all commands in this tutorial are executed under the `PaddleSeg/` directory.
## 1. Prepare the Data and Pretrained Model
Run the following command to download the cityscapes dataset:
```
python dataset/download_cityscapes.py
```
Refer to the [pretrained model list](../../docs/model_zoo.md) to obtain the required pretrained model.
## 2. Determine the Parameters to Prune
We reduce the number of channels in convolution layers by pruning their parameters. Before pruning, we need to determine the names of the convolution-layer parameters to be pruned.
View all parameters of the current model with the following command:
```python
# List all Parameters of the model
for x in train_prog.list_vars():
    if isinstance(x, fluid.framework.Parameter):
        print(x.name, x.shape)
```
By inspecting the parameter names and shapes, pick out the convolution-layer parameters and decide which of them to prune, as sketched below.
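As a rough illustration (a minimal sketch that reuses the `train_prog` variable from the snippet above; treating every 4-D parameter as a convolution weight is only a heuristic), the candidate parameters can be collected like this:
```python
import paddle.fluid as fluid

# Hypothetical helper: treat every 4-D Parameter (out_channels, in_channels, kh, kw)
# as a convolution weight and list it as a pruning candidate.
conv_params = [
    x.name for x in train_prog.list_vars()
    if isinstance(x, fluid.framework.Parameter) and len(x.shape) == 4
]
print(conv_params)
```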
## 3. Launch the Pruning Task
When launching the pruning task with `train_prune.py`, use the `SLIM.PRUNE_PARAMS` option to specify the comma-separated list of parameter names to prune, and the `SLIM.PRUNE_RATIOS` option to specify the ratio pruned from each parameter.
```shell
export CUDA_VISIBLE_DEVICES=0
python -u ./slim/prune/train_prune.py --log_steps 10 --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
SLIM.PRUNE_PARAMS 'learning_to_downsample/weights,learning_to_downsample/dsconv1/pointwise/weights,learning_to_downsample/dsconv2/pointwise/weights' \
SLIM.PRUNE_RATIOS '[0.1,0.1,0.1]'
```
Here we select three parameters and prune each of them by a ratio of 0.1. The sketch below shows roughly how these options reach PaddleSlim inside `train_prune.py`.
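For reference, here is a minimal sketch of how `train_prune.py` is expected to apply these options through PaddleSlim's `Pruner`; the variable names (`train_prog`, `place`) and the exact call site are assumptions, and the real script may differ in detail:
```python
import paddle.fluid as fluid
from paddleslim.prune import Pruner

# A minimal sketch, assuming a built training program `train_prog`, an executor
# place `place`, and the PaddleSeg config object `cfg` shown above.
pruner = Pruner()
pruned_prog, _, _ = pruner.prune(
    train_prog,                     # program whose conv layers will be pruned
    fluid.global_scope(),           # scope that holds the parameter tensors
    params=cfg.SLIM.PRUNE_PARAMS.strip().split(','),
    ratios=cfg.SLIM.PRUNE_RATIOS,
    place=place,
    only_graph=False)               # also prune the parameter values in the scope
```
Training would then continue on `pruned_prog` instead of the original program.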
## 4. Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0
python -u ./slim/prune/eval_prune.py --cfg configs/cityscape_fast_scnn.yaml --use_gpu --use_mpio \
TEST.TEST_MODEL your_trained_model
```
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
SEG_PATH = os.path.join(LOCAL_PATH, "../../", "pdseg")
sys.path.append(SEG_PATH)
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from models.model_builder import build_model
from models.model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
from paddleslim.prune.io import *
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if not os.path.exists(ckpt_dir):
raise ValueError('The TEST.TEST_MODEL {} is not found'.format(ckpt_dir))
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
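# Load the pruned checkpoint with PaddleSlim's load_model so that parameters whose
# shapes were changed by pruning are restored correctly.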
load_model(exe, test_prog, ckpt_dir)
#fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()