# PaddleSeg 语义分割库
## 简介
PaddleSeg是基于[PaddlePaddle](https://www.paddlepaddle.org.cn)开发的语义分割库,覆盖了DeepLabv3+, U-Net, ICNet三类主流的分割模型。通过统一的配置,帮助用户更便捷地完成从训练到部署的全流程图像分割应用。
具备高性能、丰富的数据增强、工业级部署、全流程应用的特点。
- **丰富的数据增强**
- 基于百度视觉技术部的实际业务经验,内置10+种数据增强策略,可结合实际业务场景进行定制组合,提升模型泛化能力和鲁棒性。
- **主流模型覆盖**
- 支持U-Net, DeepLabv3+, ICNet三类主流分割网络,结合预训练模型和可调节的骨干网络,满足不同性能和精度的要求。
- **高性能**
- PaddleSeg支持多进程IO、多卡并行、多卡Batch Norm, FP16混合精度等训练加速策略,通过飞桨核心框架的显存优化算法,可以大幅度节约分割模型的显存开销,更快完成分割模型训练。
- **工业级部署**
- 基于[Paddle Serving](https://github.com/PaddlePaddle/Serving)和PaddlePaddle高性能预测引擎, 结合百度开放的AI能力,轻松搭建人像分割和车道线分割服务。
更多模型信息与技术细节请查看[模型介绍](./docs/models.md)和[预训练模型](./docs/model_zoo.md)
## AI Studio教程
### 快速开始
通过 [PaddleSeg人像分割](https://aistudio.baidu.com/aistudio/projectDetail/100798) 教程可快速体验PaddleSeg人像分割模型的效果。
### 入门教程
入门教程以经典的U-Net模型为例, 结合Oxford-IIIT宠物数据集,快速熟悉PaddleSeg使用流程, 详情请点击[U-Net宠物分割](https://aistudio.baidu.com/aistudio/projectDetail/102889)
### 高级教程
高级教程以DeepLabv3+模型为例,结合Cityscapes数据集,快速了解ASPP, Backbone网络切换,多卡Batch Norm同步等策略,详情请点击[DeepLabv3+图像分割](https://aistudio.baidu.com/aistudio/projectDetail/101696)
### 垂类模型
更多特色垂类分割模型如LIP人体部件分割、人像分割、车道线分割模型可以参考[contrib](./contrib/README.md)
## 使用文档
* [安装说明](./docs/installation.md)
* [数据准备](./docs/data_prepare.md)
* [数据增强](./docs/data_aug.md)
* [预训练模型](./docs/model_zoo.md)
* [训练/评估/预测(可视化)](./docs/usage.md)
* [预测库集成](./inference/README.md)
* [服务端部署](./serving/README.md)
* [垂类分割模型](./contrib/README.md)
## FAQ
#### Q:图像分割的数据增强如何配置,unpadding, step scaling, range scaling的原理是什么?
A:数据增强的配置可以参考文档[数据增强](./docs/data_aug.md)
#### Q: 预测时图片过大,导致显存不足如何处理?
A: 降低Batch size,使用Group Norm策略等。
## 更新日志
### 2019.08.25
#### v0.1.0
* PaddleSeg分割库初始版本发布,包含DeepLabv3+, U-Net, ICNet三类分割模型, 其中DeepLabv3+支持Xception, MobileNet两种可调节的骨干网络。
* CVPR'19 LIP人体部件分割比赛冠军预测模型发布[ACE2P](./contrib/ACE2P)
* 预置基于DeepLabv3+网络的[人像分割](./contrib/HumanSeg/)和[车道线分割](./contrib/RoadLine)预测模型发布
## 如何贡献代码
我们非常欢迎您为PaddleSeg贡献代码或者提供使用建议。
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "gn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v7/"
PRETRAINED_MODEL: u"pretrain/deeplabv3plus_gn_init"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 700
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 8
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./data/COCO2014/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 21
TEST_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
TRAIN_FILE_LIST: "data/COCO2014/ImageSets/train.txt"
VAL_FILE_LIST: "data/COCO2014/VOC_ImageSets/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
TEST:
TEST_MODEL: "snapshots/coco_v1/final"
TRAIN:
MODEL_SAVE_DIR: "snapshots/coco_v1/"
PRETRAINED_MODEL: "pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.007
WEIGHT_DECAY: 0.00004
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "SGD"
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (513, 513) # (width, height), for unpadding
INF_RESIZE_VALUE: 513 # for rangescaling
MAX_RESIZE_VALUE: 513 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: True
ASPECT_RATIO: 0
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 45
MIN_AREA_RATIO: 0
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 24
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: u"./data/humanseg/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: u"data/humanseg/list/val.txt"
TRAIN_FILE_LIST: u"data/humanseg/list/train.txt"
VAL_FILE_LIST: u"data/humanseg/list/val.txt"
IGNORE_INDEX: 255
SEPARATOR: "|"
FREEZE:
MODEL_FILENAME: u"model"
PARAMS_FILENAME: u"params"
SAVE_DIR: u"human_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: u"bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
TEST:
TEST_MODEL: "snapshots/humanseg/aic_v2/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/humanseg/aic_v2/"
PRETRAINED_MODEL: u"pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.1
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "sgd"
EVAL_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (1536, 576) # (width, height), for unpadding
INF_RESIZE_VALUE: 1280 # for rangescaling
MAX_RESIZE_VALUE: 1536 # for rangescaling
MIN_RESIZE_VALUE: 1024 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 1
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./data/line/L4_lane_mask_dataset_app/L4_360_0_2class/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
TRAIN_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/train.txt"
VAL_FILE_LIST: "data/line/L4_lane_mask_dataset_app/L4_360_0_2class/val.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "line_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
TEST:
TEST_MODEL: "snapshots/line_v4/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/line_v4/"
PRETRAINED_MODEL: u"pretrain/MobileNetV2_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.01
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 40
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 1.25 # for stepscaling
MIN_SCALE_FACTOR: 0.75 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./dataset/mini_pet/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 3
TEST_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
TRAIN_FILE_LIST: "./dataset/mini_pet/file_list/train_list.txt"
VAL_FILE_LIST: "./dataset/mini_pet/file_list/val_list.txt"
VIS_FILE_LIST: "./dataset/mini_pet/file_list/test_list.txt"
IGNORE_INDEX: 255
SEPARATOR: " "
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
MODEL_NAME: "unet"
DEFAULT_NORM_TYPE: "bn"
TEST:
TEST_MODEL: "./test/saved_model/unet_pet/final/"
TRAIN:
MODEL_SAVE_DIR: "./test/saved_models/unet_pet/"
PRETRAINED_MODEL: "./test/models/unet_coco/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
NUM_EPOCHS: 500
LR: 0.005
LR_POLICY: "poly"
OPTIMIZER: "adam"
# Augmented Context Embedding with Edge Perceiving(ACE2P)
- 类别: 图像-语义分割
- 网络: ACE2P
- 数据集: LIP
## 模型概述
人体解析(Human Parsing)是细粒度的语义分割任务,旨在识别像素级别的人类图像的组成部分(例如,身体部位和服装)。ACE2P通过融合底层特征、全局上下文信息和边缘细节,
端到端训练学习人体解析任务。以ACE2P单人人体解析网络为基础的解决方案在CVPR2019第三届LIP挑战赛中赢得了全部三个人体解析任务的第一名
## 模型框架图
![](imgs/net.jpg)
## 模型细节
ACE2P模型包含三个分支:
* 语义分割分支
* 边缘检测分支
* 融合分支
语义分割分支采用resnet101作为backbone,通过Pyramid Scene Parsing Network融合上下文信息以获得更加精确的特征表征
边缘检测分支采用backbone的中间层特征作为输入,预测二值边缘信息
融合分支将语义分割分支以及边缘检测分支的特征进行融合,以获得边缘细节更加准确的分割图像。
分割问题一般采用mIoU作为评价指标,特别引入了IoU loss结合cross-entropy loss以针对性优化这一指标
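下面给出一个基于numpy的soft IoU loss示意实现(仅为说明该损失如何直接优化IoU指标的简化示例,并非ACE2P的官方实现):
```python
import numpy as np

def soft_iou_loss(probs, labels_onehot, eps=1e-6):
    # probs: softmax后的预测概率, 形状为[N, C, H, W]
    # labels_onehot: one-hot形式的标签, 形状为[N, C, H, W]
    inter = np.sum(probs * labels_onehot, axis=(0, 2, 3))
    union = np.sum(probs + labels_onehot - probs * labels_onehot, axis=(0, 2, 3))
    iou = inter / (union + eps)
    # 实际训练中与cross-entropy loss结合使用, 例如 total_loss = ce_loss + iou_loss
    return 1.0 - np.mean(iou)
```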
测试阶段,采用多尺度以及水平翻转的结果进行融合生成最终预测结果
训练阶段,采用余弦退火的学习率策略, 并且在学习初始阶段采用线性warm up
数据预处理方面,保持图片比例并进行随机缩放,随机旋转,水平翻转作为数据增强策略
## LIP指标
该模型在测试尺度为'377,377,473,473,567,567'且水平翻转的情况下,meanIoU为62.63
多模型ensemble后meanIoU为65.18, 居LIP Single-Person Human Parsing Track榜单第一
## 模型预测效果展示
![](imgs/result.jpg)
## 引用
**论文**
*Devil in the Details: Towards Accurate Single and Multiple Human Parsing* https://arxiv.org/abs/1809.05996
**代码**
https://github.com/Microsoft/human-pose-estimation.pytorch
https://github.com/liutinglt/CE2P
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "testing_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test_id.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "ACE2P")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 20
# 均值, 图像预处理减去的均值
cfg.MEAN = 0.406, 0.456, 0.485
# 标准差,图像预处理除以标准差
cfg.STD = 0.225, 0.224, 0.229
# 多尺度预测时图像尺寸
cfg.multi_scales = (377,377), (473,473), (567,567)
# 多尺度预测时图像是否水平翻转
cfg.flip = True
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import numpy as np
import paddle.fluid as fluid
from ACE2P.config import cfg
import cv2
def get_affine_points(src_shape, dst_shape, rot_grad=0):
# 获取图像和仿射后图像的三组对应点坐标
# 三组点为仿射变换后图像的中心点, [w/2,0], [0,0],及对应原始图像的点
if dst_shape[0] == 0 or dst_shape[1] == 0:
raise Exception('scale shape should not be 0')
# 旋转角度
rotation = rot_grad * np.pi / 180.0
sin_v = np.sin(rotation)
cos_v = np.cos(rotation)
dst_ratio = float(dst_shape[0]) / dst_shape[1]
h, w = src_shape
src_ratio = float(h) / w if w != 0 else 0
affine_shape = [h, h * dst_ratio] if src_ratio > dst_ratio \
else [w / dst_ratio, w]
# 原始图像三组点
points = [[0, 0]] * 3
points[0] = (np.array([w, h]) - 1) * 0.5
points[1] = points[0] + 0.5 * affine_shape[0] * np.array([sin_v, -cos_v])
points[2] = points[1] - 0.5 * affine_shape[1] * np.array([cos_v, sin_v])
# 仿射变换后图三组点
points_trans = [[0, 0]] * 3
points_trans[0] = (np.array(dst_shape[::-1]) - 1) * 0.5
points_trans[1] = [points_trans[0][0], 0]
return points, points_trans
def preprocess(im):
# ACE2P模型数据预处理
im_shape = im.shape[:2]
input_images = []
for i, scale in enumerate(cfg.multi_scales):
# 获取图像和仿射变换后图像的对应点坐标
points, points_trans = get_affine_points(im_shape, scale)
# 根据对应点集获得仿射矩阵
trans = cv2.getAffineTransform(np.float32(points),
np.float32(points_trans))
# 根据仿射矩阵对图像进行仿射
input = cv2.warpAffine(im,
trans,
scale[::-1],
flags=cv2.INTER_LINEAR)
# 减均值,除以标准差,并转换数据格式为NCHW
input = input.astype(np.float32)
input = (input / 255. - np.array(cfg.MEAN)) / np.array(cfg.STD)
input = input.transpose(2, 0, 1).astype(np.float32)
input = np.expand_dims(input, 0)
# 水平翻转
if cfg.flip:
flip_input = input[:, :, :, ::-1]
input_images.append(np.vstack((input, flip_input)))
else:
input_images.append(input)
return input_images
def multi_scale_test(exe, test_prog, feed_name, fetch_list,
input_ims, im_shape):
# 由于部分类别分左右部位, flipped_idx为其水平翻转后对应的标签
flipped_idx = (15, 14, 17, 16, 19, 18)
ms_outputs = []
# 多尺度预测
for idx, scale in enumerate(cfg.multi_scales):
input_im = input_ims[idx]
parsing_output = exe.run(program=test_prog,
feed={feed_name[0]: input_im},
fetch_list=fetch_list)
output = parsing_output[0][0]
if cfg.flip:
# 若水平翻转,对部分类别进行翻转,与原始预测结果取均值
flipped_output = parsing_output[0][1]
flipped_output[14:20, :, :] = flipped_output[flipped_idx, :, :]
flipped_output = flipped_output[:, :, ::-1]
output += flipped_output
output *= 0.5
output = np.transpose(output, [1, 2, 0])
# 仿射变换回图像原始尺寸
points, points_trans = get_affine_points(im_shape, scale)
M = cv2.getAffineTransform(np.float32(points_trans), np.float32(points))
logits_result = cv2.warpAffine(output, M, im_shape[::-1], flags=cv2.INTER_LINEAR)
ms_outputs.append(logits_result)
# 多尺度预测结果求均值,求预测概率最大的类别
ms_fused_parsing_output = np.stack(ms_outputs)
ms_fused_parsing_output = np.mean(ms_fused_parsing_output, axis=0)
parsing = np.argmax(ms_fused_parsing_output, axis=2)
return parsing, ms_fused_parsing_output
# -*- coding: utf-8 -*-
from utils.util import AttrDict, get_arguments, merge_cfg_from_args
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "test_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "model")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 2
# 均值, 图像预处理减去的均值
cfg.MEAN = 104.008, 116.669, 122.675
# 标准差,图像预处理除以标准差
cfg.STD = 1.0, 1.0, 1.0
# 待预测图像输入尺寸
cfg.input_size = 513, 513
merge_cfg_from_args(args, cfg)
# PaddleSeg 特色垂类分割模型
提供基于PaddlePaddle最新的分割特色模型
## Augmented Context Embedding with Edge Perceiving (ACE2P)
### 1. 模型概述
CVPR 19 Look into Person (LIP) 单人人像分割比赛冠军模型,详见[ACE2P/README](./ACE2P)
### 2. 模型下载
点击[链接](https://paddleseg.bj.bcebos.com/models/ACE2P.tgz)下载模型,在contrib/ACE2P目录下解压:`tar -xzf ACE2P.tgz`
### 3. 数据下载
前往LIP数据集官网: http://47.100.21.47:9999/overview.php 或点击 [Baidu_Drive](https://pan.baidu.com/s/1nvqmZBN#list/path=%2Fsharelink2787269280-523292635003760%2FLIP%2FLIP&parentPath=%2Fsharelink2787269280-523292635003760),
下载Testing_images.zip,解压到contrib/ACE2P/data文件夹下
### 4. 运行
**NOTE:** 运行该模型需要至少2.5G显存
使用GPU预测:
```
python -u infer.py --example ACE2P --use_gpu
```
使用CPU预测:
```
python -u infer.py --example ACE2P
```
## 人像分割 (HumanSeg)
### 1. 模型结构
DeepLabv3+ backbone为Xception65
### 2. 下载模型和数据
点击[链接](https://paddleseg.bj.bcebos.com/models/HumanSeg.tgz),下载解压到contrib文件夹下
### 3. 运行
使用GPU预测:
```
python -u infer.py --example HumanSeg --use_gpu
```
使用CPU预测:
```
python -u infer.py --example HumanSeg
```
### 4. 预测结果示例:
原图:![](imgs/Human.jpg)
预测结果:![](imgs/HumanSeg.jpg)
## 车道线分割 (RoadLine)
### 1. 模型结构
Deeplabv3+ backbone为MobileNetv2
### 2. 下载模型和数据
点击[链接](https://paddleseg.bj.bcebos.com/inference_model/RoadLine.tgz),下载解压在contrib文件夹下
### 3. 运行
使用GPU预测:
```
python -u infer.py --example RoadLine --use_gpu
```
使用CPU预测:
```
python -u infer.py --example RoadLine
```
### 4. 预测结果示例:
原图:![](imgs/RoadLine.jpg)
预测结果:![](imgs/RoadLine.png)
## 备注
1. 数据及模型路径等详细配置见ACE2P/HumanSeg/RoadLine下的config.py文件
2. ACE2P模型需预留2G显存,若显存不足,可调小FLAGS_fraction_of_gpu_memory_to_use
# -*- coding: utf-8 -*-
from utils.util import AttrDict, merge_cfg_from_args, get_arguments
import os
args = get_arguments()
cfg = AttrDict()
# 待预测图像所在路径
cfg.data_dir = os.path.join(args.example , "data", "test_images")
# 待预测图像名称列表
cfg.data_list_file = os.path.join(args.example , "data", "test.txt")
# 模型加载路径
cfg.model_path = os.path.join(args.example , "model")
# 预测结果保存路径
cfg.vis_dir = os.path.join(args.example , "result")
# 预测类别数
cfg.class_num = 2
# 均值, 图像预处理减去的均值
cfg.MEAN = 127.5, 127.5, 127.5
# 标准差,图像预处理除以标准差
cfg.STD = 127.5, 127.5, 127.5
# 待预测图像输入尺寸
cfg.input_size = 1536, 576
merge_cfg_from_args(args, cfg)
# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np
from utils.util import get_arguments
from utils.palette import get_palette
from PIL import Image as PILImage
import importlib
args = get_arguments()
config = importlib.import_module(args.example+'.config')
cfg = getattr(config, 'cfg')
# paddle垃圾回收策略FLAG,ACE2P模型较大,当显存不够时建议开启
os.environ['FLAGS_eager_delete_tensor_gb']='0.0'
import paddle.fluid as fluid
# 预测数据集类
class TestDataSet():
def __init__(self):
self.data_dir = cfg.data_dir
self.data_list_file = cfg.data_list_file
self.data_list = self.get_data_list()
self.data_num = len(self.data_list)
def get_data_list(self):
# 获取预测图像路径列表
data_list = []
data_file_handler = open(self.data_list_file, 'r')
for line in data_file_handler:
img_name = line.strip()
name_prefix = img_name.split('.')[0]
if len(img_name.split('.')) == 1:
img_name = img_name + '.jpg'
img_path = os.path.join(self.data_dir, img_name)
data_list.append(img_path)
return data_list
def preprocess(self, img):
# 图像预处理
if cfg.example == 'ACE2P':
reader = importlib.import_module(args.example+'.reader')
ACE2P_preprocess = getattr(reader, 'preprocess')
img = ACE2P_preprocess(img)
else:
img = cv2.resize(img, cfg.input_size).astype(np.float32)
img -= np.array(cfg.MEAN)
img /= np.array(cfg.STD)
img = img.transpose((2, 0, 1))
img = np.expand_dims(img, axis=0)
return img
def get_data(self, index):
# 获取图像信息
img_path = self.data_list[index]
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
if img is None:
return img, img,img_path, None
img_name = img_path.split(os.sep)[-1]
name_prefix = img_name.replace('.'+img_name.split('.')[-1],'')
img_shape = img.shape[:2]
img_process = self.preprocess(img)
return img, img_process, name_prefix, img_shape
def infer():
if not os.path.exists(cfg.vis_dir):
os.makedirs(cfg.vis_dir)
palette = get_palette(cfg.class_num)
# 人像分割结果显示阈值
thresh = 120
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# 加载预测模型
test_prog, feed_name, fetch_list = fluid.io.load_inference_model(
dirname=cfg.model_path, executor=exe, params_filename='__params__')
#加载预测数据集
test_dataset = TestDataSet()
data_num = test_dataset.data_num
for idx in range(data_num):
# 数据获取
ori_img, image, im_name, im_shape = test_dataset.get_data(idx)
if image is None:
print(im_name, 'is None')
continue
# 预测
if cfg.example == 'ACE2P':
# ACE2P模型使用多尺度预测
reader = importlib.import_module(args.example+'.reader')
multi_scale_test = getattr(reader, 'multi_scale_test')
parsing, logits = multi_scale_test(exe, test_prog, feed_name, fetch_list, image, im_shape)
else:
# HumanSeg,RoadLine模型单尺度预测
result = exe.run(program=test_prog, feed={feed_name[0]: image}, fetch_list=fetch_list)
parsing = np.argmax(result[0][0], axis=0)
parsing = cv2.resize(parsing.astype(np.uint8), im_shape[::-1])
# 预测结果保存
result_path = os.path.join(cfg.vis_dir, im_name + '.png')
if cfg.example == 'HumanSeg':
logits = result[0][0][1]*255
logits = cv2.resize(logits, im_shape[::-1])
ret, logits = cv2.threshold(logits, thresh, 0, cv2.THRESH_TOZERO)
logits = 255 *(logits - thresh)/(255 - thresh)
# 将分割结果添加到alpha通道
rgba = np.concatenate((ori_img, np.expand_dims(logits, axis=2)), axis=2)
cv2.imwrite(result_path, rgba)
else:
output_im = PILImage.fromarray(np.asarray(parsing, dtype=np.uint8))
output_im.putpalette(palette)
output_im.save(result_path)
if idx % 100 == 0:
print('%d processed' % (idx))
print('%d processed done' % (idx))
return 0
if __name__ == "__main__":
infer()
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Created by: RainbowSecret
## Microsoft Research
## yuyua@microsoft.com
## Copyright (c) 2018
##
## This source code is licensed under the MIT-style license found in the
## LICENSE file in the root directory of this source tree
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import cv2
def get_palette(num_cls):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
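
if __name__ == '__main__':
    # 使用示例(示意): get_palette返回长度为3*num_cls的调色板列表,
    # 可直接传给PIL的putpalette, 与contrib下infer.py保存预测结果的方式一致
    import numpy as np
    from PIL import Image as PILImage

    fake_parsing = np.zeros((64, 64), dtype=np.uint8)  # 假设为[H, W]的类别id矩阵
    output_im = PILImage.fromarray(fake_parsing)
    output_im.putpalette(get_palette(20))
    output_im.save('palette_demo.png')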
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import os
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu",
action="store_true",
help="Use gpu or cpu to test.")
parser.add_argument('--example',
type=str,
help='RoadLine, HumanSeg or ACE2P')
return parser.parse_args()
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
def merge_cfg_from_args(args, cfg):
"""Merge config keys, values in args into the global config."""
for k, v in vars(args).items():
d = cfg
try:
value = eval(v)
except:
value = v
if value is not None:
cfg[k] = value
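
if __name__ == '__main__':
    # 使用示例(示意): AttrDict支持属性和下标两种访问方式,
    # merge_cfg_from_args则会把命令行参数合并进该配置对象
    demo_cfg = AttrDict()
    demo_cfg.class_num = 20
    assert demo_cfg['class_num'] == demo_cfg.class_num
    print(demo_cfg)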
# PaddleSeg 数据标注
用户需预先采集好用于训练、评估和测试的图片,并使用数据标注工具[LabelMe](https://github.com/wkentaro/labelme)完成数据标注,最后用我们提供的数据转换脚本将LabelMe产出的数据格式转换为模型训练时所需的数据格式。
## 1 LabelMe的安装
用户在采集完用于训练、评估和预测的图片之后,需使用数据标注工具[LabelMe](https://github.com/wkentaro/labelme)完成数据标注。LabelMe支持在Windows/macOS/Linux三个系统上使用,且三个系统下的标注格式是一致的。具体的安装流程请参见[官方安装指南](https://github.com/wkentaro/labelme)
## 2 LabelMe的使用
打开终端输入`labelme`会出现LabelMe的交互界面,可以先预览`LabelMe`给出的已标注好的图片,再开始标注自定义数据集。
<div align="center">
<img src="./docs/imgs/annotation/image-1.png" width="600px"/>
<p>图1 LabelMe交互界面的示意图</p>
</div>
* 预览已标注图片
获取`LabelMe`的源码:
```
git clone https://github.com/wkentaro/labelme
```
终端输入`labelme`会出现LabelMe的交互界面,点击`OpenDir`打开`<path/to/labelme>/examples/semantic_segmentation/data_annotated`,其中`<path/to/labelme>`为克隆下来的`labelme`的路径,打开后显示的是语义分割的真值标注。
<div align="center">
<img src="./docs/imgs/annotation/image-2.png" width="600px"/>
<p>图2 已标注图片的示意图</p>
</div>
* 开始标注
请按照下述步骤标注数据集:
(1) 点击`OpenDir`打开待标注图片所在目录,点击`Create Polygons`,沿着目标的边缘画多边形,完成后输入目标的类别。在标注过程中,如果某个点画错了,可以按撤销快捷键撤销该点。Mac下的撤销快捷键为`command+Z`
<div align="center">
<img src="./docs/imgs/annotation/image-3.png" width="600px"/>
<p>图3 标注单个目标的示意图</p>
</div>
​ (2) 右击选择`Edit Polygons`可以整体移动多边形的位置,也可以移动某个点的位置;右击选择`Edit Label`可以修改每个目标的类别。请根据自己的需要执行这一步骤,若不需要修改,可跳过。
<div align="center">
<img src="./docs/imgs/annotation/image-4-1.png" width="00px" />
<img src="./docs/imgs/annotation/image-4-2.png" width="600px"/>
<p>图4 修改标注的示意图</p>
</div>
​ (3) 图片中所有目标的标注都完成后,点击`Save`保存json文件,**请将json文件和图片放在同一个文件夹里**,点击`Next Image`标注下一张图片。
LabelMe产出的真值文件可参考我们给出的文件夹`data_annotated`。
<div align="center">
<img src="./docs/imgs/annotation/image-5.png" width="600px"/>
<p>图5 LabelMe产出的真值文件的示意图</p>
</div>
## 3 数据格式转换
* 我们用于完成语义分割的数据集目录结构如下:
```
my_dataset # 根目录
|-- JPEGImages # 数据集图片
|-- SegmentationClassPNG # 数据集真值
| |-- xxx.png # 像素级别的真值信息
| |...
|-- class_names.txt # 数据集的类别名称
```
<div align="center">
<img src="./docs/imgs/annotation/image-6.png" width="600px"/>
<p>图6 训练所需的数据集目录的结构示意图</p>
</div>
* 运行转换脚本需要依赖labelme和pillow,如未安装,请先安装。Labelme的具体安装流程请参见[官方安装指南](https://github.com/wkentaro/labelme)。Pillow的安装:
```shell
pip install pillow
```
* 运行以下代码,将标注后的数据转换成满足以上格式的数据集:
```
python labelme2seg.py <path/to/label_json_file> <path/to/output_dataset>
```
其中,`<path/to/label_json_file>`为图片以及LabelMe产出的json文件所在文件夹的目录,`<path/to/output_dataset>`为转换后的数据集所在文件夹的目录。**需注意的是:`<path/to/output_dataset>`请勿预先创建,脚本运行时会自动创建,若该目录已存在则会报错。**
转换得到的数据集可参考我们给出的文件夹`my_dataset`。其中,文件`class_names.txt`是数据集中所有标注类别的名称,包含背景类;文件夹`JPEGImages`保存的是数据集的图片;文件夹`SegmentationClassPNG`保存的是各图片的像素级别的真值信息,背景类`_background_`对应为0,其它目标类别从1开始递增,至多为255。
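若需要在自己的脚本中读取类别名称与类别id的对应关系,可参考下面的示意代码(假设转换后的数据集目录为`my_dataset`,类别id即类别名称所在的行号):
```python
# 读取labelme2seg.py生成的class_names.txt, 第i行对应类别id i
with open('my_dataset/class_names.txt') as f:
    class_names = [line.strip() for line in f if line.strip()]

name_to_id = {name: i for i, name in enumerate(class_names)}
print(name_to_id)  # 例如 {'_background_': 0, 'bus': 1, 'car': 2}
```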
<div align="center">
<img src="./docs/imgs/annotation/image-7.png" width="600px"/>
<p>图7 训练所需的数据集各目录的内容示意图</p>
</div>
{
"shapes": [
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
260.936170212766,
22.563829787234056
],
[
193.936170212766,
19.563829787234056
],
[
124.93617021276599,
39.563829787234056
],
[
89.93617021276599,
101.56382978723406
],
[
81.93617021276599,
150.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
88.93617021276599,
244.56382978723406
],
[
89.93617021276599,
322.56382978723406
],
[
116.93617021276599,
367.56382978723406
],
[
158.936170212766,
368.56382978723406
],
[
165.936170212766,
337.56382978723406
],
[
347.936170212766,
335.56382978723406
],
[
349.936170212766,
369.56382978723406
],
[
391.936170212766,
373.56382978723406
],
[
403.936170212766,
335.56382978723406
],
[
425.936170212766,
332.56382978723406
],
[
421.936170212766,
281.56382978723406
],
[
428.936170212766,
252.56382978723406
],
[
428.936170212766,
236.56382978723406
],
[
409.936170212766,
220.56382978723406
],
[
409.936170212766,
150.56382978723406
],
[
430.936170212766,
143.56382978723406
],
[
433.936170212766,
112.56382978723406
],
[
431.936170212766,
96.56382978723406
],
[
408.936170212766,
90.56382978723406
],
[
395.936170212766,
50.563829787234056
],
[
338.936170212766,
25.563829787234056
]
]
},
{
"label": "bus",
"line_color": null,
"fill_color": null,
"points": [
[
88.93617021276599,
115.56382978723406
],
[
0.9361702127659877,
96.56382978723406
],
[
0.0,
251.968085106388
],
[
0.9361702127659877,
265.56382978723406
],
[
27.936170212765987,
265.56382978723406
],
[
29.936170212765987,
283.56382978723406
],
[
63.93617021276599,
281.56382978723406
],
[
89.93617021276599,
252.56382978723406
],
[
100.93617021276599,
183.56382978723406
],
[
108.93617021276599,
145.56382978723406
],
[
81.93617021276599,
151.56382978723406
]
]
},
{
"label": "car",
"line_color": null,
"fill_color": null,
"points": [
[
413.936170212766,
168.56382978723406
],
[
497.936170212766,
168.56382978723406
],
[
497.936170212766,
256.56382978723406
],
[
431.936170212766,
258.56382978723406
],
[
430.936170212766,
236.56382978723406
],
[
408.936170212766,
218.56382978723406
]
]
}
],
"lineColor": [
0,
255,
0,
128
],
"fillColor": [
255,
0,
0,
128
],
"imagePath": "2011_000025.jpg",
"imageData": null
}
#!/usr/bin/env python
from __future__ import print_function
import argparse
import glob
import json
import os
import os.path as osp
import sys
import numpy as np
import PIL.Image
import labelme
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument('input_dir', help='input annotated directory')
parser.add_argument('output_dir', help='output dataset directory')
args = parser.parse_args()
if osp.exists(args.output_dir):
print('Output directory already exists:', args.output_dir)
sys.exit(1)
os.makedirs(args.output_dir)
os.makedirs(osp.join(args.output_dir, 'JPEGImages'))
os.makedirs(osp.join(args.output_dir, 'SegmentationClassPNG'))
print('Creating dataset:', args.output_dir)
# get the all class names for the given dataset
class_names = ['_background_']
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
with open(label_file) as f:
data = json.load(f)
for shape in data['shapes']:
points = shape['points']
label = shape['label']
cls_name = label
if cls_name not in class_names:
class_names.append(cls_name)
class_name_to_id = {}
for i, class_name in enumerate(class_names):
class_id = i # starts with 0
class_name_to_id[class_name] = class_id
if class_id == 0:
assert class_name == '_background_'
class_names = tuple(class_names)
print('class_names:', class_names)
out_class_names_file = osp.join(args.output_dir, 'class_names.txt')
with open(out_class_names_file, 'w') as f:
f.writelines('\n'.join(class_names))
print('Saved class_names:', out_class_names_file)
for label_file in glob.glob(osp.join(args.input_dir, '*.json')):
print('Generating dataset from:', label_file)
with open(label_file) as f:
base = osp.splitext(osp.basename(label_file))[0]
out_img_file = osp.join(
args.output_dir, 'JPEGImages', base + '.jpg')
out_png_file = osp.join(
args.output_dir, 'SegmentationClassPNG', base + '.png')
data = json.load(f)
img_file = osp.join(osp.dirname(label_file), data['imagePath'])
img = np.asarray(PIL.Image.open(img_file))
PIL.Image.fromarray(img).save(out_img_file)
lbl = labelme.utils.shapes_to_label(
img_shape=img.shape,
shapes=data['shapes'],
label_name_to_value=class_name_to_id,
)
if osp.splitext(out_png_file)[1] != '.png':
out_png_file += '.png'
# Assume label ranges within [0, 255] for uint8
if lbl.min() >= 0 and lbl.max() <= 255:
lbl_pil = PIL.Image.fromarray(lbl.astype(np.uint8), mode='L')
lbl_pil.save(out_png_file)
else:
raise ValueError(
'[%s] Cannot save the pixel-wise class label as PNG. '
'Please consider using the .npy format.' % out_png_file
)
if __name__ == '__main__':
main()
_background_
bus
car
# PaddleSeg 性能Benchmark
## 训练性能
### 多GPU加速比
### 显存开销对比
## 预测性能对比
### Windows
### Linux
#### Naive
#### Analysis
# PaddleSeg 分割库配置说明
PaddleSeg提供了统一的配置,用于训练/评估/可视化/导出模型
配置包含以下Group:
* [通用](./configs/basic_group.md)
* [DATASET](./configs/dataset_group.md)
* [DATALOADER](./configs/dataloader_group.md)
* [FREEZE](./configs/freeze_group.md)
* [MODEL](./configs/model_group.md)
* [SOLVER](./configs/solver_group.md)
* [TRAIN](./configs/train_group.md)
* [TEST](./configs/test_group.md)
`Note`:
代码详见pdseg/utils/config.py
# cfg
BASIC Group存放所有通用配置
## `MEAN`
图像预处理减去的均值(格式为*[R, G, B]*)
### 默认值
[104.008, 116.669, 122.675]
<br/>
<br/>
## `STD`
图像预处理除以的标准差(格式为*[R, G, B]*)
### 默认值
[1.000, 1.000, 1.000]
<br/>
<br/>
## `EVAL_CROP_SIZE`
评估时对图片裁剪的大小(格式为*[宽, 高]*)
### 默认值
无(需要用户自己填写)
### 注意事项
* 裁剪的大小不能小于原图,请将该字段的值填写为评估数据中最大的宽和高
<br/>
<br/>
## `TRAIN_CROP_SIZE`
训练时对图片裁剪的大小(格式为*[宽, 高]*)
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `BATCH_SIZE`
训练、评估、可视化时所用的BATCH大小
### 默认值
1(需要根据实际需求填写)
### 注意事项
* 当指定了多卡运行时,PaddleSeg会将数据平分到每张卡上运行,因此每张卡单次运行的数量为 BATCH_SIZE // dev_count
* 多卡运行时,请确保BATCH_SIZE可被dev_count整除
* 增大BATCH_SIZE有利于模型训练时的收敛速度,但是会带来显存的开销。请根据实际情况评估后填写合适的值
<br/>
<br/>
# cfg.DATALOADER
DATALOADER Group存放所有与数据加载相关的配置
## `NUM_WORKERS`
数据载入时的并发数量
### 默认值
8
### 注意事项
* 该选项只在`pdseg/train.py``pdseg/eval.py`中使用到
* 当使用多线程时,该字段表示线程数量;使用多进程时,该字段表示进程数量。一般该字段使用默认值即可
<br/>
<br/>
## `BUF_SIZE`
数据载入时的缓存队列大小
### 默认值
256
<br/>
<br/>
# cfg.DATASET
DATASET Group存放所有与数据集相关的配置
## `DATA_DIR`
数据集主目录,PaddleSeg在读取数据文件列表时,会将列表中的文件名与主目录拼接得到图片的绝对路径
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `TRAIN_FILE_LIST`
训练集列表,调用`pdseg/train.py`进行训练时,会读取该列表中的图片进行训练
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `VAL_FILE_LIST`
验证集列表,调用`pdseg/eval.py`进行效果评估时,会读取该列表中的图片进行评估
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `TEST_FILE_LIST`
测试集列表,调用`pdseg/vis.py`进行可视化展示时,会读取该列表中的图片进行预测
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `VIS_FILE_LIST`
可视化列表,调用`pdseg/train.py`进行训练时,如果打开了--use_tbx开关,则在每次模型保存的时候,会读取该列表中的图片进行可视化
文件列表由多行组成,每一行的格式为
```
<img_path><sep><label_path>
```
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `NUM_CLASSES`
类别数量,构建网络所需
### 默认值
19(但是一般需要用户修改为自己数据集的类别数量)
### 注意事项
数据集中的label标注必须为0 ~ NUM_CLASSES - 1,如果label设置错误,会导致计算IOU时出现异常
<br/>
<br/>
## `IMAGE_TYPE`
图片类型,支持`rgb``rgba``gray`三种格式
### 默认值
`rgb`
<br/>
<br/>
## `SEPARATOR`
文件列表中用于分隔输入图片和标签图片的分隔符
### 默认值
空格符` `
### 例子
假设训练文件列表如下,则 `SEPARATOR` 应该填写 `|`
```
mydata/train/image1.jpg|mydata/train/image1.label.jpg
mydata/train/image2.jpg|mydata/train/image2.label.jpg
mydata/train/image3.jpg|mydata/train/image3.label.jpg
mydata/train/image4.jpg|mydata/train/image4.label.jpg
...
```
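读取文件列表时,大致按照如下方式使用该分隔符切分出原图路径和标注图路径(示意代码,并非PaddleSeg源码,其中train_list.txt为假设的列表文件名):
```python
separator = '|'
with open('train_list.txt') as f:
    for line in f:
        img_path, label_path = line.strip().split(separator)
        # 之后img_path、label_path会再与DATASET.DATA_DIR拼接得到绝对路径
```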
<br/>
<br/>
## `IGNORE_INDEX`
需要忽略的像素标签值,label中所有标记为该值的像素不会参与到loss的计算以及IOU、Acc等指标的计算
### 默认值
255
# cfg.FREEZE
FREEZE Group存放所有与模型导出相关的配置
## `MODEL_FILENAME`
导出模型后所保存的模型文件名
### 默认值
`__model__`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
## `PARAMS_FILENAME`
导出模型后所保存的参数文件名
### 默认值
`__params__`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
## `SAVE_DIR`
保存导出模型的主目录
### 默认值
`freeze_model`
### 注意事项
* 仅在使用`pdseg/export_model.py` 脚本导出模型时,该字段必填
<br/>
<br/>
# cfg.MODEL.DEEPLAB
MODEL.DEEPLAB 子Group存放所有和DeepLabv3+模型相关的配置
## `BACKBONE`
DeepLabV3+所用骨干网络,支持`mobilenetv2` `xception65`两种
### 默认值
`xception65`
<br/>
<br/>
## `OUTPUT_STRIDE`
DeepLabV3+下采样率,支持8/16两种选择
### 默认值
16
<br/>
<br/>
## `DEPTH_MULTIPLIER`
MobileNet V2的depth multiplier值,仅当`BACKBONE`为`mobilenetv2`时生效
### 默认值
1.0
<br/>
<br/>
## `ENCODER_WITH_ASPP`
DeepLabv3+的模型Encoder中是否使用ASPP
### 默认值
True
### 注意事项
* 将该功能置为False可以提升模型计算速度,但是会降低精度
<br/>
<br/>
## `ENABLE_DECODER`
DeepLabv3+模型是否使用Decoder模块
### 默认值
True
### 注意事项
* 将该功能置为False可以提升模型计算速度,但是会降低精度
<br/>
<br/>
## `ASPP_WITH_SEP_CONV`
DeepLabv3+的模型的ASPP模块是否使用可分离卷积
### 默认值
False
<br/>
<br/>
## `DECODER_WITH_SEP_CONV`
DeepLabv3+的模型的Decoder模块是否使用可分离卷积
### 默认值
False
<br/>
<br/>
# cfg.MODEL
MODEL Group存放所有和模型相关的配置,该Group还包含三个子Group
* [DeepLabv3p](./model_deeplabv3p_group.md)
* [UNet](./model_unet_group.md)
* [ICNet](./model_icnet_group.md)
## `MODEL_NAME`
所选模型,支持`deeplabv3p` `unet` `icnet`三种模型
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `DEFAULT_NORM_TYPE`
模型所用norm类型,支持`bn`和`gn`两种
### 默认值
`bn`
<br/>
<br/>
## `DEFAULT_GROUP_NUMBER`
默认GROUP数量,仅在`DEFAULT_NORM_TYPE``gn`时生效
### 默认值
32
<br/>
<br/>
## `BN_MOMENTUM`
BatchNorm动量, 一般无需改动
### 默认值
0.99
<br/>
<br/>
## `DEFAULT_EPSILON`
BatchNorm计算时所用的极小值, 防止分母除0溢出,一般无需改动
### 默认值
1e-5
<br/>
<br/>
## `FP16`
是否开启FP16训练
### 默认值
False
<br/>
<br/>
## `SCALE_LOSS`
对损失进行缩放的系数
### 默认值
1.0
### 注意事项
* 启动fp16训练时,建议设置该字段为8
<br/>
<br/>
## `MULTI_LOSS_WEIGHT`
多路损失的权重
### 默认值
[1.0]
### 注意事项
* 该字段仅在模型存在多路损失的情况下生效
* 目前支持的模型中只有`icnet`使用多路(3路)损失
* 当选择模型为`icnet`且该字段的长度不为3时,PaddleSeg会强制设置该字段为[1.0, 0.4, 0.16]
### 示例
假设模型存在三路损失,计算结果分别为loss1/loss2/loss3,并且`MULTI_LOSS_WEIGHT`的值为[1.0, 0.4, 0.16],则最终损失的计算结果为
```math
loss = 1.0 * loss1 + 0.4 * loss2 + 0.16 * loss3
```
<br/>
<br/>
# cfg.MODEL.ICNET
MODEL.ICNET 子Group存放所有和ICNet模型相关的配置
## `DEPTH_MULTIPLIER`
ResNet backbone的depth multiplier值
### 默认值
0.5
<br/>
<br/>
## `LAYERS`
ResNet backbone的层数,支持`18` `34` `50` `101` `152`等五种
### 默认值
50
<br/>
<br/>
# cfg.MODEL.UNET
MODEL.UNET 子Group存放所有和UNet模型相关的配置
## `UPSAMPLE_MODE`
上采样方式,支持`bilinear`或者不设置
### 默认值
`bilinear`
### 注意事项
* 当`UPSAMPLE_MODE`值为`bilinear`时,UNet上采样方法为双线性插值法,否则使用转置卷积进行上采样,可参考下方示意代码
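两种上采样方式的区别大致如下(基于paddle.fluid的示意代码,并非PaddleSeg源码实现):
```python
import paddle.fluid as fluid

def up_sample(data, out_shape, num_filters, upsample_mode='bilinear'):
    # UPSAMPLE_MODE为bilinear时使用双线性插值, 否则使用转置卷积(反卷积)上采样
    if upsample_mode == 'bilinear':
        return fluid.layers.resize_bilinear(data, out_shape=out_shape)
    return fluid.layers.conv2d_transpose(
        data, num_filters=num_filters, filter_size=2, stride=2)
```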
<br/>
<br/>
# cfg.SOLVER
SOLVER Group定义所有和训练优化相关的配置
## `LR`
初始学习率
### 默认值
0.1
<br/>
<br/>
## `LR_POLICY`
学习率的衰减策略,支持`poly` `piecewise` `cosine`三种策略
### 默认值
`poly`
### 示例
* 当使用`poly`衰减时,假设初始学习率为0.1,训练总步数为10000,则在power分别为`0.4``0.8``1``1.2``1.6`时,衰减曲线如下图:
* power = 1 衰减曲线为直线
* power > 1 衰减曲线内凹
* power < 1 衰减曲线外凸
<p align="center">
<img src="../imgs/poly_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
* 当使用`piecewise`衰减时,假设初始学习率为0.1,GAMMA为0.9,总EPOCH数量为100,DECAY_EPOCH为[10, 20],衰减曲线如下图:
<p align="center">
<img src="../imgs/piecewise_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
* 当使用`cosine`衰减时,假设初始学习率为0.1,总EPOCH数量为100,衰减曲线如下图:
<p align="center">
<img src="../imgs/cosine_decay_example.png" hspace='10' height="400" width="800"/> <br />
</p>
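以`poly`策略为例,学习率大致按照如下公式衰减(其中step为当前训练步数,total_step为总步数):
```math
lr = LR \times (1 - step / total\_step)^{POWER}
```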
<br/>
<br/>
## `POWER`
学习率Poly下降指数,仅当策略为[`LR_POLICY`](#LR_POLICY)`poly`时有效
### 默认值
0.9
<br/>
<br/>
## `GAMMA`
学习率piecewise下降指数,仅当策略为[`LR_POLICY`](#LR_POLICY)`piecewise`时有效
### 默认值
0.1
<br/>
<br/>
## `DECAY_EPOCH`
学习率piecewise下降间隔,仅当策略为[`LR_POLICY`](#LR_POLICY)`piecewise`时有效
### 默认值
[10, 20]
<br/>
<br/>
## `WEIGHT_DECAY`
L2正则化系数
### 默认值
0.00004
<br/>
<br/>
## `BEGIN_EPOCH`
起始EPOCH值
### 默认值
0
<br/>
<br/>
## `NUM_EPOCHS`
训练EPOCH数
### 默认值
30(需要根据实际需求进行调整)
<br/>
<br/>
## `SNAPSHOT`
训练时,保存模型的间隔(单位为EPOCH)
### 默认值
10(意味着每训练10个EPOCH保存一次模型)
<br/>
<br/>
# cfg.TEST
TEST Group存放所有和测试模型相关的配置
## `TEST_MODEL`
待测试模型的路径
### 默认值
无(需要用户自己填写)
### 注意事项
* 使用`pdseg/export_model.py` `pdseg/eval.py` `pdseg/vis.py`等脚本进行模型的评估、可视化和导出时,该字段必填
<br/>
<br/>
# cfg.TRAIN
TRAIN Group存放所有和训练相关的配置
## `MODEL_SAVE_DIR`
在训练周期内定期保存模型的主目录
### 默认值
无(需要用户自己填写)
<br/>
<br/>
## `PRETRAINED_MODEL`
预训练模型路径
### 默认值
无
### 注意事项
* 若未指定该字段,则模型会随机初始化所有的参数,从头开始训练
* 若指定了该字段,但是路径不存在,则参数加载失败,仍然会被随机初始化
* 若指定了该字段,且路径存在,但是部分参数不存在或者shape无法对应,则该部分参数随机初始化
<br/>
<br/>
## `RESUME`
是否从预训练模型中恢复参数并继续训练
### 默认值
False
### 注意事项
* 当该字段被置为True且`PRETRAINED_MODEL`不存在时,该选项不生效
* 当该字段被置为True且`PRETRAINED_MODEL`存在时,PaddleSeg会恢复到上一次训练的最近一个epoch,并且恢复训练过程中的临时变量(如已经衰减过的学习率,Optimizer的动量数据等)
* 当该字段被置为True且`PRETRAINED_MODEL`存在时,`PRETRAINED_MODEL`路径的最后一个目录必须为int数值或者字符串final,PaddleSeg会将int数值作为当前起始EPOCH继续训练,若目录为final,则不会继续训练。若目录不满足上述条件,PaddleSeg会抛出错误。
<br/>
<br/>
## `SYNC_BATCH_NORM`
是否在多卡间同步BN的均值和方差
### 默认值
False
### 注意事项
* 打开该选项会带来一定的性能消耗(多卡间同步数据导致)
* 仅在GPU多卡训练时该开关有效(Windows不支持多卡训练,因此无需打开该开关)
* GPU多卡训练时,建议开启该开关,可以提升模型的训练效果
# PaddleSeg 数据增强
## 数据增强基本流程
![](imgs/data_aug_flow.png)
## resize
resize 步骤是指将输入图像按照某种规则先进行resize,PaddleSeg支持以下3种resize方式:
![](imgs/aug_method.png)
- unpadding
将输入图像直接resize到某一固定大小,再送入网络进行训练,对应参数为AUG.FIX_RESIZE_SIZE。预测时同样操作。
- stepscaling
将输入图像按照某一个比例resize,这个比例以某一个步长在一定范围内随机变动。设定最小比例参数为`AUG.MIN_SCALE_FACTOR`, 最大比例参数`AUG.MAX_SCALE_FACTOR`,步长参数为`AUG.SCALE_STEP_SIZE`。预测时不对输入图像做处理。
- rangescaling
固定长宽比resize,即图像长边对齐到某一个固定大小,短边随同样的比例变化。设定最小大小参数为`AUG.MIN_RESIZE_VALUE`,设定最大大小参数为`AUG.MAX_RESIZE_VALUE`。预测时需要将长边对齐到`AUG.INF_RESIZE_VALUE`所指定的大小,其中`AUG.INF_RESIZE_VALUE``AUG.MIN_RESIZE_VALUE``AUG.MAX_RESIZE_VALUE`范围内。
rangescaling示意图如下:
![](imgs/rangescale.png)
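rangescaling的处理逻辑大致如下(示意代码,并非PaddleSeg源码实现;训练时在最小、最大值之间随机取长边目标值,预测时长边固定对齐到`AUG.INF_RESIZE_VALUE`):
```python
import random
import cv2

def range_scaling(img, min_value=400, max_value=600, inf_value=500, is_train=True):
    # 固定长宽比resize: 长边对齐到目标值, 短边按相同比例缩放
    target_long = random.randint(min_value, max_value) if is_train else inf_value
    h, w = img.shape[:2]
    scale = float(target_long) / max(h, w)
    return cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
```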
## rich crop
Rich Crop是PaddleSeg结合实际业务经验开发的一套数据增强策略,面向标注数据少、测试数据情况繁杂的分割业务场景。流程如下图所示:
![RichCrop示意图](imgs/data_aug_example.png)
rich crop是指对图像进行多种变换,保证在训练过程中数据的丰富多样性,PaddleSeg支持以下几种变换。`AUG.RICH_CROP.ENABLE`为False时会直接跳过该步骤。
- blur
图像加模糊,使用开关`AUG.RICH_CROP.BLUR`,为False时该项功能关闭。`AUG.RICH_CROP.BLUR_RATIO`控制加入模糊的概率。
- flip
图像上下翻转,使用开关`AUG.RICH_CROP.FLIP`,为False时该项功能关闭。`AUG.RICH_CROP.FLIP_RATIO`控制加入翻转的概率。
- rotation
图像旋转,`AUG.RICH_CROP.MAX_ROTATION`控制最大旋转角度。旋转产生的多余的区域的填充值为均值。
- aspect
图像长宽比调整,从图像中crop一定区域出来之后在某一长宽比内进行resize。控制参数`AUG.RICH_CROP.MIN_AREA_RATIO``AUG.RICH_CROP.ASPECT_RATIO`
- color jitter
图像颜色调整,控制参数`AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO``AUG.RICH_CROP.SATURATION_JITTER_RATIO``AUG.RICH_CROP.CONTRAST_JITTER_RATIO`
## random crop
该步骤主要是通过crop的方式使得输入到网络中的图像为某一固定大小,控制该大小的参数为TRAIN_CROP_SIZE,类型为tuple,格式为(width, height)。当输入图像尺寸小于TRAIN_CROP_SIZE时,会对输入图像进行padding,padding值为均值。
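random crop的处理逻辑大致如下(示意代码,并非PaddleSeg源码实现,假设输入为HWC三通道图像):
```python
import random
import numpy as np

def random_crop(img, crop_size=(769, 769), mean=(104.008, 116.669, 122.675)):
    # crop_size格式为(width, height); 输入图像小于裁剪尺寸时, 先用均值padding到足够大
    crop_w, crop_h = crop_size
    h, w = img.shape[:2]
    if h < crop_h or w < crop_w:
        padded = np.zeros((max(h, crop_h), max(w, crop_w), 3), dtype=np.float32)
        padded[...] = mean
        padded[:h, :w, :] = img
        img = padded
        h, w = img.shape[:2]
    y = random.randint(0, h - crop_h)
    x = random.randint(0, w - crop_w)
    return img[y:y + crop_h, x:x + crop_w]
```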
- preprocess
- 减均值
- 除以标准差
- 水平翻转
- 输入图片格式
- 原图
- 图片格式:支持rgb三通道图片和rgba四通道图片两种类型的图片进行训练,但是同一次训练过程中只能使用一种格式。
- 图片转换:灰度图片经过预处理后会转变成三通道图片
- 图片参数设置:当图片为三通道图片时IMAGE_TYPE设置为rgb, 对应MEAN和STD也必须是一个长度为3的list,当图片为四通道图片时IMAGE_TYPE设置为rgba,对应的MEAN和STD必须是一个长度为4的list。
- 标注图
- 图片格式:标注图片必须为png格式的单通道多值图,元素值代表的是这个元素所属于的类别。
- 图片转换:在datalayer层对label图片进行的任何resize,以及旋转的操作,都必须采用最近邻的插值方式。
- 图片ignore:设置DATASET.IGNORE_INDEX参数可以选择性忽略掉属于某一类别的所有像素点,该参数一般设置为255
# PaddleSeg 数据准备
## 数据标注
数据标注推荐使用LabelMe工具,具体可参考文档[PaddleSeg 数据标注](./annotation/README.md)
## 语义分割标注规范
PaddleSeg采用通用的文件列表方式组织训练集、验证集和测试集。像素标注类别需要从0开始递增。
**NOTE:** 标注图像请使用PNG无损压缩格式的图片
以Cityscapes数据集为例, 我们需要整理出训练集、验证集、测试集对应的原图和标注文件列表用于PaddleSeg训练即可。
其中`DATASET.DATA_DIR`为数据根目录,文件列表的路径以数据集根目录作为相对路径起始点。
```
./cityscapes/ # 数据集根目录
├── gtFine # 标注目录
│   ├── test
│   │   ├── berlin
│   │   └── ...
│   ├── train
│   │   ├── aachen
│   │   └── ...
│   └── val
│   ├── frankfurt
│   └── ...
└── leftImg8bit # 原图目录
├── test
│   ├── berlin
│   └── ...
├── train
│   ├── aachen
│   └── ...
└── val
├── frankfurt
└── ...
```
文件列表组织形式如下
```
原始图片路径 [SEP] 标注图片路径
```
其中`[SEP]`是文件路径分隔符,可以在`DATASET.SEPARATOR`配置项中进行配置,默认为空格。
如果文件名中存在**空格**,推荐使用'|'等文件名不可用字符进行切分。
**注意事项**
* 务必保证分隔符在文件列表中每行只存在一次, 如文件名中存在空格,请使用'|'等文件名不可用字符进行切分
* 文件列表请使用**UTF-8**格式保存, PaddleSeg默认使用UTF-8编码读取file_list文件
如下图所示,左边为原图的图片路径,右边为图片对应的标注路径。
![cityscapes_filelist](./docs/imgs/file_list.png)
完整的配置信息可以参考[`./dataset/cityscapes_demo`](../dataset/cityscapes_demo/)目录下的yaml和文件列表。
## 数据校验
数据校验会从7个方面对用户自定义的数据集和yaml配置进行校验,帮助用户排查基本的数据和配置问题。
数据校验脚本如下,支持通过`YAML_FILE_PATH`来指定配置文件。
```
# YAML_FILE_PATH为yaml配置文件路径
python pdseg/check.py --cfg ${YAML_FILE_PATH}
```
### 1 数据集基本校验
* 数据集路径检查,包括`DATASET.TRAIN_FILE_LIST``DATASET.VAL_FILE_LIST``DATASET.TEST_FILE_LIST`设置是否正确。
* 列表分割符检查,判断在`TRAIN_FILE_LIST``VAL_FILE_LIST``TEST_FILE_LIST`列表文件中的分隔符`DATASET.SEPARATOR`设置是否正确。
### 2 标注类别校验
检查实际标注类别是否和配置参数`DATASET.NUM_CLASSES``DATASET.IGNORE_INDEX`匹配。
**NOTE:**
标注图像类别数值必须在[0~(`DATASET.NUM_CLASSES`-1)]范围内或者为`DATASET.IGNORE_INDEX`
标注类别最好从0开始,否则可能影响精度。
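可以用类似下面的示意代码快速检查单张标注图的取值是否符合要求(假设NUM_CLASSES为19,IGNORE_INDEX为255,label.png为待检查的标注图):
```python
import numpy as np
from PIL import Image

num_classes, ignore_index = 19, 255
label = np.asarray(Image.open('label.png'))  # 单通道PNG标注图
values = np.unique(label)
invalid = [v for v in values if v != ignore_index and v >= num_classes]
print('标注中出现的取值:', values)
print('非法取值:', invalid)
```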
### 3 标注像素统计
统计每种类别像素数量,显示以供参考。
### 4 标注格式校验
检查标注图像是否为PNG格式。
**NOTE:** 标注图像请使用PNG无损压缩格式的图片,若使用其他格式则可能影响精度。
### 5 图像格式校验
检查图片类型`DATASET.IMAGE_TYPE`是否设置正确。
**NOTE:** 当数据集包含三通道图片时`DATASET.IMAGE_TYPE`设置为rgb;
当数据集全部为四通道图片时`DATASET.IMAGE_TYPE`设置为rgba;
### 6 图像与标注图尺寸一致性校验
验证图像尺寸和对应标注图尺寸是否一致。
### 7 模型验证参数`EVAL_CROP_SIZE`校验
验证`EVAL_CROP_SIZE`是否设置正确,共有3种情形:
- 当`AUG.AUG_METHOD`为unpadding时,`EVAL_CROP_SIZE`的宽高应不小于`AUG.FIX_RESIZE_SIZE`的宽高。
- 当`AUG.AUG_METHOD`为stepscaling时,`EVAL_CROP_SIZE`的宽高应不小于原图中最大的宽高。
- 当`AUG.AUG_METHOD`为rangescaling时,`EVAL_CROP_SIZE`的宽高应不小于缩放后图像中最大的宽高。
我们将计算并给出`EVAL_CROP_SIZE`的建议值。
# PaddleSeg预测库部署
# PaddleSeg 安装说明
## 推荐开发环境
* Python2.7 or 3.5+
* CUDA 9.2
* cudnn v7.1
## 1. 安装PaddlePaddle
### pip安装
由于图像分割任务模型计算量大,强烈推荐在GPU版本的paddlepaddle下使用PaddleSeg.
```
pip install paddlepaddle-gpu
```
### Conda安装
PaddlePaddle最新版本1.5支持Conda安装,可以减少相关依赖安装成本,conda相关使用说明可以参考[Anaconda](https://www.anaconda.com/distribution/)
```
conda install -c paddle paddlepaddle-gpu cudatoolkit=9.0
```
更多安装方式详情可以查看 [PaddlePaddle快速开始](https://www.paddlepaddle.org.cn/start)
## 2. 下载PaddleSeg代码
```
git clone https://github.com/PaddlePaddle/PaddleSeg
```
## 3. 安装PaddleSeg依赖
```
pip install -r requirements.txt
```
## 4. 本地流程测试
通过执行以下命令,会完整执行数据下载,训练,可视化,预测模型导出四个环节,用于验证PaddleSeg安装和依赖是否正常。
```
python test/local_test_cityscapes.py
```
# PaddleSeg 预训练模型
PaddleSeg对所有内置的分割模型都提供了公开数据集下的预训练模型,加载预训练模型后再训练,可以在自定义数据集上得到更稳定的效果。
## ImageNet预训练模型
所有ImageNet预训练模型来自于PaddlePaddle图像分类库,想获取更多细节请点击[这里](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification)
| 模型 | 数据集合 | Depth multiplier | 模型加载config设置 | 下载地址 | Top-1/Top-5 Accuracy |
|---|---|---|---|---|---|
| MobileNetV2_1.0x | ImageNet | 1.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar) | 72.15%/90.65% |
| MobileNetV2_0.25x | ImageNet | 0.25x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.25 <br> MODEL.DEFAULT_NORM_TYPE: bn |[MobileNetV2_0.25x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar) | 53.21%/76.52% |
| MobileNetV2_0.5x | ImageNet | 0.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 0.5 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_0.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar) | 65.03%/85.72% |
| MobileNetV2_1.5x | ImageNet | 1.5x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.5 <br> MODEL.DEFAULT_NORM_TYPE: bn| [MobileNetV2_1.5x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar) | 74.12%/91.67% |
| MobileNetV2_2.0x | ImageNet | 2.0x | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 2.0 <br> MODEL.DEFAULT_NORM_TYPE: bn | [MobileNetV2_2.0x](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar) | 75.23%/92.58% |
用户可以结合实际场景的精度和预测性能要求,选取不同`Depth multiplier`参数的MobileNet模型。
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Top-1/Top-5 Accuracy |
|---|---|---|---|---|
| Xception41 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_41 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception41_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception41_pretrained.tgz) | 79.5%/94.38% |
| Xception65 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_pretrained.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_pretrained.tgz) | 80.32%/94.47% |
| Xception71 | ImageNet | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_71 <br> MODEL.DEFAULT_NORM_TYPE: bn| coming soon | -- |
## COCO预训练模型
train数据集为coco instance分割数据集合转换成的语义分割数据集合
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Output Stride | multi-scale test | mIoU |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEFAULT_NORM_TYPE: bn|[deeplabv3plus_coco_bn_init.tgz](https://bj.bcebos.com/v1/paddleseg/deeplabv3plus_coco_bn_init.tgz) | 16 | --| -- |
| DeeplabV3+/Xception65/bn | COCO | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn | [xception65_coco.tgz](https://paddleseg.bj.bcebos.com/models/xception65_coco.tgz)| 16 | -- | -- |
| UNet/bn | COCO | MODEL.MODEL_NAME: unet <br> MODEL.DEFAULT_NORM_TYPE: bn | [unet](https://paddleseg.bj.bcebos.com/models/unet_coco_v2.tgz) | 16 | -- | -- |
## Cityscapes预训练模型
train数据集合为Cityscapes 训练集合,测试为Cityscapes的验证集合
| 模型 | 数据集合 | 模型加载config设置 | 下载地址 | Output Stride | multi-scale test | mIoU on val |
|---|---|---|---|---|---|---|
| DeepLabv3+/MobileNetv2/bn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: mobilenet <br> MODEL.DEEPLAB.DEPTH_MULTIPLIER: 1.0 <br> MODEL.DEEPLAB.ENCODER_WITH_ASPP: False <br> MODEL.DEEPLAB.ENABLE_DECODER: False <br> MODEL.DEFAULT_NORM_TYPE: bn|[mobilenet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/mobilenet_cityscapes.tgz) |16|false| 0.698|
| DeepLabv3+/Xception65/gn | Cityscapes |MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: gn | [deeplabv3p_xception65_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_cityscapes.tgz) |16|false| 0.7804 |
| DeepLabv3+/Xception65/bn | Cityscapes | MODEL.MODEL_NAME: deeplabv3p <br> MODEL.DEEPLAB.BACKBONE: xception_65 <br> MODEL.DEFAULT_NORM_TYPE: bn| [Xception65_deeplab_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/Xception65_deeplab_cityscapes.tgz) | 16 | false | 0.7715 |
| ICNet/bn | Cityscapes | MODEL.MODEL_NAME: icnet <br> MODEL.DEFAULT_NORM_TYPE: bn | [icnet_cityscapes.tgz](https://paddleseg.bj.bcebos.com/models/icnet_cityscapes.tgz) |16|false| 0.6854 |
# PaddleSeg 模型列表
### U-Net
U-Net 起源于医疗图像分割,整个网络是标准的encoder-decoder网络,特点是参数少,计算快,应用性强,对于一般场景适应度很高。
![](./imgs/unet.png)
### DeepLabv3+
DeepLabv3+ 是DeepLab系列的最后一篇文章,其前作有 DeepLabv1,DeepLabv2, DeepLabv3,
在最新作中,DeepLab的作者通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASPP层,
其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率,在 PASCAL VOC 2012 数据集上取得了新的state-of-the-art表现:89.0% mIoU。
![](./imgs/deeplabv3p.png)
在PaddleSeg当前实现中,支持两种分类Backbone网络的切换
- MobileNetv2:
适用于移动设备的快速网络,如果对分割预测速度有较高的要求,请使用这一backbone网络。
- Xception:
DeepLabv3+原始实现的backbone网络,兼顾了精度和性能,适用于服务端部署。
### ICNet
Image Cascade Network(ICNet)主要用于图像实时语义分割。相较于其它压缩计算的方法,ICNet既考虑了速度,也考虑了准确性。ICNet的主要思想是将输入图像变换为不同的分辨率,然后用不同计算复杂度的子网络计算不同分辨率的输入,再将结果合并。ICNet由三个子网络组成,计算复杂度高的网络处理低分辨率输入,计算复杂度低的网络处理高分辨率输入,通过这种方式在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡。
整个网络结构如下:
![](./imgs/icnet.png)
## 参考
- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)
- [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597)
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
# PaddleSeg特殊网络结构介绍
### Group Norm
![](./imgs/gn.png)
关于Group Norm的介绍可以参考论文:https://arxiv.org/abs/1803.08494
GN 把通道分为组,并计算每一组之内的均值和方差,以进行归一化。GN 的计算与批量大小无关,其精度在各种批量大小下也保持稳定。因此适用于网络参数量较大、单卡batch size较小的模型(比如DeepLabv3+),可以在小batch下取得较好的训练效果。
### Synchronized Batch Norm
Synchronized Batch Norm跨GPU批归一化策略最早在[MegDet: A Large Mini-Batch Object Detector](https://arxiv.org/abs/1711.07240)
论文中提出,在[Bag of Freebies for Training Object Detection Neural Networks](https://arxiv.org/pdf/1902.04103.pdf)论文中以Yolov3验证了这一策略的有效性,[PaddleCV/yolov3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/yolov3)实现了这一系列策略并比Darknet框架版本在COCO17数据上mAP高5.9.
PaddleSeg基于PaddlePaddle框架的sync_batch_norm策略,可以支持通过多卡实现大batch size的分割模型训练,可以得到更高的mIoU精度。
PaddleSeg提供了 `训练`/`评估`/`预测(可视化)`/`模型导出` 等四个功能的使用脚本。四个脚本都支持通过不同的Flags来开启特定功能,也支持通过Options来修改默认的[训练配置](./config.md)。四者的使用方式非常接近,如下:
```shell
# 训练
python pdseg/train.py ${FLAGS} ${OPTIONS}
# 评估
python pdseg/eval.py ${FLAGS} ${OPTIONS}
# 预测/可视化
python pdseg/vis.py ${FLAGS} ${OPTIONS}
# 模型导出
python pdseg/export_model.py ${FLAGS} ${OPTIONS}
```
`Note`:
> * FLAGS必须位于OPTIONS之前,否则将会遇到报错,例如如下的例子:
>
> ```shell
> # FLAGS "--cfg configs/cityscapes.yaml" 必须在 OPTIONS "BATCH_SIZE 1" 之前
> python pdseg/train.py BATCH_SIZE 1 --cfg configs/cityscapes.yaml
> ```
## FLAGS
|FLAG|支持脚本|用途|默认值|备注|
|-|-|-|-|-|
|--cfg|ALL|配置文件路径|None||
|--use_gpu|train/eval/vis|是否使用GPU进行训练|False||
|--use_mpio|train/eval|是否使用多进程进行IO处理|False|打开该开关会占用一定量的CPU内存,但是可以提高训练速度。</br> NOTE:windows平台下不支持该功能;使用自定义数据进行初次训练时建议不打开该开关,否则数据读取异常将不可见。 </br> |
|--use_tbx|train|是否使用tensorboardX记录训练数据|False||
|--log_steps|train|训练日志的打印周期(单位为step)|10||
|--debug|train|是否打印debug信息|False|IOU等指标涉及到混淆矩阵的计算,会降低训练速度|
|--tbx_log_dir|train|tensorboardX的日志路径|None||
|--do_eval|train|是否在保存模型时进行效果评估|False||
|--vis_dir|vis|保存可视化图片的路径|"visual"||
|--also_save_raw_results|vis|是否保存原始的预测图片|False||
## OPTIONS
详见[训练配置](./config.md)
## 使用示例
下面通过一个简单的示例,说明如何使用PaddleSeg提供的预训练模型进行finetune。我们选择基于COCO数据集预训练的unet模型作为pretrained模型,在一个Oxford-IIIT Pet数据集上进行finetune。
**Note:** 为了快速体验,我们使用Oxford-IIIT Pet做了一个小型数据集,后续数据都使用该小型数据集。
### 准备工作
在开始教程前,请先确认准备工作已经完成:
1. 下载合适版本的paddlepaddle
2. PaddleSeg相关依赖已经安装
如果有不确定的地方,请参考[安装说明](./docs/installation.md)
### 下载预训练模型
```shell
# 下载预训练模型
wget https://bj.bcebos.com/v1/paddleseg/models/unet_coco_init.tgz
# 解压缩到当前路径下
tar xvzf unet_coco_init.tgz
```
### 下载Oxford-IIIT数据集
```shell
# 下载Oxford-IIIT Pet数据集
wget https://paddleseg.bj.bcebos.com/dataset/mini_pet.zip --no-check-certificate
# 解压缩到当前路径下
unzip mini_pet.zip
```
### Finetune
接着开始Finetune,为了方便体验,我们在configs目录下放置了Oxford-IIIT Pet所对应的配置文件`unet_pet.yaml`,可以通过`--cfg`指向该文件来设置训练配置。
我们选择两张GPU进行训练,这可以通过环境变量`CUDA_VISIBLE_DEVICES`来指定。
除此之外,我们指定总BATCH_SIZE为4,PaddleSeg会根据可用的GPU数量,将数据平分到每张卡上,务必确保BATCH_SIZE为GPU数量的整数倍(在本例中,每张卡的BATCH_SIZE为2)。
```
export CUDA_VISIBLE_DEVICES=0,1
python pdseg/train.py --use_gpu \
--do_eval \
--use_tbx \
--tbx_log_dir train_log \
--cfg configs/unet_pet.yaml \
BATCH_SIZE 4 \
TRAIN.PRETRAINED_MODEL unet_coco_init \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
DATASET.TRAIN_FILE_LIST mini_pet/file_list/train_list.txt \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
DATASET.VIS_FILE_LIST mini_pet/file_list/val_list.txt \
TRAIN.SYNC_BATCH_NORM True \
SOLVER.LR 5e-5
```
`NOTE`:
> * 上述示例中,一共存在三套配置方案: PaddleSeg默认配置/unet_pet.yaml/OPTIONS,三者的优先级顺序为 OPTIONS > yaml > 默认配置。这个原则对于train.py/eval.py/vis.py/export_model.py都适用
>
> * 如果发现因为显存不足而Crash,请适当调低BATCH_SIZE。如果本机GPU显存充足,则可以调高BATCH_SIZE以获得更快的训练速度
>
> * windows并不支持多卡训练
### 训练过程可视化
当打开do_eval和use_tbx两个开关后,我们可以通过TensorBoard查看训练的效果
```shell
tensorboard --logdir train_log --host ${HOST_IP} --port ${PORT}
```
NOTE:
1. 上述示例中,$HOST_IP为机器IP地址,请替换为实际IP,$PORT请替换为可访问的端口
2. 数据量较大时,前端加载速度会比较慢,请耐心等待
启动TensorBoard命令后,我们可以在浏览器中查看对应的训练数据
`SCALAR`这个tab中,查看训练loss、iou、acc的变化趋势
![](docs/imgs/tensorboard_scalar.JPG)
`IMAGE`这个tab中,查看样本的预测情况
![](docs/imgs/tensorboard_image.JPG)
### 模型评估
训练完成后,我们可以通过eval.py来评估模型效果。由于我们设置的训练EPOCH数量为500,保存间隔为10,因此一共会产生50个定期保存的模型,加上最终保存的final模型,一共有51个模型。我们选择最后保存的模型进行效果的评估:
```shell
python pdseg/eval.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.VAL_FILE_LIST mini_pet/file_list/val_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
### 模型预测/可视化
通过vis.py来可视化模型的预测效果,我们选择最后保存的模型进行预测:
```shell
python pdseg/vis.py --use_gpu \
--cfg configs/unet_pet.yaml \
DATASET.DATA_DIR mini_pet \
DATASET.TEST_FILE_LIST mini_pet/file_list/test_list.txt \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
`NOTE`
1. 可视化的图片会默认保存在visual/visual_results目录下,可以通过`--vis_dir`来指定输出目录
2. 训练过程中会使用DATASET.VIS_FILE_LIST中的图片进行可视化显示,而vis.py则会使用DATASET.TEST_FILE_LIST
### 模型导出
当确定模型效果满足预期后,我们需要通过export_model.py来导出一个可用于部署到服务端预测的模型:
```shell
python pdseg/export_model.py --cfg configs/unet_pet.yaml \
TEST.TEST_MODEL test/saved_models/unet_pet/final
```
模型会导出到freeze_model目录,接下来就是进行模型的部署,相关步骤,请查看[模型部署](./inference/README.md)
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
include(external-cmake/yaml-cpp.cmake)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if (WITH_MKL)
ADD_DEFINITIONS(-DUSE_MKL)
endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
endif()
include_directories("${CMAKE_SOURCE_DIR}/")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/src/yaml-cpp/include")
include_directories("${PADDLE_DIR}/")
include_directories("${PADDLE_DIR}/third_party/install/protobuf/include")
include_directories("${PADDLE_DIR}/third_party/install/glog/include")
include_directories("${PADDLE_DIR}/third_party/install/gflags/include")
include_directories("${PADDLE_DIR}/third_party/install/xxhash/include")
include_directories("${PADDLE_DIR}/third_party/install/snappy/include")
include_directories("${PADDLE_DIR}/third_party/install/snappystream/include")
include_directories("${PADDLE_DIR}/third_party/install/zlib/include")
include_directories("${PADDLE_DIR}/third_party/boost")
include_directories("${PADDLE_DIR}/third_party/eigen3")
link_directories("${PADDLE_DIR}/third_party/install/snappy/lib")
link_directories("${PADDLE_DIR}/third_party/install/snappystream/lib")
link_directories("${PADDLE_DIR}/third_party/install/zlib/lib")
link_directories("${PADDLE_DIR}/third_party/install/protobuf/lib")
link_directories("${PADDLE_DIR}/third_party/install/glog/lib")
link_directories("${PADDLE_DIR}/third_party/install/gflags/lib")
link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}/ext/yaml-cpp/lib")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
link_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${OPENCV_DIR}/build/include")
include_directories("${OPENCV_DIR}/opencv/build/include")
link_directories("${OPENCV_DIR}/build/x64/vc14/lib")
else ()
include_directories("${PADDLE_DIR}/paddle/include")
link_directories("${PADDLE_DIR}/paddle/lib")
include_directories("${OPENCV_DIR}/include")
link_directories("${OPENCV_DIR}/lib64")
endif ()
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
if (WITH_STATIC_LIB)
safe_set_static_flag()
add_definitions(-DSTATIC_LIB)
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
endif()
# TODO let users define cuda lib path
if (WITH_GPU)
if (NOT DEFINED CUDA_LIB OR ${CUDA_LIB} STREQUAL "")
message(FATAL_ERROR "please set CUDA_LIB with -DCUDA_LIB=/path/cuda-8.0/lib64")
endif()
if (NOT WIN32)
if (NOT DEFINED CUDNN_LIB)
message(FATAL_ERROR "please set CUDNN_LIB with -DCUDNN_LIB=/path/cudnn_v7.4/cuda/lib64")
endif()
endif(NOT WIN32)
endif()
if (NOT WIN32)
if (USE_TENSORRT AND WITH_GPU)
include_directories("${PADDLE_DIR}/third_party/install/tensorrt/include")
link_directories("${PADDLE_DIR}/third_party/install/tensorrt/lib")
endif()
endif(NOT WIN32)
if (NOT WIN32)
set(NGRAPH_PATH "${PADDLE_DIR}/third_party/install/ngraph")
if(EXISTS ${NGRAPH_PATH})
include(GNUInstallDirs)
include_directories("${NGRAPH_PATH}/include")
link_directories("${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}")
set(NGRAPH_LIB ${NGRAPH_PATH}/${CMAKE_INSTALL_LIBDIR}/libngraph${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_MKL)
include_directories("${PADDLE_DIR}/third_party/install/mklml/include")
if (WIN32)
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.lib
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.lib)
else ()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif ()
set(MKLDNN_PATH "${PADDLE_DIR}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if (WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else ()
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif ()
endif()
else()
set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
if(WITH_STATIC_LIB)
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
else()
if (WIN32)
set(DEPS
${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else ()
set(DEPS
${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf snappystream snappy z xxhash
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
opencv_world346 glog libyaml-cppmt gflags_static libprotobuf snappy zlibstatic xxhash snappystream ${EXTERNAL_LIB})
set(DEPS ${DEPS} libcmt shlwapi)
set(DEPS ${DEPS} ${YAML_CPP_LIBRARY})
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${PADDLE_DIR}/third_party/install/tensorrt/lib/libnvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (NOT WIN32)
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgcodecs${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_imgproc${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_core${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/lib64/libopencv_highgui${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libIlmImf${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjasper${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibpng${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibtiff${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libittnotify${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibjpeg-turbo${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/liblibwebp${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${OPENCV_DIR}/share/OpenCV/3rdparty/lib64/libzlib${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
SET(PADDLESEG_INFERENCE_SRCS preprocessor/preprocessor.cpp preprocessor/preprocessor_seg.cpp predictor/seg_predictor.cpp)
ADD_LIBRARY(libpaddleseg_inference STATIC ${PADDLESEG_INFERENCE_SRCS})
target_link_libraries(libpaddleseg_inference ${DEPS})
add_executable(demo demo.cpp)
ADD_DEPENDENCIES(libpaddleseg_inference yaml-cpp)
ADD_DEPENDENCIES(demo yaml-cpp libpaddleseg_inference)
target_link_libraries(demo ${DEPS} libpaddleseg_inference)
add_custom_command(TARGET demo POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/bin/mkldnn.dll ./mkldnn.dll
)
{
"configurations": [
{
"name": "x64-Release",
"generator": "Ninja",
"configurationType": "RelWithDebInfo",
"inheritEnvironments": [ "msvc_x64_x64" ],
"buildRoot": "${projectDir}\\out\\build\\${name}",
"installRoot": "${projectDir}\\out\\install\\${name}",
"cmakeCommandArgs": "",
"buildCommandArgs": "-v",
"ctestCommandArgs": "",
"variables": [
{
"name": "CUDA_LIB",
"value": "C:/PaddleDeploy/cudalib/v8.0/lib/x64",
"type": "PATH"
},
{
"name": "OPENCV_DIR",
"value": "C:/PaddleDeploy/opencv",
"type": "PATH"
},
{
"name": "PADDLE_DIR",
"value": "C:/PaddleDeploy/fluid_inference",
"type": "PATH"
},
{
"name": "CMAKE_BUILD_TYPE",
"value": "Release",
"type": "STRING"
}
]
}
]
}
# Installing Dependencies
## OpenCV
Official OpenCV releases: https://opencv.org/releases/
### Windows
1. Download the Windows installer: OpenCV-3.4.6
2. Double-click it and extract to a directory of your choice, e.g. D:\opencv
3. Configure the environment variables
> 1. My Computer -> Properties -> Advanced system settings -> Environment Variables
> 2. Find Path in the system variables (create it if missing) and double-click to edit
> 3. Click New, add the opencv path and save, e.g. D:\opencv\build\x64\vc14\bin
### Linux
1. Download the OpenCV-3.4.6 sources and extract them, e.g. to /home/user/opencv-3.4.6
2. cd opencv-3.4.6 && mkdir build && mkdir release
3. Edit modules/videoio/src/cap_v4l.cpp and insert the following right after line 253
```
#ifndef V4L2_CID_ROTATE
#define V4L2_CID_ROTATE (V4L2_CID_BASE+34)
#endif
#ifndef V4L2_CID_IRIS_ABSOLUTE
#define V4L2_CID_IRIS_ABSOLUTE (V4L2_CID_CAMERA_CLASS_BASE+17)
#endif
```
4. cd build
5. cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/user/opencv-3.4.6/release/ -DOPENCV_FORCE_3RDPARTY_BUILD=OFF
6. make -j10
7. make install
The headers and libraries produced by the build are then installed under /home/user/opencv-3.4.6/release
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# PaddleSeg C++ Inference Deployment
## Overview
This directory provides a cross-platform C++ deployment solution for image segmentation models. With a small amount of configuration and code, a model can be integrated into your own service to perform segmentation tasks.
The main design goals are the following three points:
- Cross-platform: building, development and deployment work on both Windows and Linux
- Support for mainstream segmentation tasks: with a little configuration a model can be loaded for common prediction tasks such as human segmentation
- Extensibility: users can implement their own pre-processing and post-processing logic for new models
## Main Directories and Files
| File | Purpose |
|-------|----------|
| CMakeList.txt | cmake build configuration file |
| external-cmake| cmake files of external dependencies (currently only yaml-cpp)|
| demo.cpp | sample C++ code that loads a model and runs prediction |
| predictor | classes that load the model and run prediction|
| preprocess | data pre-processing classes|
| utils | basic utility functions|
| images/humanseg | test images for the sample human-segmentation model|
| conf/humanseg.yaml | sample configuration for the human-segmentation model|
| tools/visualize.py | script for pseudo-coloring prediction results |
## Building on Windows
### Prerequisites
* Visual Studio 2015+
* CUDA 8.0 / CUDA 9.0 + cuDNN 7
* CMake 3.0+
We have tested the build with both `Visual Studio 2015` and `Visual Studio 2019 Community`.
**All the examples below use `D:\` as the root directory.**
### Step 1: Get the code
1. `git clone http://gitlab.baidu.com/Paddle/PaddleSeg.git`
2. Copy the `D:\PaddleSeg\inference\` directory to `D:\PaddleDeploy`
The directory `D:\PaddleDeploy\inference` contains `CMakelist.txt` and the project source files.
### Step 2: Download the PaddlePaddle inference library fluid_inference
Download the PaddlePaddle inference library that matches your Windows environment and extract it to the `D:\PaddleDeploy\` directory
| CUDA | GPU | Download |
|------|------|--------|
| 8.0 | Yes | [fluid_inference.zip](https://bj.bcebos.com/v1/paddleseg/fluid_inference_win.zip) |
| 9.0 | Yes | [fluid_inference_cuda90.zip](https://paddleseg.bj.bcebos.com/fluid_inference_cuda9_cudnn7.zip) |
The `D:\PaddleDeploy\fluid_inference` directory contains:
```bash
paddle # paddle core libraries
third_party # third-party dependencies of paddle
version.txt # version information of the build
```
### Step 3: Install and configure OpenCV
1. Download the 3.4.6 Windows release from the OpenCV website, [download link](https://sourceforge.net/projects/opencvlibrary/files/3.4.6/opencv-3.4.6-vc14_vc15.exe/download)
2. Run the downloaded executable and extract OpenCV to a directory of your choice, e.g. `D:\PaddleDeploy\opencv`
3. Configure the environment variables as follows
    1. My Computer -> Properties -> Advanced system settings -> Environment Variables
    2. Find Path in the system variables (create it if missing) and double-click to edit
    3. Click New, add the opencv bin path and save, e.g. `D:\PaddleDeploy\opencv\build\x64\vc14\bin`
### Step 4: Build the code (using VS2015 as an example)
The commands below must be adapted to the paths of the dependencies on your own system
* Set up the VS2015 environment (adjust the path to your actual VS installation) by running the following command in a cmd window
* For other VS versions, locate the corresponding `vcvarsall.bat` and substitute its path in this command
```
call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64
```
* Configure the project with CMake
    * PADDLE_DIR: directory of the fluid_inference library
    * CUDA_LIB: directory of the CUDA libraries, adjust to your installation
    * OPENCV_DIR: directory where OpenCV was extracted
```
# create the CMake build directory
D:
cd PaddleDeploy\inference
mkdir build
cd build
D:\PaddleDeploy\inference\build> cmake .. -G "Visual Studio 14 2015 Win64" -DWITH_GPU=ON -DPADDLE_DIR=D:\PaddleDeploy\fluid_inference -DCUDA_LIB=D:\PaddleDeploy\cudalib\v8.0\lib\x64 -DOPENCV_DIR=D:\PaddleDeploy\opencv -T host=x64
```
The cmake `-G` argument can be changed to match your `VS` version; see the [cmake documentation](https://cmake.org/cmake/help/v3.15/manual/cmake-generators.7.html) for details
* Build the executable
```
D:\PaddleDeploy\inference\build> msbuild /m /p:Configuration=Release cpp_inference_demo.sln
```
### Step 5: Prediction and visualization
The executable and the required dynamic libraries built in the previous step are placed under build/Release and can be run directly from the Windows command line.
A sample model can be downloaded and extracted for testing; the sample human-segmentation model is available here: [download link](https://paddleseg.bj.bcebos.com/inference_model/deeplabv3p_xception65_humanseg.tgz)
Assuming it is extracted to `D:\PaddleDeploy\models\deeplabv3p_xception65_humanseg`, run the following commands:
```
cd Release
D:\PaddleDeploy\inference\build\Release> demo.exe --conf=D:\\PaddleDeploy\\inference\\conf\\humanseg.yaml --input_dir=D:\\PaddleDeploy\\inference\\images\\humanseg\\
```
The two command-line parameters used for prediction are:
| Parameter | Meaning |
|-------|----------|
| conf | path to the model's yaml configuration file |
| input_dir | directory of the images to predict |
For a sample **configuration file** with comments explaining every field, see [conf/humanseg.yaml](inference/conf/humanseg.yaml)
The sample program scans all images under input_dir and generates a prediction result for each of them.
For an input file `14.jpg`, the predicted mask is stored in `14_jpg.png`, the visualized score map in `14_jpg_scoremap.png`, and the prediction resized back to the original image size in `14_jpg_recover.png` (a minimal pseudo-coloring sketch is shown after the example images below).
Input image
![avatar](inference/images/humanseg/demo.jpg)
Output prediction
![avatar](inference/images/humanseg/demo_jpg_recover.png)
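The saved mask (`*_jpg.png`) stores one class id per pixel, so it looks almost black when opened directly. The snippet below is a minimal pseudo-coloring sketch; the palette and file name are illustrative (tools/visualize.py in this directory does the same thing with a fixed 19-class palette):
```python
# Minimal sketch: pseudo-color a predicted mask whose pixel values are class ids.
# Assumptions: the mask is a single-channel PNG and the palette has at least
# NUM_CLASSES entries (humanseg.yaml uses NUM_CLASSES: 2).
import cv2
import numpy as np

palette = np.array([[0, 0, 0],        # class 0: background
                    [0, 255, 0]],     # class 1: person
                   dtype=np.uint8)

mask = cv2.imread("14_jpg.png", cv2.IMREAD_GRAYSCALE)
colored = palette[mask]               # look up a BGR color for every pixel
cv2.imwrite("14_jpg_color.png", colored)
```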
DEPLOY:
USE_GPU: 1
MODEL_PATH: "C:\\PaddleDeploy\\models\\deeplabv3p_xception65_humanseg"
MODEL_NAME: "unet"
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
EVAL_CROP_SIZE: (513, 513)
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
IMAGE_TYPE: "rgb"
NUM_CLASSES: 2
CHANNELS : 3
PRE_PROCESSOR: "SegPreProcessor"
PREDICTOR_MODE: "ANALYSIS"
BATCH_SIZE : 3
#include <glog/logging.h>
#include <utils/utils.h>
#include <predictor/seg_predictor.h>
DEFINE_string(conf, "", "Configuration File Path");
DEFINE_string(input_dir, "", "Directory of Input Images");
int main(int argc, char** argv) {
// 0. parse args
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_conf.empty() || FLAGS_input_dir.empty()) {
std::cout << "Usage: ./predictor --conf=/config/path/to/your/model --input_dir=/directory/of/your/input/images";
return -1;
}
// 1. create a predictor and init it with conf
PaddleSolution::Predictor predictor;
if (predictor.init(FLAGS_conf) != 0) {
LOG(FATAL) << "Fail to init predictor";
return -1;
}
// 2. get all the images with extension '.jpeg' at input_dir
auto imgs = PaddleSolution::utils::get_directory_images(FLAGS_input_dir, ".jpeg|.jpg");
// 3. predict
predictor.predict(imgs);
return 0;
}
find_package(Git REQUIRED)
include(ExternalProject)
message("${CMAKE_BUILD_TYPE}")
ExternalProject_Add(
yaml-cpp
GIT_REPOSITORY https://github.com/jbeder/yaml-cpp.git
GIT_TAG e0e01d53c27ffee6c86153fa41e7f5e57d3e5c90
CMAKE_ARGS
-DYAML_CPP_BUILD_TESTS=OFF
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_CPP_INSTALL=OFF
-DYAML_CPP_BUILD_CONTRIB=OFF
-DMSVC_SHARED_RT=OFF
-DBUILD_SHARED_LIBS=OFF
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=${CMAKE_BINARY_DIR}/ext/yaml-cpp/lib
PREFIX "${CMAKE_BINARY_DIR}/ext/yaml-cpp"
# Disable install step
INSTALL_COMMAND ""
LOG_DOWNLOAD ON
)
#include "seg_predictor.h"
namespace PaddleSolution {
int Predictor::init(const std::string& conf) {
if (!_model_config.load_config(conf)) {
LOG(FATAL) << "Fail to load config file: [" << conf << "]";
return -1;
}
_preprocessor = PaddleSolution::create_processor(conf);
if (_preprocessor == nullptr) {
LOG(FATAL) << "Failed to create_processor";
return -1;
}
_mask.resize(_model_config._resize[0] * _model_config._resize[1]);
_scoremap.resize(_model_config._resize[0] * _model_config._resize[1]);
bool use_gpu = _model_config._use_gpu;
const auto& model_dir = _model_config._model_path;
const auto& model_filename = _model_config._model_file_name;
const auto& params_filename = _model_config._param_file_name;
// load paddle model file
if (_model_config._predictor_mode == "NATIVE") {
paddle::NativeConfig config;
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.prog_file = prog_file;
config.param_file = param_file;
config.fraction_of_gpu_memory = 0;
config.use_gpu = use_gpu;
config.device = 0;
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
paddle::AnalysisConfig config;
if (use_gpu) {
config.EnableUseGpu(100, 0);
}
auto prog_file = utils::path_join(model_dir, model_filename);
auto param_file = utils::path_join(model_dir, params_filename);
config.SetModel(prog_file, param_file);
config.SwitchUseFeedFetchOps(false);
_main_predictor = paddle::CreatePaddlePredictor(config);
}
else {
return -1;
}
return 0;
}
int Predictor::predict(const std::vector<std::string>& imgs) {
if (_model_config._predictor_mode == "NATIVE") {
return native_predict(imgs);
}
else if (_model_config._predictor_mode == "ANALYSIS") {
return analysis_predict(imgs);
}
return -1;
}
int Predictor::output_mask(const std::string& fname, float* p_out, int length, int* height, int* width) {
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
int eval_num_class = _model_config._class_num;
int blob_out_len = length;
int seg_out_len = eval_height * eval_width * eval_num_class;
if (blob_out_len != seg_out_len) {
LOG(ERROR) << " [FATAL] unequal: input vs output [" <<
seg_out_len << "|" << blob_out_len << "]" << std::endl;
return -1;
}
//post process
int out_img_len = eval_height * eval_width;
// reset the per-pixel label and score buffers to the evaluation size
_mask.assign(out_img_len, 0);
_scoremap.assign(out_img_len, 0);
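// For every pixel, take the argmax over the class scores (the output layout is
// class-major: all pixels of class 0 first, then class 1, ...). The winning label
// goes into _mask and its score, scaled to 0-255, into _scoremap.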
for (int i = 0; i < out_img_len; ++i) {
float max_value = -1;
int label = 0;
for (int j = 0; j < eval_num_class; ++j) {
int index = i + j * out_img_len;
if (index >= blob_out_len) {
break;
}
float value = p_out[index];
if (value > max_value) {
max_value = value;
label = j;
}
}
if (label == 0) max_value = 0;
_mask[i] = uchar(label);
_scoremap[i] = uchar(max_value * 255);
}
cv::Mat mask_png = cv::Mat(eval_height, eval_width, CV_8UC1);
mask_png.data = _mask.data();
std::string nname(fname);
auto pos = fname.rfind(".");  // replace the extension dot, so "14.jpg" becomes "14_jpg"
nname[pos] = '_';
std::string mask_save_name = nname + ".png";
cv::imwrite(mask_save_name, mask_png);
cv::Mat scoremap_png = cv::Mat(eval_height, eval_width, CV_8UC1);
scoremap_png.data = _scoremap.data();
std::string scoremap_save_name = nname + std::string("_scoremap.png");
cv::imwrite(scoremap_save_name, scoremap_png);
std::cout << "save mask of [" << fname << "] done" << std::endl;
if (height && width) {
int recover_height = *height;
int recover_width = *width;
cv::Mat recover_png = cv::Mat(recover_height, recover_width, CV_8UC1);
cv::resize(scoremap_png, recover_png, cv::Size(recover_width, recover_height),
0, 0, cv::INTER_CUBIC);
std::string recover_name = nname + std::string("_recover.png");
cv::imwrite(recover_name, recover_png);
}
return 0;
}
int Predictor::native_predict(const std::vector<std::string>& imgs)
{
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
std::size_t total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
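// number of batches = ceil(total_size / default_batch_size); the last batch may hold fewer images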
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_width.data(), org_height.data())) {
return -1;
}
paddle::PaddleTensor im_tensor;
im_tensor.name = "image";
im_tensor.shape = std::vector<int>({ batch_size, channels, eval_height, eval_width });
im_tensor.data.Reset(input_buffer.data(), real_buffer_size * sizeof(float));
im_tensor.dtype = paddle::PaddleDType::FLOAT32;
feeds.push_back(im_tensor);
_outputs.clear();
auto t1 = std::chrono::high_resolution_clock::now();
if (!_main_predictor->Run(feeds, &_outputs, batch_size)) {
LOG(ERROR) << "Failed: NativePredictor->Run() return false at batch: " << u;
continue;
}
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
int out_num = 1;
// print shape of first output tensor for debugging
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < _outputs[0].shape.size(); ++j) {
out_num *= _outputs[0].shape[j];
std::cout << _outputs[0].shape[j] << ",";
}
std::cout << ")" << std::endl;
const size_t nums = _outputs.front().data.length() / sizeof(float);
if (out_num % batch_size != 0 || out_num != nums) {
LOG(ERROR) << "outputs data size mismatch with shape size.";
return -1;
}
for (int i = 0; i < batch_size; ++i) {
float* output_addr = (float*)(_outputs[0].data.data()) + i * (out_num / batch_size);
output_mask(imgs_batch[i], output_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
int Predictor::analysis_predict(const std::vector<std::string>& imgs) {
int config_batch_size = _model_config._batch_size;
int channels = _model_config._channels;
int eval_width = _model_config._resize[0];
int eval_height = _model_config._resize[1];
auto total_size = imgs.size();
int default_batch_size = std::min(config_batch_size, (int)total_size);
int batch = total_size / default_batch_size + ((total_size % default_batch_size) != 0);
int batch_buffer_size = default_batch_size * channels * eval_width * eval_height;
auto& input_buffer = _buffer;
auto& org_width = _org_width;
auto& org_height = _org_height;
auto& imgs_batch = _imgs_batch;
input_buffer.resize(batch_buffer_size);
org_width.resize(default_batch_size);
org_height.resize(default_batch_size);
for (int u = 0; u < batch; ++u) {
int batch_size = default_batch_size;
if (u == (batch - 1) && (total_size % default_batch_size)) {
batch_size = total_size % default_batch_size;
}
int real_buffer_size = batch_size * channels * eval_width * eval_height;
std::vector<paddle::PaddleTensor> feeds;
input_buffer.resize(real_buffer_size);
org_height.resize(batch_size);
org_width.resize(batch_size);
for (int i = 0; i < batch_size; ++i) {
org_width[i] = org_height[i] = 0;
}
imgs_batch.clear();
for (int i = 0; i < batch_size; ++i) {
int idx = u * default_batch_size + i;
imgs_batch.push_back(imgs[idx]);
}
// pass widths then heights, matching batch_process(imgs, data, ori_w, ori_h)
if (!_preprocessor->batch_process(imgs_batch, input_buffer.data(), org_width.data(), org_height.data())) {
return -1;
}
auto im_tensor = _main_predictor->GetInputTensor("image");
im_tensor->Reshape({ batch_size, channels, eval_height, eval_width });
im_tensor->copy_from_cpu(input_buffer.data());
auto t1 = std::chrono::high_resolution_clock::now();
_main_predictor->ZeroCopyRun();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
std::cout << "runtime = " << duration << std::endl;
auto output_names = _main_predictor->GetOutputNames();
auto output_t = _main_predictor->GetOutputTensor(output_names[0]);
std::vector<float> out_data;
std::vector<int> output_shape = output_t->shape();
int out_num = 1;
std::cout << "size of outputs[" << 0 << "]: (";
for (int j = 0; j < output_shape.size(); ++j) {
out_num *= output_shape[j];
std::cout << output_shape[j] << ",";
}
std::cout << ")" << std::endl;
out_data.resize(out_num);
output_t->copy_to_cpu(out_data.data());
for (int i = 0; i < batch_size; ++i) {
float* out_addr = out_data.data() + (out_num / batch_size) * i;
output_mask(imgs_batch[i], out_addr, out_num / batch_size, &org_height[i], &org_width[i]);
}
}
return 0;
}
}
#pragma once
#include <memory>
#include <string>
#include <vector>
#include <thread>
#include <chrono>
#include <algorithm>
#include <glog/logging.h>
#include <yaml-cpp/yaml.h>
#include <opencv2/opencv.hpp>
#include <paddle_inference_api.h>
#include <utils/seg_conf_parser.h>
#include <utils/utils.h>
#include <preprocessor/preprocessor.h>
namespace PaddleSolution {
class Predictor {
public:
// init a predictor with a yaml config file
int init(const std::string& conf);
// predict api
int predict(const std::vector<std::string>& imgs);
private:
int output_mask(
const std::string& fname,
float* p_out,
int length,
int* height = NULL,
int* width = NULL);
int native_predict(const std::vector<std::string>& imgs);
int analysis_predict(const std::vector<std::string>& imgs);
private:
std::vector<float> _buffer;
std::vector<int> _org_width;
std::vector<int> _org_height;
std::vector<std::string> _imgs_batch;
std::vector<paddle::PaddleTensor> _outputs;
std::vector<uchar> _mask;
std::vector<uchar> _scoremap;
PaddleSolution::PaddleSegModelConfigPaser _model_config;
std::shared_ptr<PaddleSolution::ImagePreProcessor> _preprocessor;
std::unique_ptr<paddle::PaddlePredictor> _main_predictor;
};
}
#include <glog/logging.h>
#include "preprocessor.h"
#include "preprocessor_seg.h"
namespace PaddleSolution {
std::shared_ptr<ImagePreProcessor> create_processor(const std::string& conf_file) {
auto config = std::make_shared<PaddleSolution::PaddleSegModelConfigPaser>();
if (!config->load_config(conf_file)) {
LOG(FATAL) << "fail to laod conf file [" << conf_file << "]";
return nullptr;
}
if (config->_pre_processor == "SegPreProcessor") {
auto p = std::make_shared<SegPreProcessor>();
if (!p->init(config)) {
return nullptr;
}
return p;
}
LOG(FATAL) << "unknown processor_name [" << config->_pre_processor << "]";
return nullptr;
}
}
#pragma once
#include <vector>
#include <string>
#include <memory>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "utils/seg_conf_parser.h"
namespace PaddleSolution {
class ImagePreProcessor {
protected:
ImagePreProcessor() {};
public:
virtual ~ImagePreProcessor() {}
virtual bool single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) = 0;
virtual bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) = 0;
}; // end of class ImagePreProcessor
std::shared_ptr<ImagePreProcessor> create_processor(const std::string &config_file);
} // end of namespace paddle_solution
#include <thread>
#include <glog/logging.h>
#include "preprocessor_seg.h"
namespace PaddleSolution {
bool SegPreProcessor::single_process(const std::string& fname, float* data, int* ori_w, int* ori_h) {
cv::Mat im = cv::imread(fname, -1);
if (im.data == nullptr || im.empty()) {
LOG(ERROR) << "Failed to open image: " << fname;
return false;
}
int channels = im.channels();
*ori_w = im.cols;
*ori_h = im.rows;
if (channels == 1) {
cv::cvtColor(im, im, cv::COLOR_GRAY2BGR);
}
channels = im.channels();
if (channels != 3 && channels != 4) {
LOG(ERROR) << "Only support rgb(gray) and rgba image.";
return false;
}
cv::Size resize_size(_config->_resize[0], _config->_resize[1]);
int rw = resize_size.width;
int rh = resize_size.height;
if (*ori_h != rh || *ori_w != rw) {
cv::resize(im, im, resize_size, 0, 0, cv::INTER_LINEAR);
}
float* pmean = _config->_mean.data();
float* pscale = _config->_std.data();
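// Convert the HWC uchar image to CHW float, normalizing each channel with (pixel - mean) / std.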
for (int h = 0; h < rh; ++h) {
const uchar* ptr = im.ptr<uchar>(h);
int im_index = 0;
for (int w = 0; w < rw; ++w) {
for (int c = 0; c < channels; ++c) {
int top_index = (c * rh + h) * rw + w;
float pixel = static_cast<float>(ptr[im_index++]);
pixel = (pixel - pmean[c]) / pscale[c];
data[top_index] = pixel;
}
}
}
return true;
}
bool SegPreProcessor::batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h) {
auto ic = _config->_channels;
auto iw = _config->_resize[0];
auto ih = _config->_resize[1];
std::vector<std::thread> threads;
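// Preprocess each image of the batch in its own thread; every thread writes to its own slice of the shared input buffer.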
for (int i = 0; i < imgs.size(); ++i) {
std::string path = imgs[i];
float* buffer = data + i * ic * iw * ih;
int* width = &ori_w[i];
int* height = &ori_h[i];
threads.emplace_back([this, path, buffer, width, height] {
single_process(path, buffer, width, height);
});
}
for (auto& t : threads) {
if (t.joinable()) {
t.join();
}
}
return true;
}
bool SegPreProcessor::init(std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> config) {
_config = config;
return true;
}
}
#pragma once
#include "preprocessor.h"
namespace PaddleSolution {
class SegPreProcessor : public ImagePreProcessor {
public:
SegPreProcessor() : _config(nullptr){
};
bool init(std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> config);
bool single_process(const std::string &fname, float* data, int* ori_w, int* ori_h);
bool batch_process(const std::vector<std::string>& imgs, float* data, int* ori_w, int* ori_h);
private:
std::shared_ptr<PaddleSolution::PaddleSegModelConfigPaser> _config;
};
}
import cv2
import sys
# Color map that makes the visualization easier to read
color_map = [[128, 64, 128], [244, 35, 231], [69, 69, 69], [102, 102, 156],
[190, 153, 153], [153, 153, 153], [250, 170, 29], [219, 219, 0],
[106, 142, 35], [152, 250, 152], [69, 129, 180], [219, 19, 60],
[255, 0, 0], [0, 0, 142], [0, 0, 69], [0, 60, 100], [0, 79, 100],
[0, 0, 230], [119, 10, 32]]
im = cv2.imread(sys.argv[1])
# Note: the hard-coded (224, 224) size below is only valid for the daheng model
print("visualizing...")
for i in range(0, 224):
for j in range(0, 224):
im[i, j] = color_map[im[i, j, 0]]
cv2.imwrite(sys.argv[1], im)
print("visualizing done!")
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <yaml-cpp/yaml.h>
namespace PaddleSolution {
class PaddleSegModelConfigPaser {
public:
PaddleSegModelConfigPaser()
:_class_num(0),
_channels(0),
_use_gpu(0),
_batch_size(1),
_model_file_name("__model__"),
_param_file_name("__params__") {
}
~PaddleSegModelConfigPaser() {
}
void reset() {
_resize.clear();
_mean.clear();
_std.clear();
_img_type.clear();
_class_num = 0;
_channels = 0;
_use_gpu = 0;
_batch_size = 1;
_model_name.clear();
_model_file_name.clear();
_model_path.clear();
_param_file_name.clear();
}
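// Convert a Python-style tuple string such as "(513, 513)" into "[513, 513]"
// so that YAML::Load can parse it as a flow sequence.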
std::string process_parenthesis(const std::string& str) {
if (str.size() < 2) {
return str;
}
std::string nstr(str);
if (str[0] == '(' && str.back() == ')') {
nstr[0] = '[';
nstr[str.size() - 1] = ']';
}
return nstr;
}
template <typename T>
std::vector<T> parse_str_to_vec(const std::string& str) {
std::vector<T> data;
auto node = YAML::Load(str);
for (const auto& item : node) {
data.push_back(item.as<T>());
}
return data;
}
bool load_config(const std::string& conf_file) {
reset();
YAML::Node config = YAML::LoadFile(conf_file);
// 1. get resize
auto str = config["DEPLOY"]["EVAL_CROP_SIZE"].as<std::string>();
_resize = parse_str_to_vec<int>(process_parenthesis(str));
// 2. get mean
for (const auto& item : config["DEPLOY"]["MEAN"]) {
_mean.push_back(item.as<float>());
}
// 3. get std
for (const auto& item : config["DEPLOY"]["STD"]) {
_std.push_back(item.as<float>());
}
// 4. get image type
_img_type = config["DEPLOY"]["IMAGE_TYPE"].as<std::string>();
// 5. get class number
_class_num = config["DEPLOY"]["NUM_CLASSES"].as<int>();
// 6. get model_name
_model_name = config["DEPLOY"]["MODEL_NAME"].as<std::string>();
// 7. set model path
_model_path = config["DEPLOY"]["MODEL_PATH"].as<std::string>();
// 8. get model file_name
_model_file_name = config["DEPLOY"]["MODEL_FILENAME"].as<std::string>();
// 9. get model param file name
_param_file_name = config["DEPLOY"]["PARAMS_FILENAME"].as<std::string>();
// 10. get pre_processor
_pre_processor = config["DEPLOY"]["PRE_PROCESSOR"].as<std::string>();
// 11. use_gpu
_use_gpu = config["DEPLOY"]["USE_GPU"].as<int>();
// 12. predictor_mode
_predictor_mode = config["DEPLOY"]["PREDICTOR_MODE"].as<std::string>();
// 13. batch_size
_batch_size = config["DEPLOY"]["BATCH_SIZE"].as<int>();
// 14. channels
_channels = config["DEPLOY"]["CHANNELS"].as<int>();
return true;
}
void debug() const {
std::cout << "EVAL_CROP_SIZE: (" << _resize[0] << ", " << _resize[1] << ")" << std::endl;
std::cout << "MEAN: [";
for (int i = 0; i < _mean.size(); ++i) {
if (i != _mean.size() - 1) {
std::cout << _mean[i] << ", ";
} else {
std::cout << _mean[i];
}
}
std::cout << "]" << std::endl;
std::cout << "STD: [";
for (int i = 0; i < _std.size(); ++i) {
if (i != _std.size() - 1) {
std::cout << _std[i] << ", ";
}
else {
std::cout << _std[i];
}
}
std::cout << "]" << std::endl;
std::cout << "DEPLOY.IMAGE_TYPE: " << _img_type << std::endl;
std::cout << "DEPLOY.NUM_CLASSES: " << _class_num << std::endl;
std::cout << "DEPLOY.CHANNELS: " << _channels << std::endl;
std::cout << "DEPLOY.MODEL_PATH: " << _model_path << std::endl;
std::cout << "DEPLOY.MODEL_NAME: " << _model_name << std::endl;
std::cout << "DEPLOY.MODEL_FILENAME: " << _model_file_name << std::endl;
std::cout << "DEPLOY.PARAMS_FILENAME: " << _param_file_name << std::endl;
std::cout << "DEPLOY.PRE_PROCESSOR: " << _pre_processor << std::endl;
std::cout << "DEPLOY.USE_GPU: " << _use_gpu << std::endl;
std::cout << "DEPLOY.PREDICTOR_MODE: " << _predictor_mode << std::endl;
std::cout << "DEPLOY.BATCH_SIZE: " << _batch_size << std::endl;
}
// DEPLOY.EVAL_CROP_SIZE
std::vector<int> _resize;
// DEPLOY.MEAN
std::vector<float> _mean;
// DEPLOY.STD
std::vector<float> _std;
// DEPLOY.IMAGE_TYPE
std::string _img_type;
// DEPLOY.NUM_CLASSES
int _class_num;
// DEPLOY.CHANNELS
int _channels;
// DEPLOY.MODEL_PATH
std::string _model_path;
// DEPLOY.MODEL_NAME
std::string _model_name;
// DEPLOY.MODEL_FILENAME
std::string _model_file_name;
// DEPLOY.PARAMS_FILENAME
std::string _param_file_name;
// DEPLOY.PRE_PROCESSOR
std::string _pre_processor;
// DEPLOY.USE_GPU
int _use_gpu;
// DEPLOY.PREDICTOR_MODE
std::string _predictor_mode;
// DEPLOY.BATCH_SIZE
int _batch_size;
};
}
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <experimental/filesystem>
namespace PaddleSolution {
namespace utils {
inline std::string path_join(const std::string& dir, const std::string& path) {
std::string seperator = "/";
#ifdef _WIN32
seperator = "\\";
#endif
return dir + seperator + path;
}
// scan a directory and get all files with input extensions
inline std::vector<std::string> get_directory_images(const std::string& path, const std::string& exts)
{
std::vector<std::string> imgs;
for (const auto& item : std::experimental::filesystem::directory_iterator(path)) {
auto suffix = item.path().extension().string();
if (exts.find(suffix) != std::string::npos && suffix.size() > 0) {
auto fullname = path_join(path, item.path().filename().string());
imgs.push_back(item.path().string());
}
}
return imgs;
}
}
}
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import models
import utils
# coding: utf8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os
import sys
import pprint
import argparse
import cv2
from tqdm import tqdm
import imghdr
from utils.config import cfg
def init_global_variable():
"""
Initialize the global variables
"""
global png_format_right_num # number of label images in the correct (png) format
global png_format_wrong_num # number of label images in a wrong format
global total_grt_classes # all label classes seen so far
global total_num_of_each_class # total pixel count of each class
global shape_unequal # images whose shape differs from their label
global png_format_wrong # label images with a wrong format
png_format_right_num = 0
png_format_wrong_num = 0
total_grt_classes = []
total_num_of_each_class = []
shape_unequal = []
png_format_wrong = []
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg check')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
return parser.parse_args()
def cv2_imread(file_path, flag=cv2.IMREAD_COLOR):
# resolve cv2.imread open Chinese file path issues on Windows Platform.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
def get_image_max_height_width(img, max_height, max_width):
img_shape = img.shape
height, width = img_shape[0], img_shape[1]
max_height = max(height, max_height)
max_width = max(width, max_width)
return max_height, max_width
def get_image_min_max_aspectratio(img, min_aspectratio, max_aspectratio):
img_shape = img.shape
height, width = img_shape[0], img_shape[1]
min_aspectratio = min(width / height, min_aspectratio)
max_aspectratio = max(width / height, max_aspectratio)
return min_aspectratio, max_aspectratio
def get_image_dim(img, img_dim):
"""获取图像的维度"""
img_shape = img.shape
if img_shape[-1] not in img_dim:
img_dim.append(img_shape[-1])
def sum_gt_check(png_format, grt_classes, num_of_each_class):
"""
Accumulate, over all label images, the format check results, the classes present and the pixel count of each class
params:
png_format: whether the label image is in png format
grt_classes: label classes present in the image
num_of_each_class: pixel count of each class
"""
global png_format_right_num, png_format_wrong_num, total_grt_classes, total_num_of_each_class
if png_format:
png_format_right_num += 1
else:
png_format_wrong_num += 1
if cfg.DATASET.IGNORE_INDEX in grt_classes:
grt_classes2 = np.delete(
grt_classes, np.where(grt_classes == cfg.DATASET.IGNORE_INDEX))
if min(grt_classes2) < 0 or max(grt_classes2) > cfg.DATASET.NUM_CLASSES - 1:
print("fatal error: label class is out of range [0, {}]".format(
cfg.DATASET.NUM_CLASSES - 1))
add_class = []
add_num = []
for i in range(len(grt_classes)):
gi = grt_classes[i]
if gi in total_grt_classes:
j = total_grt_classes.index(gi)
total_num_of_each_class[j] += num_of_each_class[i]
else:
add_class.append(gi)
add_num.append(num_of_each_class[i])
total_num_of_each_class += add_num
total_grt_classes += add_class
def gt_check():
"""
Validate the labels and print the check results
params:
png_format_wrong_num: number of label images in a wrong format
png_format_right_num: number of label images in the correct format
total_grt_classes: all label classes
total_num_of_each_class: total pixel count of each class
return:
total_nc: label classes and their pixel counts, sorted in ascending order
"""
if png_format_wrong_num == 0:
print("Pass label png format check!")
else:
print("Not pass label png format check!")
print(
"total {} label imgs are png format, {} label imgs are not png format".
format(png_format_right_num, png_format_wrong_num))
total_nc = sorted(zip(total_grt_classes, total_num_of_each_class))
print("total label classes and their corresponding numbers:\n{} ".format(
total_nc))
if total_nc[0][0]:
print(
"Not pass label class check!\nWarning: label classes should start from 0 !!!"
)
else:
print("Pass label class check!")
def ground_truth_check(grt, grt_path):
"""
Verify that label values start from 0, i.e. take values 0, 1, ..., num_classes-1 plus ignore_index
Verify the format of the label image
Return the pixel counts of the label
Check whether the image consists entirely of ignore_index
params:
grt: label image
grt_path: path of the label image
return:
png_format: whether the label image is in png format
unique: the label classes present in the image
counts: the pixel count of each class
"""
if imghdr.what(grt_path) == "png":
png_format = True
else:
png_format = False
unique, counts = np.unique(grt, return_counts=True)
return png_format, unique, counts
def eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio):
"""
Check EVAL_CROP_SIZE against the max_height and max_width of the validation and test sets
param
max_height: maximum image height in the dataset
max_width: maximum image width in the dataset
"""
if cfg.AUG.AUG_METHOD == "stepscaling":
flag = True
if max_width > cfg.EVAL_CROP_SIZE[0]:
print(
"ERROR: The EVAL_CROP_SIZE[0]: {} should be larger than the max width of the images: {}!"
.format(cfg.EVAL_CROP_SIZE[0], max_width))
flag = False
if max_height > cfg.EVAL_CROP_SIZE[1]:
print(
"ERROR: The EVAL_CROP_SIZE[1]: {} should be larger than the max height of the images: {}!"
.format(cfg.EVAL_CROP_SIZE[1], max_height))
flag = False
if flag:
print("EVAL_CROP_SIZE setting correct")
elif cfg.AUG.AUG_METHOD == "rangescaling":
if min_aspectratio <= 1 and max_aspectratio >= 1:
if cfg.EVAL_CROP_SIZE[
0] >= cfg.AUG.INF_RESIZE_VALUE and cfg.EVAL_CROP_SIZE[
1] >= cfg.AUG.INF_RESIZE_VALUE:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.INF_RESIZE_VALUE, cfg.AUG.INF_RESIZE_VALUE))
elif min_aspectratio > 1:
max_height_rangscaling = cfg.AUG.INF_RESIZE_VALUE / min_aspectratio
max_height_rangscaling = round(max_height_rangscaling)
if cfg.EVAL_CROP_SIZE[
0] >= cfg.AUG.INF_RESIZE_VALUE and cfg.EVAL_CROP_SIZE[
1] >= max_height_rangscaling:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.INF_RESIZE_VALUE, max_height_rangscaling))
elif max_aspectratio < 1:
max_width_rangscaling = cfg.AUG.INF_RESIZE_VALUE * max_aspectratio
max_width_rangscaling = round(max_width_rangscaling)
if cfg.EVAL_CROP_SIZE[
0] >= max_width_rangscaling and cfg.EVAL_CROP_SIZE[
1] >= cfg.AUG.INF_RESIZE_VALUE:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
max_width_rangscaling, cfg.AUG.INF_RESIZE_VALUE))
elif cfg.AUG.AUG_METHOD == "unpadding":
if cfg.EVAL_CROP_SIZE[0] >= cfg.AUG.FIX_RESIZE_SIZE[
0] and cfg.EVAL_CROP_SIZE[1] >= cfg.AUG.FIX_RESIZE_SIZE[1]:
print("EVAL_CROP_SIZE setting correct")
else:
print(
"ERROR: EVAL_CROP_SIZE: ({},{}) must be larger than the image size ({},{})"
.format(cfg.EVAL_CROP_SIZE[0], cfg.EVAL_CROP_SIZE[1],
cfg.AUG.FIX_RESIZE_SIZE[0], cfg.AUG.FIX_RESIZE_SIZE[1]))
else:
print(
"ERROR: cfg.AUG.AUG_METHOD setting wrong, it should be one of [unpadding, stepscaling, rangescaling]"
)
def inf_resize_value_check():
if cfg.AUG.AUG_METHOD == "rangescaling":
if cfg.AUG.INF_RESIZE_VALUE < cfg.AUG.MIN_RESIZE_VALUE or \
cfg.AUG.INF_RESIZE_VALUE > cfg.AUG.MAX_RESIZE_VALUE:
print(
"ERROR: you set AUG.AUG_METHOD = 'rangescaling'"
"AUG.INF_RESIZE_VALUE: {} not in [AUG.MIN_RESIZE_VALUE, AUG.MAX_RESIZE_VALUE]: "
"[{}, {}].".format(cfg.AUG.INF_RESIZE_VALUE,
cfg.AUG.MIN_RESIZE_VALUE,
cfg.AUG.MAX_RESIZE_VALUE))
def image_type_check(img_dim):
"""
Check whether the image channels are consistent with DATASET.IMAGE_TYPE
param
img_dim: channel counts found in the images
return
"""
if (1 in img_dim or 3 in img_dim) and cfg.DATASET.IMAGE_TYPE == 'rgba':
print(
"ERROR: DATASET.IMAGE_TYPE is {} but the type of image has gray or rgb\n"
.format(cfg.DATASET.IMAGE_TYPE))
# elif (1 not in img_dim and 3 not in img_dim and 4 in img_dim) and cfg.DATASET.IMAGE_TYPE == 'rgb':
# print("ERROR: DATASET.IMAGE_TYPE is {} but the type of image is rgba\n".format(cfg.DATASET.IMAGE_TYPE))
else:
print("DATASET.IMAGE_TYPE setting correct")
def image_label_shape_check(img, grt):
"""
Check whether the image and its label have the same size
"""
flag = True
img_height = img.shape[0]
img_width = img.shape[1]
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
flag = False
return flag
def check_train_dataset():
train_list = cfg.DATASET.TRAIN_FILE_LIST
print("\ncheck train dataset...")
with open(train_list, 'r') as fid:
img_dim = []
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n ".format(cfg.DATASET.SEPARATOR))
continue
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
get_image_dim(img, img_dim)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(line,
"ERROR: source img and label img must has the same size")
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
gt_check()
image_type_check(img_dim)
def check_val_dataset():
val_list = cfg.DATASET.VAL_FILE_LIST
with open(val_list) as fid:
max_height = 0
max_width = 0
min_aspectratio = sys.float_info.max
max_aspectratio = 0.0
img_dim = []
print("check val dataset...")
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n ".format(cfg.DATASET.SEPARATOR))
continue
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
max_height, max_width = get_image_max_height_width(
img, max_height, max_width)
min_aspectratio, max_aspectratio = get_image_min_max_aspectratio(
img, min_aspectratio, max_aspectratio)
get_image_dim(img, img_dim)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(line,
"ERROR: source img and label img must has the same size")
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
gt_check()
eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio)
image_type_check(img_dim)
def check_test_dataset():
test_list = cfg.DATASET.TEST_FILE_LIST
with open(test_list) as fid:
max_height = 0
max_width = 0
min_aspectratio = sys.float_info.max
max_aspectratio = 0.0
img_dim = []
print("check test dataset...")
lines = fid.readlines()
for line in tqdm(lines):
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) == 1:
img_name = parts[0]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
elif len(parts) == 2:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(cfg.DATASET.DATA_DIR, img_name)
grt_path = os.path.join(cfg.DATASET.DATA_DIR, grt_name)
img = cv2_imread(img_path, cv2.IMREAD_UNCHANGED)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
is_equal_img_grt_shape = image_label_shape_check(img, grt)
if not is_equal_img_grt_shape:
print(
line,
"ERROR: source img and label img must has the same size"
)
png_format, grt_classes, num_of_each_class = ground_truth_check(
grt, grt_path)
sum_gt_check(png_format, grt_classes, num_of_each_class)
else:
print(
line, "File list format incorrect! It should be"
" image_name{}label_name\\n or image_name\n ".format(
cfg.DATASET.SEPARATOR))
continue
max_height, max_width = get_image_max_height_width(
img, max_height, max_width)
min_aspectratio, max_aspectratio = get_image_min_max_aspectratio(
img, min_aspectratio, max_aspectratio)
get_image_dim(img, img_dim)
gt_check()
eval_crop_size_check(max_height, max_width, min_aspectratio,
max_aspectratio)
image_type_check(img_dim)
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
cfg.check_and_infer(reset_dataset=True)
print(pprint.pformat(cfg))
init_global_variable()
check_train_dataset()
init_global_variable()
check_val_dataset()
init_global_variable()
check_test_dataset()
inf_resize_value_check()
if __name__ == "__main__":
args = parse_args()
args.cfg_file = "../configs/cityscape.yaml"
main(args)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import cv2
import numpy as np
from utils.config import cfg
from models.model_builder import ModelPhase
def resize(img, grt=None, mode=ModelPhase.TRAIN):
"""
Resize the image and, optionally, its label image.
When AUG.AUG_METHOD is unpadding, every mode resizes directly to AUG.FIX_RESIZE_SIZE.
When AUG.AUG_METHOD is stepscaling, training resizes by a random factor between AUG.MIN_SCALE_FACTOR and AUG.MAX_SCALE_FACTOR sampled with step AUG.SCALE_STEP_SIZE; other modes return the original image.
When AUG.AUG_METHOD is rangescaling, the long side is aligned and the short side scales proportionally; during training the long side is aligned to a value between AUG.MIN_RESIZE_VALUE and AUG.MAX_RESIZE_VALUE, while other modes align it to AUG.INF_RESIZE_VALUE.
Args:
img(numpy.ndarray): input image
grt(numpy.ndarray): label image, default None
mode(string): phase, default is training, i.e. ModelPhase.TRAIN
Returns:
the resized image and label image
"""
if cfg.AUG.AUG_METHOD == 'unpadding':
target_size = cfg.AUG.FIX_RESIZE_SIZE
img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
if grt is not None:
grt = cv2.resize(grt, target_size, interpolation=cv2.INTER_NEAREST)
elif cfg.AUG.AUG_METHOD == 'stepscaling':
if mode == ModelPhase.TRAIN:
min_scale_factor = cfg.AUG.MIN_SCALE_FACTOR
max_scale_factor = cfg.AUG.MAX_SCALE_FACTOR
step_size = cfg.AUG.SCALE_STEP_SIZE
scale_factor = get_random_scale(min_scale_factor, max_scale_factor,
step_size)
img, grt = randomly_scale_image_and_label(
img, grt, scale=scale_factor)
elif cfg.AUG.AUG_METHOD == 'rangescaling':
min_resize_value = cfg.AUG.MIN_RESIZE_VALUE
max_resize_value = cfg.AUG.MAX_RESIZE_VALUE
if mode == ModelPhase.TRAIN:
if min_resize_value == max_resize_value:
random_size = min_resize_value
else:
random_size = int(
np.random.uniform(min_resize_value, max_resize_value) + 0.5)
else:
random_size = cfg.AUG.INF_RESIZE_VALUE
value = max(img.shape[0], img.shape[1])
scale = float(random_size) / float(value)
img = cv2.resize(
img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
if grt is not None:
grt = cv2.resize(
grt, (0, 0),
fx=scale,
fy=scale,
interpolation=cv2.INTER_NEAREST)
else:
raise Exception("Unexpect data augmention method: {}".format(
cfg.AUG.AUG_METHOD))
return img, grt
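def _demo_rangescaling():
    # Illustrative sketch, not called by the library: 'rangescaling' picks a
    # target long side between AUG.MIN_RESIZE_VALUE and AUG.MAX_RESIZE_VALUE
    # (400-600 in the sample config) and scales both axes by target / max(h, w).
    h, w = 512, 1024
    target = int(np.random.uniform(400, 600) + 0.5)
    scale = float(target) / float(max(h, w))
    print('%dx%d -> %dx%d' % (w, h, int(w * scale + 0.5), int(h * scale + 0.5)))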
def get_random_scale(min_scale_factor, max_scale_factor, step_size):
"""
在一定范围内得到随机值,范围为min_scale_factor到max_scale_factor,间隔为step_size
Args:
min_scale_factor(float): 随机尺度下限,大于0
max_scale_factor(float): 随机尺度上限,不小于下限值
step_size(float): 尺度间隔,非负, 等于为0时直接返回min_scale_factor到max_scale_factor范围内任一值
Returns:
随机尺度值
"""
if min_scale_factor < 0 or min_scale_factor > max_scale_factor:
raise ValueError('Unexpected value of min_scale_factor.')
if min_scale_factor == max_scale_factor:
return min_scale_factor
if step_size == 0:
return np.random.uniform(min_scale_factor, max_scale_factor)
num_steps = int((max_scale_factor - min_scale_factor) / step_size + 1)
scale_factors = np.linspace(min_scale_factor, max_scale_factor,
num_steps).tolist()
np.random.shuffle(scale_factors)
return scale_factors[0]
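def _demo_stepscaling_grid():
    # Illustrative sketch, not called by the library: with the sample config
    # (MIN_SCALE_FACTOR=0.5, MAX_SCALE_FACTOR=2.0, SCALE_STEP_SIZE=0.25) the
    # training scale is drawn from this discrete grid, not a continuous range.
    print(np.linspace(0.5, 2.0, int((2.0 - 0.5) / 0.25 + 1)).tolist())
    # -> [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]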
def randomly_scale_image_and_label(image, label=None, scale=1.0):
"""
按比例resize图像和标签图, 如果scale为1,返回原图
Args:
image(numpy.ndarray): 输入图像
label(numpy.ndarray): 标签图,默认None
sclae(float): 图片resize的比例,非负,默认1.0
Returns:
resize后的图像和标签图
"""
if scale == 1.0:
return image, label
height = image.shape[0]
width = image.shape[1]
new_height = int(height * scale + 0.5)
new_width = int(width * scale + 0.5)
new_image = cv2.resize(
image, (new_width, new_height), interpolation=cv2.INTER_LINEAR)
    new_label = label
    if label is not None:
        height = label.shape[0]
        width = label.shape[1]
        new_height = int(height * scale + 0.5)
        new_width = int(width * scale + 0.5)
        new_label = cv2.resize(
            label, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
    return new_image, new_label
def random_rotation(crop_img, crop_seg, rich_crop_max_rotation, mean_value):
"""
随机旋转图像和标签图
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图
rich_crop_max_rotation(int):旋转最大角度,0-90
mean_value(list):均值, 对图片旋转产生的多余区域使用均值填充
Returns:
旋转后的图像和标签图
"""
ignore_index = cfg.DATASET.IGNORE_INDEX
if rich_crop_max_rotation > 0:
(h, w) = crop_img.shape[:2]
do_rotation = np.random.uniform(-rich_crop_max_rotation,
rich_crop_max_rotation)
pc = (w // 2, h // 2)
r = cv2.getRotationMatrix2D(pc, do_rotation, 1.0)
cos = np.abs(r[0, 0])
sin = np.abs(r[0, 1])
nw = int((h * sin) + (w * cos))
nh = int((h * cos) + (w * sin))
(cx, cy) = pc
r[0, 2] += (nw / 2) - cx
r[1, 2] += (nh / 2) - cy
dsize = (nw, nh)
crop_img = cv2.warpAffine(
crop_img,
r,
dsize=dsize,
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT,
borderValue=mean_value)
crop_seg = cv2.warpAffine(
crop_seg,
r,
dsize=dsize,
flags=cv2.INTER_NEAREST,
borderMode=cv2.BORDER_CONSTANT,
borderValue=(ignore_index, ignore_index, ignore_index))
return crop_img, crop_seg
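def _demo_rotation_canvas():
    # Illustrative sketch, not called by the library: a 200x100 (w x h) image
    # rotated by 30 degrees lands on an enlarged canvas of
    # int(h*sin + w*cos) x int(h*cos + w*sin), matching nw/nh in random_rotation.
    import math
    w, h, angle = 200, 100, 30
    rad = math.radians(angle)
    nw = int(h * abs(math.sin(rad)) + w * abs(math.cos(rad)))
    nh = int(h * abs(math.cos(rad)) + w * abs(math.sin(rad)))
    print(nw, nh)  # 223 186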
def rand_scale_aspect(crop_img,
crop_seg,
rich_crop_min_scale=0,
rich_crop_aspect_ratio=0):
"""
从输入图像和标签图像中裁取随机宽高比的图像,并reszie回原始尺寸
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图像
rich_crop_min_scale(float):裁取图像占原始图像的面积比,0-1,默认0返回原图
rich_crop_aspect_ratio(float): 裁取图像的宽高比范围,非负,默认0返回原图
Returns:
裁剪并resize回原始尺寸的图像和标签图像
"""
if rich_crop_min_scale == 0 or rich_crop_aspect_ratio == 0:
return crop_img, crop_seg
else:
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
for i in range(0, 10):
area = img_height * img_width
target_area = area * np.random.uniform(rich_crop_min_scale, 1.0)
aspectRatio = np.random.uniform(rich_crop_aspect_ratio,
1.0 / rich_crop_aspect_ratio)
dw = int(np.sqrt(target_area * 1.0 * aspectRatio))
dh = int(np.sqrt(target_area * 1.0 / aspectRatio))
if (np.random.randint(10) < 5):
tmp = dw
dw = dh
dh = tmp
if (dh < img_height and dw < img_width):
h1 = np.random.randint(0, img_height - dh)
w1 = np.random.randint(0, img_width - dw)
crop_img = crop_img[h1:(h1 + dh), w1:(w1 + dw), :]
crop_seg = crop_seg[h1:(h1 + dh), w1:(w1 + dw)]
crop_img = cv2.resize(
crop_img, (img_width, img_height),
interpolation=cv2.INTER_LINEAR)
crop_seg = cv2.resize(
crop_seg, (img_width, img_height),
interpolation=cv2.INTER_NEAREST)
break
return crop_img, crop_seg
def saturation_jitter(cv_img, jitter_range):
"""
调节图像饱和度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
饱和度调整后的图像
"""
greyMat = cv2.cvtColor(cv_img, cv2.COLOR_BGR2GRAY)
greyMat = greyMat[:, :, None] * np.ones(3, dtype=int)[None, None, :]
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1 - jitter_range) + jitter_range * greyMat
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def brightness_jitter(cv_img, jitter_range):
"""
调节图像亮度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
亮度调整后的图像
"""
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1.0 - jitter_range)
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def contrast_jitter(cv_img, jitter_range):
"""
调节图像对比度
Args:
cv_img(numpy.ndarray): 输入图像
jitter_range(float): 调节程度,0-1
Returns:
对比度调整后的图像
"""
greyMat = cv2.cvtColor(cv_img, cv2.COLOR_BGR2GRAY)
mean = np.mean(greyMat)
cv_img = cv_img.astype(np.float32)
cv_img = cv_img * (1 - jitter_range) + jitter_range * mean
cv_img = np.where(cv_img > 255, 255, cv_img)
cv_img = cv_img.astype(np.uint8)
return cv_img
def random_jitter(cv_img, saturation_range, brightness_range, contrast_range):
"""
图像亮度、饱和度、对比度调节,在调整范围内随机获得调节比例,并随机顺序叠加三种效果
Args:
cv_img(numpy.ndarray): 输入图像
saturation_range(float): 饱和对调节范围,0-1
brightness_range(float): 亮度调节范围,0-1
contrast_range(float): 对比度调节范围,0-1
Returns:
亮度、饱和度、对比度调整后图像
"""
saturation_ratio = np.random.uniform(-saturation_range, saturation_range)
brightness_ratio = np.random.uniform(-brightness_range, brightness_range)
contrast_ratio = np.random.uniform(-contrast_range, contrast_range)
    order = [0, 1, 2]  # indices matching the saturation/brightness/contrast checks below
np.random.shuffle(order)
for i in range(3):
if order[i] == 0:
cv_img = saturation_jitter(cv_img, saturation_ratio)
if order[i] == 1:
cv_img = brightness_jitter(cv_img, brightness_ratio)
if order[i] == 2:
cv_img = contrast_jitter(cv_img, contrast_ratio)
return cv_img
def hsv_color_jitter(crop_img,
brightness_jitter_ratio=0,
saturation_jitter_ratio=0,
contrast_jitter_ratio=0):
"""
图像亮度、饱和度、对比度调节
Args:
crop_img(numpy.ndarray): 输入图像
brightness_jitter_ratio(float): 亮度调节度最大值,1-0,默认0
saturation_jitter_ratio(float): 饱和度调节度最大值,1-0,默认0
contrast_jitter_ratio(float): 对比度调节度最大值,1-0,默认0
Returns:
亮度、饱和度、对比度调节后图像
"""
if brightness_jitter_ratio > 0 or \
saturation_jitter_ratio > 0 or \
contrast_jitter_ratio > 0:
        crop_img = random_jitter(crop_img, saturation_jitter_ratio,
                                 brightness_jitter_ratio, contrast_jitter_ratio)
return crop_img
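def _demo_color_jitter():
    # Illustrative sketch, not called by the library: apply the three jitters
    # with the ratios from the sample config (0.5 each) to a random BGR image.
    img = np.random.randint(0, 256, size=(64, 64, 3)).astype(np.uint8)
    out = hsv_color_jitter(
        img,
        brightness_jitter_ratio=0.5,
        saturation_jitter_ratio=0.5,
        contrast_jitter_ratio=0.5)
    print(out.shape, out.dtype)  # (64, 64, 3) uint8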
def rand_crop(crop_img, crop_seg, mode=ModelPhase.TRAIN):
"""
随机裁剪图片和标签图, 若crop尺寸大于原始尺寸,分别使用均值和ignore值填充再进行crop,
crop尺寸与原始尺寸一致,返回原图,crop尺寸小于原始尺寸直接crop
Args:
crop_img(numpy.ndarray): 输入图像
crop_seg(numpy.ndarray): 标签图
mode(string): 模式, 默认训练模式,验证或预测模式时crop尺寸需大于原始图片尺寸, 其他模式无限制
Returns:
裁剪后的图片和标签图
"""
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
if ModelPhase.is_train(mode):
crop_width = cfg.TRAIN_CROP_SIZE[0]
crop_height = cfg.TRAIN_CROP_SIZE[1]
else:
crop_width = cfg.EVAL_CROP_SIZE[0]
crop_height = cfg.EVAL_CROP_SIZE[1]
if ModelPhase.is_eval(mode) or ModelPhase.is_predict(mode):
        if (crop_height < img_height or crop_width < img_width):
            raise Exception(
                "Crop size ({},{}) must be no smaller than image size ({},{}) in eval/predict phase."
                .format(crop_width, crop_height, img_width, img_height))
if img_height == crop_height and img_width == crop_width:
return crop_img, crop_seg
else:
pad_height = max(crop_height - img_height, 0)
pad_width = max(crop_width - img_width, 0)
if (pad_height > 0 or pad_width > 0):
crop_img = cv2.copyMakeBorder(
crop_img,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=cfg.MEAN)
if crop_seg is not None:
crop_seg = cv2.copyMakeBorder(
crop_seg,
0,
pad_height,
0,
pad_width,
cv2.BORDER_CONSTANT,
value=cfg.DATASET.IGNORE_INDEX)
img_height = crop_img.shape[0]
img_width = crop_img.shape[1]
if crop_height > 0 and crop_width > 0:
h_off = np.random.randint(img_height - crop_height + 1)
w_off = np.random.randint(img_width - crop_width + 1)
crop_img = crop_img[h_off:(crop_height + h_off), w_off:(
w_off + crop_width), :]
if crop_seg is not None:
crop_seg = crop_seg[h_off:(crop_height + h_off), w_off:(
w_off + crop_width)]
return crop_img, crop_seg
"""
This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py
"""
import time
import numpy as np
import threading
import multiprocessing
try:
import queue
except ImportError:
import Queue as queue
class GeneratorEnqueuer(object):
"""
Multiple generators
Args:
generators:
wait_time (float): time to sleep in-between calls to `put()`.
"""
def __init__(self, generators, wait_time=0.05):
self.wait_time = wait_time
self._generators = generators
self._threads = []
self._stop_events = []
self.queue = None
self._manager = None
self.workers = 1
def start(self, workers=1, max_queue_size=16):
"""
Start worker threads which add data from the generator into the queue.
Args:
workers (int): number of worker threads
max_queue_size (int): queue size
(when full, threads could block on `put()`)
"""
self.workers = workers
def data_generator_task(pid):
"""
Data generator task.
"""
def task(pid):
if (self.queue is not None
and self.queue.qsize() < max_queue_size):
generator_output = next(self._generators[pid])
self.queue.put((generator_output))
else:
time.sleep(self.wait_time)
while not self._stop_events[pid].is_set():
try:
task(pid)
except Exception:
self._stop_events[pid].set()
break
try:
self._manager = multiprocessing.Manager()
self.queue = self._manager.Queue(maxsize=max_queue_size)
for pid in range(self.workers):
self._stop_events.append(multiprocessing.Event())
thread = multiprocessing.Process(
target=data_generator_task, args=(pid, ))
thread.daemon = True
self._threads.append(thread)
thread.start()
except:
self.stop()
raise
def is_running(self):
"""
Returns:
bool: Whether the worker theads are running.
"""
# If queue is not empty then still in runing state wait for consumer
if not self.queue.empty():
return True
for pid in range(self.workers):
if not self._stop_events[pid].is_set():
return True
return False
def stop(self, timeout=None):
"""
Stops running threads and wait for them to exit, if necessary.
Should be called by the same thread which called `start()`.
Args:
timeout(int|None): maximum time to wait on `thread.join()`.
"""
if self.is_running():
for pid in range(self.workers):
self._stop_events[pid].set()
for thread in self._threads:
if thread.is_alive():
thread.join(timeout)
if self._manager:
self._manager.shutdown()
self._threads = []
self._stop_events = []
self.queue = None
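# Illustrative usage sketch, an assumption about typical usage rather than part
# of the original file. It relies on a fork-based multiprocessing start method
# (Linux) so the worker processes inherit the Python generators.
if __name__ == '__main__':
    def _make_gen(offset):
        def _gen():
            i = 0
            while True:
                yield offset + i
                i += 1
        return _gen()

    gens = [_make_gen(100 * k) for k in range(2)]
    enqueuer = GeneratorEnqueuer(gens, wait_time=0.01)
    enqueuer.start(workers=2, max_queue_size=8)
    for _ in range(4):
        while enqueuer.queue.empty():
            time.sleep(0.01)
        print(enqueuer.queue.get())
    enqueuer.stop()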
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from models.model_builder import build_model
from models.model_builder import ModelPhase
from reader import SegDataset
from metrics import ConfusionMatrix
def parse_args():
    parser = argparse.ArgumentParser(description='PaddleSeg model evaluation')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = SegDataset(
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
        # TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, pred, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.BATCH_SIZE)
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("Device count = {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if ckpt_dir is not None:
print('load test model:', ckpt_dir)
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
fetch_list = [avg_loss.name, pred.name, grts.name, masks.name]
num_images = 0
step = 0
all_step = cfg.DATASET.TEST_TOTAL_IMAGES // cfg.BATCH_SIZE + 1
timer = Timer()
timer.start()
py_reader.start()
while True:
try:
step += 1
loss, pred, grts, masks = exe.run(
test_prog, fetch_list=fetch_list, return_numpy=True)
loss = np.mean(np.array(loss))
num_images += pred.shape[0]
conf_mat.calculate(pred, grts, masks)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
speed = 1.0 / timer.elapsed_time()
print(
"[EVAL]step={} loss={:.5f} acc={:.4f} IoU={:.4f} step/sec={:.2f} | ETA {}"
.format(step, loss, acc, iou, speed,
calculate_eta(all_step - step, speed)))
timer.restart()
sys.stdout.flush()
except fluid.core.EOFException:
break
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL]#image={} acc={:.4f} IoU={:.4f}".format(
num_images, avg_acc, avg_iou))
print("[EVAL]Category IoU:", category_iou)
print("[EVAL]Category Acc:", category_acc)
print("[EVAL]Kappa:{:.4f}".format(conf_mat.kappa()))
return category_iou, avg_iou, category_acc, avg_acc
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import time
import pprint
import cv2
import argparse
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from models.model_builder import build_model
from models.model_builder import ModelPhase
def parse_args():
parser = argparse.ArgumentParser(
description='PaddleSeg Inference Model Exporter')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def export_inference_model(args):
"""
    Export PaddlePaddle inference model for prediction deployment and serving.
"""
print("Exporting inference model...")
startup_prog = fluid.Program()
infer_prog = fluid.Program()
image, logit_out = build_model(
infer_prog, startup_prog, phase=ModelPhase.PREDICT)
# Use CPU for exporting inference model instead of GPU
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
infer_prog = infer_prog.clone(for_test=True)
if os.path.exists(cfg.TEST.TEST_MODEL):
fluid.io.load_params(exe, cfg.TEST.TEST_MODEL, main_program=infer_prog)
else:
print("TEST.TEST_MODEL diretory is empty!")
exit(-1)
fluid.io.save_inference_model(
cfg.FREEZE.SAVE_DIR,
feeded_var_names=[image.name],
target_vars=[logit_out],
executor=exe,
main_program=infer_prog,
model_filename=cfg.FREEZE.MODEL_FILENAME,
params_filename=cfg.FREEZE.PARAMS_FILENAME)
print("Inference model exported!")
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
export_inference_model(args)
if __name__ == '__main__':
main()
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from utils.config import cfg
def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit,
label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
return_softmax=True)
loss = loss * ignore_mask
if cfg.MODEL.FP16:
loss = fluid.layers.cast(loss, 'float32')
avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
avg_loss = fluid.layers.cast(avg_loss, 'float16')
else:
avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
if cfg.MODEL.SCALE_LOSS > 1.0:
avg_loss = avg_loss * cfg.MODEL.SCALE_LOSS
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = softmax_with_loss(logit, logit_label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes)
return avg_loss
# TODO: decide how to apply the ignore index and ignore mask here
def dice_loss(logit, label, ignore_mask=None, num_classes=2):
if num_classes != 2:
raise Exception("dice loss is only applicable to binary classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
label = fluid.layers.elementwise_min(
label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
logit = fluid.layers.softmax(logit)
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss = fluid.layers.dice_loss(logit, label)
return loss
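def _demo_masked_mean():
    # Illustrative numpy sketch, not part of the graph code above: the masked
    # mean used in softmax_with_loss is mean(loss * mask) / mean(mask), i.e. the
    # average loss over the non-ignored pixels only.
    loss = np.array([2.0, 4.0, 6.0, 100.0])
    mask = np.array([1.0, 1.0, 1.0, 0.0])  # the last pixel carries IGNORE_INDEX
    print((loss * mask).mean() / mask.mean())  # 4.0 == (2 + 4 + 6) / 3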
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix every time `calculate` is called
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into a sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use the numpy sum axis API to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
        # scale counts down to avoid overflow
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
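def _demo_confusion_matrix():
    # Illustrative sketch, not called by the library: 2-class evaluation on a
    # single 2x2 prediction. Shapes follow the evaluator: pred is (N, H, W, 1)
    # while label/ignore are (N, 1, H, W) and get transposed inside calculate().
    cm = ConfusionMatrix(num_classes=2, streaming=True)
    pred = np.array([0, 1, 1, 0]).reshape(1, 2, 2, 1)
    label = np.array([0, 1, 0, 0]).reshape(1, 1, 2, 2)
    ignore = np.ones((1, 1, 2, 2), dtype='int64')  # 1 means the pixel is kept
    cm.calculate(pred, label, ignore)
    print(cm.mean_iou())  # per-class IoUs 2/3 and 1/2
    print(cm.accuracy())  # overall accuracy 0.75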
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import models.modeling
import models.libs
import models.backbone
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from utils.config import cfg
__all__ = [
'MobileNetV2', 'MobileNetV2_x0_25', 'MobileNetV2_x0_5', 'MobileNetV2_x1_0',
'MobileNetV2_x1_5', 'MobileNetV2_x2_0', 'MobileNetV2_scale'
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class MobileNetV2():
def __init__(self, scale=1.0, change_depth=False, output_stride=None):
self.params = train_parameters
self.scale = scale
self.change_depth = change_depth
self.bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
] if change_depth == False else [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 5, 2),
(6, 64, 7, 2),
(6, 96, 5, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.modify_bottle_params(output_stride)
def modify_bottle_params(self, output_stride=None):
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
def net(self, input, class_dim=1000, end_points=None, decode_points=None):
scale = self.scale
change_depth = self.change_depth
#if change_depth is True, the new depth is 1.4 times as deep as before.
bottleneck_params_list = self.bottleneck_params_list
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
#last_conv
input = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
input = fluid.layers.pool2d(
input=input,
pool_size=7,
pool_stride=1,
pool_type='avg',
global_pooling=True)
output = fluid.layers.fc(
input=input,
size=class_dim,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=True if cfg.MODEL.FP16 else False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
def MobileNetV2_x0_25():
model = MobileNetV2(scale=0.25)
return model
def MobileNetV2_x0_5():
model = MobileNetV2(scale=0.5)
return model
def MobileNetV2_x1_0():
model = MobileNetV2(scale=1.0)
return model
def MobileNetV2_x1_5():
model = MobileNetV2(scale=1.5)
return model
def MobileNetV2_x2_0():
model = MobileNetV2(scale=2.0)
return model
def MobileNetV2_scale():
model = MobileNetV2(scale=1.2, change_depth=True)
return model
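def _demo_output_stride():
    # Illustrative sketch, not called by the library: with output_stride=16 the
    # later stride-2 bottleneck stages are forced to stride 1, so the backbone
    # downsamples 16x instead of 32x (see modify_bottle_params above).
    model = MobileNetV2(scale=1.0, output_stride=16)
    for t, c, n, s in model.bottleneck_params_list:
        print('expansion=%d channels=%d repeats=%d stride=%d' % (t, c, n, s))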
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = MobileNetV2_x1_0()
logit, decode_ends = model.net(image)
#print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50, scale=1.0, stem=None):
self.params = train_parameters
self.layers = layers
self.scale = scale
self.stem = stem
def net(self,
input,
class_dim=1000,
end_points=None,
decode_points=None,
resize_points=None,
dilation_dict=None):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
def get_dilated_rate(dilation_dict, idx):
if dilation_dict is None or idx not in dilation_dict:
return 1
else:
return dilation_dict[idx]
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
if self.stem == 'icnet':
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=3,
stride=2,
act='relu',
name="conv1_1")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(64 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_2")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(128 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_3")
else:
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=7,
stride=2,
act='relu',
name="conv1")
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
layer_count = 1
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "conv" + str(block + 2) + '_' + str(1 + i)
dilation_rate = get_dilated_rate(dilation_dict, block)
conv = self.bottleneck_block(
input=conv,
num_filters=int(num_filters[block] * self.scale),
stride=2
if i == 0 and block != 0 and dilation_rate == 1 else 1,
name=conv_name,
dilation=dilation_rate)
layer_count += 3
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if check_points(layer_count, resize_points):
conv = self.interp(
conv,
np.ceil(
np.array(conv.shape[2:]).astype('int32') / 2))
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name)
layer_count += 2
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
def zero_padding(self, input, padding):
return fluid.layers.pad(
input, [0, 0, 0, 0, padding, padding, padding, padding])
def interp(self, input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
dilation=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "/weights"),
bias_attr=False,
name=name + '.conv2d.output.1')
bn_name = name + '/BatchNorm/'
return fluid.layers.batch_norm(
input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=ParamAttr(name=bn_name + 'gamma'),
bias_attr=ParamAttr(bn_name + 'beta'),
moving_mean_name=bn_name + 'moving_mean',
moving_variance_name=bn_name + 'moving_variance',
)
def shortcut(self, input, ch_out, stride, is_first, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1 or is_first == True:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, dilation=1):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
dilation=1,
stride=stride,
act='relu',
name=name + "_branch2a")
if dilation > 1:
conv0 = self.zero_padding(conv0, dilation)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
dilation=dilation,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
dilation=1,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self.shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def ResNet18():
model = ResNet(layers=18)
return model
def ResNet34():
model = ResNet(layers=34)
return model
def ResNet50():
model = ResNet(layers=50)
return model
def ResNet101():
model = ResNet(layers=101)
return model
def ResNet152():
model = ResNet(layers=152)
return model
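# Illustrative sketch mirroring the __main__ demos in the sibling backbone
# files; an assumption about usage rather than part of the original file.
if __name__ == '__main__':
    image_shape = [3, 224, 224]
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    model = ResNet50()
    logit = model.net(image)
    #print("logit:", logit.shape)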
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import math
import paddle.fluid as fluid
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def check_stride(s, os):
if s <= os:
return True
else:
return False
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
class Xception():
def __init__(self, backbone="xception_65"):
self.bottleneck_params = self.gen_bottleneck_params(backbone)
self.backbone = backbone
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def net(self,
input,
output_stride=32,
num_classes=1000,
end_points=None,
decode_points=None):
self.stride = 2
self.block_point = 0
self.output_stride = output_stride
self.decode_points = decode_points
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data, 32, 3, stride=2, padding=1,
param_attr=param_attr))
with scope("conv2"):
data = bn_relu(
conv(
data, 64, 3, stride=1, padding=1,
param_attr=param_attr))
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("entry:", block_num, strides, chns)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("entry:", s, block_point, output_stride)
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("entry:", i, data.shape)
self.stride = s
self.block_point = block_point
#print("entry:", s, block_point, output_stride)
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("middle:", block_num, strides, chns)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("middle:", s, block_point, output_stride)
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, strides[i]], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("middle:", i, data.shape)
self.stride = s
self.block_point = block_point
#print("middle:", s, block_point, output_stride)
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
#print("exit:", block_num, strides, chns)
assert (block_num == 2)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
#print("exit:", s, block_point, output_stride)
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("exit:", 0, data.shape)
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
#print("decode shortcut:", block_point)
self.short_cuts[block_point] = short_cuts[1]
#print("exit:", 1, data.shape)
self.stride = s
self.block_point = block_point
#print("exit:", s, block_point, output_stride)
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr))
else:
skip = input
return data + skip, results
def xception_65():
model = Xception("xception_65")
return model
def xception_41():
model = Xception("xception_41")
return model
def xception_71():
model = Xception("xception_71")
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = xception_65()
logit = model.net(image)
#print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
from utils.config import cfg
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
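def _demo_scope():
    # Illustrative sketch, not called by the library: nested scope() calls build
    # slash-separated prefixes that conv()/bn() below prepend to parameter names.
    with scope('xception_65'):
        with scope('entry_flow'):
            with scope('conv1'):
                print(name_scope)  # -> 'xception_65/entry_flow/conv1/'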
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
# print "group can not divide channle:", C, G
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
# print "use group size:", G
break
assert C % G == 0
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
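def _demo_group_fallback():
    # Illustrative sketch, not called by the library: the fallback in group_norm
    # above searches outwards from the requested group number G for the nearest
    # divisor of the channel count C, e.g. C=30 with G=32 falls back to 30 groups.
    def nearest_divisor(C, G):
        for d in range(10):
            for t in (d, -d):
                if G + t > 0 and C % (G + t) == 0:
                    return G + t
        return G
    print(nearest_divisor(30, 32))  # -> 30
    print(nearest_divisor(64, 32))  # -> 32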
def bn(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif cfg.MODEL.DEFAULT_NORM_TYPE == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
cfg.MODEL.DEFAULT_GROUP_NUMBER,
eps=cfg.MODEL.DEFAULT_EPSILON,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + cfg.MODEL.DEFAULT_NORM_TYPE)
def bn_relu(data):
return fluid.layers.relu(bn(data))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input, channel, stride, filter, dilation=1, act=None):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=True if cfg.MODEL.FP16 else False,
param_attr=param_attr)
input = bn(input)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input)
if act: input = act(input)
return input
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import struct
import importlib
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
import solver
from utils.config import cfg
from loss import multi_softmax_with_loss
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
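def _demo_phase_checks():
    # Illustrative sketch, not called by the library: the phase helpers are
    # plain string comparisons against the constants above.
    assert ModelPhase.is_train(ModelPhase.TRAIN)
    assert ModelPhase.is_valid_phase('eval')
    assert not ModelPhase.is_valid_phase('export')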
def map_model_name(model_name):
name_dict = {
"unet": "unet.unet",
"deeplabv3p": "deeplab.deeplabv3p",
"icnet": "icnet.icnet",
}
if model_name in name_dict.keys():
return name_dict[model_name]
else:
raise Exception(
"unknow model name, only support unet, deeplabv3p, icnet")
def get_func(func_name):
"""Helper to return a function object by name. func_name must identify a
function in this module or the path to a function relative to the base
'modeling' module.
"""
if func_name == '':
return None
try:
parts = func_name.split('.')
# Refers to a function in this module
if len(parts) == 1:
return globals()[parts[0]]
# Otherwise, assume we're referencing a module under modeling
module_name = 'models.' + '.'.join(parts[:-1])
module = importlib.import_module(module_name)
return getattr(module, parts[-1])
except Exception:
print('Failed to find function: {}'.format(func_name))
        raise
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.TRAIN_CROP_SIZE[0]
height = cfg.TRAIN_CROP_SIZE[1]
else:
width = cfg.EVAL_CROP_SIZE[0]
height = cfg.EVAL_CROP_SIZE[1]
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
image = fluid.layers.data(
name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(
name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data(
name='mask', shape=grt_shape, dtype='int32')
            # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=False,
use_double_buffer=True)
if cfg.MODEL.FP16:
image = fluid.layers.cast(image, "float16")
model_name = map_model_name(cfg.MODEL.MODEL_NAME)
model_func = get_func("modeling." + model_name)
logits = model_func(image, class_num)
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
avg_loss = multi_softmax_with_loss(logits, label, mask,
class_num)
            # get the prediction at the original input size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
logit = softmax(logit)
return image, logit
out = fluid.layers.transpose(x=logit, perm=[0, 2, 3, 1])
if cfg.MODEL.FP16:
out = fluid.layers.cast(out, 'float32')
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
return py_reader, avg_loss, pred, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
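# Illustrative sketch (not part of the original file): parse_shape_from_file reads
# the tensor shape stored in a persistable-variable file saved by fluid; train.py
# later uses it to skip pretrained weights whose shapes do not match the current
# network. The directory and variable name below are hypothetical.
def _shape_matches_example(pretrained_dir, var_name, expected_shape):
    import os
    var_file = os.path.join(pretrained_dir, var_name)
    if not os.path.exists(var_file):
        return False
    return parse_shape_from_file(var_file) == tuple(expected_shape)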
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv
from models.libs.model_libs import separate_conv
from models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from models.backbone.xception import Xception as xception_backbone
def encoder(input):
    # Encoder: ASPP architecture. Image-level pooling, a 1x1 conv and three parallel
    # dilated convs at different rates are concatenated and fused by a 1x1 conv.
    # ASPP_WITH_SEP_CONV: True by default; use depthwise separable convs, otherwise plain convs
    # OUTPUT_STRIDE: downsampling factor, 8 or 16; determines aspp_ratios
    # aspp_ratios: dilation rates of the ASPP branches
if cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.DEEPLAB.OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only support stride 8 or 16")
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('encoder'):
channel = 256
with scope("image_pool"):
if cfg.MODEL.FP16:
image_avg = fluid.layers.reduce_mean(
fluid.layers.cast(input, 'float32'), [2, 3], keep_dim=True)
image_avg = fluid.layers.cast(image_avg, 'float16')
else:
image_avg = fluid.layers.reduce_mean(
input, [2, 3], keep_dim=True)
image_avg = bn_relu(
conv(
image_avg,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
if cfg.MODEL.FP16:
image_avg = fluid.layers.cast(image_avg, 'float32')
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
if cfg.MODEL.FP16:
image_avg = fluid.layers.cast(image_avg, 'float16')
with scope("aspp0"):
aspp0 = bn_relu(
conv(
input,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[0],
padding=aspp_ratios[0],
param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[1],
padding=aspp_ratios[1],
param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv(
input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu(
conv(
input,
channel,
stride=1,
filter_size=3,
dilation=aspp_ratios[2],
padding=aspp_ratios[2],
param_attr=param_attr))
with scope("concat"):
data = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3],
axis=1)
data = bn_relu(
conv(
data,
channel,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
data = fluid.layers.dropout(data, 0.9)
return data
def decoder(encode_data, decode_shortcut):
    # Decoder configuration
    # encode_data: encoder output
    # decode_shortcut: low-level branch from the backbone, concatenated with the resized encode_data
    # DECODER_USE_SEP_CONV: True by default; use two separable convs after the concat, otherwise plain convs
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('decoder'):
with scope('concat'):
decode_shortcut = bn_relu(
conv(
decode_shortcut,
48,
1,
1,
groups=1,
padding=0,
param_attr=param_attr))
if cfg.MODEL.FP16:
encode_data = fluid.layers.cast(encode_data, 'float32')
encode_data = fluid.layers.resize_bilinear(
encode_data, decode_shortcut.shape[2:])
if cfg.MODEL.FP16:
encode_data = fluid.layers.cast(encode_data, 'float16')
encode_data = fluid.layers.concat([encode_data, decode_shortcut],
axis=1)
if cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV:
with scope("separable_conv1"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
with scope("separable_conv2"):
encode_data = separate_conv(
encode_data, 256, 1, 3, dilation=1, act=relu)
else:
with scope("decoder_conv1"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
with scope("decoder_conv2"):
encode_data = bn_relu(
conv(
encode_data,
256,
stride=1,
filter_size=3,
dilation=1,
padding=1,
param_attr=param_attr))
return encode_data
def mobilenetv2(input):
    # Backbone: MobileNetV2 configuration
    # DEPTH_MULTIPLIER: MobileNetV2 scale, 1.0 by default
    # OUTPUT_STRIDE: downsampling factor
    # end_points: number of MobileNetV2 blocks to build
    # decode_point: block whose output is branched out as the decoder input
scale = cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER
output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 18
decode_point = 4
data, decode_shortcuts = model.net(
input, end_points=end_points, decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def xception(input):
    # Backbone: Xception configuration; xception_65, xception_41 and xception_71 are available
    # decode_point: block whose output is branched out as the decoder input
    # end_points: number of Xception blocks to build
cfg.MODEL.DEFAULT_EPSILON = 1e-3
model = xception_backbone(cfg.MODEL.DEEPLAB.BACKBONE)
backbone = cfg.MODEL.DEEPLAB.BACKBONE
output_stride = cfg.MODEL.DEEPLAB.OUTPUT_STRIDE
if '65' in backbone:
decode_point = 2
end_points = 21
if '41' in backbone:
decode_point = 2
end_points = 13
if '71' in backbone:
decode_point = 3
end_points = 23
data, decode_shortcuts = model.net(
input,
output_stride=output_stride,
end_points=end_points,
decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def deeplabv3p(img, num_classes):
    # Backbone: xception or mobilenetv2
if 'xception' in cfg.MODEL.DEEPLAB.BACKBONE:
data, decode_shortcut = xception(img)
elif 'mobilenet' in cfg.MODEL.DEEPLAB.BACKBONE:
data, decode_shortcut = mobilenetv2(img)
else:
raise Exception("deeplab only support xception and mobilenet backbone")
    # Encoder / decoder
cfg.MODEL.DEFAULT_EPSILON = 1e-5
if cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP:
data = encoder(data)
if cfg.MODEL.DEEPLAB.ENABLE_DECODER:
data = decoder(data, decode_shortcut)
    # Set the final conv output channels by the number of classes, then resize to the original image size
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope('logit'):
logit = conv(
data,
num_classes,
1,
stride=1,
padding=0,
bias_attr=True,
param_attr=param_attr)
if cfg.MODEL.FP16:
logit = fluid.layers.cast(logit, 'float32')
logit = fluid.layers.resize_bilinear(logit, img.shape[2:])
if cfg.MODEL.FP16:
logit = fluid.layers.cast(logit, 'float16')
return logit
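# Smoke-test sketch mirroring the __main__ blocks of the ICNet and U-Net files below
# (not part of the original file): it builds the network symbolically with the default
# DeepLab settings from the config and prints the logit shape.
if __name__ == '__main__':
    image_shape = [3, 320, 320]
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    logit = deeplabv3p(image, 19)
    print("logit:", logit.shape)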
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope
from models.libs.model_libs import bn, avg_pool, conv
from models.backbone.resnet import ResNet as resnet_backbone
import numpy as np
def interp(input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def pyramis_pooling(input, input_shape):
shape = np.ceil(input_shape / 32).astype("int32")
h, w = shape
pool1 = avg_pool(input, [h, w], [h, w])
pool1_interp = interp(pool1, shape)
pool2 = avg_pool(input, [h // 2, w // 2], [h // 2, w // 2])
pool3 = avg_pool(input, [h // 3, w // 3], [h // 3, w // 3])
pool4 = avg_pool(input, [h // 4, w // 4], [h // 4, w // 4])
    # the official caffe repo uses the following hyperparameters for eval
# pool2 = avg_pool(input, [17, 33], [16, 32])
# pool3 = avg_pool(input, [13, 25], [10, 20])
# pool4 = avg_pool(input, [8, 15], [5, 10])
pool2_interp = interp(pool2, shape)
pool3_interp = interp(pool3, shape)
pool4_interp = interp(pool4, shape)
conv5_3_sum = input + pool4_interp + pool3_interp + pool2_interp + pool1_interp
return conv5_3_sum
def zero_padding(input, padding):
return fluid.layers.pad(input,
[0, 0, 0, 0, padding, padding, padding, padding])
def sub_net_4(input, input_shape):
tmp = pyramis_pooling(input, input_shape)
with scope("conv5_4_k1"):
tmp = conv(tmp, 256, 1, 1)
tmp = bn(tmp, act='relu')
tmp = interp(tmp, out_shape=np.ceil(input_shape / 16))
return tmp
def sub_net_2(input):
with scope("conv3_1_sub2_proj"):
tmp = conv(input, 128, 1, 1)
tmp = bn(tmp)
return tmp
def sub_net_1(input):
with scope("conv1_sub1"):
tmp = conv(input, 32, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv2_sub1"):
tmp = conv(tmp, 32, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv3_sub1"):
tmp = conv(tmp, 64, 3, 2, padding=1)
tmp = bn(tmp, act='relu')
with scope("conv3_sub1_proj"):
tmp = conv(tmp, 128, 1, 1)
tmp = bn(tmp)
return tmp
def CCF24(sub2_out, sub4_out, input_shape):
with scope("conv_sub4"):
tmp = conv(sub4_out, 128, 3, dilation=2, padding=2)
tmp = bn(tmp)
tmp = tmp + sub2_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, np.ceil(input_shape / 8))
return tmp
def CCF124(sub1_out, sub24_out, input_shape):
tmp = zero_padding(sub24_out, padding=2)
with scope("conv_sub2"):
tmp = conv(tmp, 128, 3, dilation=2)
tmp = bn(tmp)
tmp = tmp + sub1_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape // 4)
return tmp
def resnet(input):
    # ICNet backbone: ResNet, ResNet50 by default
    # end_points: last ResNet layer to build
    # decode_point: layer whose output is branched out from the backbone
    # resize_points: layer at which the feature map is downscaled to 1/2
    # dilation_dict: ResNet stage indices and their dilation rates
scale = cfg.MODEL.ICNET.DEPTH_MULTIPLIER
layers = cfg.MODEL.ICNET.LAYERS
model = resnet_backbone(scale=scale, layers=layers, stem='icnet')
end_points = 49
decode_point = 13
resize_point = 13
dilation_dict = {2: 2, 3: 4}
data, decode_shortcuts = model.net(
input,
end_points=end_points,
decode_points=decode_point,
resize_points=resize_point,
dilation_dict=dilation_dict)
return data, decode_shortcuts[decode_point]
def encoder(data13, data49, input, input_shape):
    # ICNet encoder configuration
    # sub_net_4: apply pyramis_pooling (pyramid pooling) to the ResNet layer-49 feature
    # sub_net_2: apply a projection convolution to the ResNet layer-13 feature
    # sub_net_1: apply three strided convolutions to the original-size image
sub4_out = sub_net_4(data49, input_shape)
sub2_out = sub_net_2(data13)
sub1_out = sub_net_1(input)
return sub1_out, sub2_out, sub4_out
def decoder(sub1_out, sub2_out, sub4_out, input_shape):
    # ICNet decoder configuration
    # CCF: Cascade Feature Fusion
sub24_out = CCF24(sub2_out, sub4_out, input_shape)
sub124_out = CCF124(sub1_out, sub24_out, input_shape)
return sub24_out, sub124_out
def get_logit(data, num_classes, name="logit"):
param_attr = fluid.ParamAttr(
name=name + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
data = conv(
data,
num_classes,
1,
stride=1,
padding=0,
param_attr=param_attr,
bias_attr=True)
return data
def icnet(input, num_classes):
    # Backbone resnet: input image_sub2, the image downscaled to 1/2
    # outputs: data49, the ResNet layer-49 feature at 1/32 of the original size
    #          data13, the ResNet layer-13 feature at 1/16 of the original size
input_shape = input.shape[2:]
input_shape = np.array(input_shape).astype("float32")
image_sub2 = interp(input, out_shape=np.ceil(input_shape * 0.5))
data49, data13 = resnet(image_sub2)
    # encoder: takes input, data13 and data49 and applies downsampling convolutions,
    # a projection convolution and pyramid pooling, giving sub1_out, sub2_out and sub4_out
sub1_out, sub2_out, sub4_out = encoder(data13, data49, input, input_shape)
    # decoder: cascade feature fusion of the three encoder branches
sub24_out, sub124_out = decoder(sub1_out, sub2_out, sub4_out, input_shape)
    # get_logit: final conv output channels determined by the number of classes
logit124 = get_logit(sub124_out, num_classes, "logit124")
logit4 = get_logit(sub4_out, num_classes, "logit4")
logit24 = get_logit(sub24_out, num_classes, "logit24")
return logit124, logit24, logit4
if __name__ == '__main__':
image_shape = [3, 320, 320]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
logit = icnet(image, 4)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from utils.config import cfg
from models.libs.model_libs import scope, name_scope
from models.libs.model_libs import bn, bn_relu, relu
from models.libs.model_libs import conv, max_pool, deconv
def double_conv(data, out_ch):
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope("conv0"):
data = bn_relu(
conv(data, out_ch, 3, stride=1, padding=1, param_attr=param_attr))
with scope("conv1"):
data = bn_relu(
conv(data, out_ch, 3, stride=1, padding=1, param_attr=param_attr))
return data
def down(data, out_ch):
    # Downsampling: max_pool followed by two convolutions
with scope("down"):
data = max_pool(data, 2, 2, 0)
data = double_conv(data, out_ch)
return data
def up(data, short_cut, out_ch):
    # Upsampling: upsample data (resize or deconv), then concat with short_cut
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.XavierInitializer(),
)
with scope("up"):
if cfg.MODEL.UNET.UPSAMPLE_MODE == 'bilinear':
data = fluid.layers.resize_bilinear(data, short_cut.shape[2:])
else:
data = deconv(
data,
out_ch // 2,
filter_size=2,
stride=2,
padding=0,
param_attr=param_attr)
data = fluid.layers.concat([data, short_cut], axis=1)
data = double_conv(data, out_ch)
return data
def encode(data):
    # Encoder
short_cuts = []
with scope("encode"):
with scope("block1"):
data = double_conv(data, 64)
short_cuts.append(data)
with scope("block2"):
data = down(data, 128)
short_cuts.append(data)
with scope("block3"):
data = down(data, 256)
short_cuts.append(data)
with scope("block4"):
data = down(data, 512)
short_cuts.append(data)
with scope("block5"):
data = down(data, 512)
return data, short_cuts
def decode(data, short_cuts):
    # Decoder, symmetric with the encoder
with scope("decode"):
with scope("decode1"):
data = up(data, short_cuts[3], 256)
with scope("decode2"):
data = up(data, short_cuts[2], 128)
with scope("decode3"):
data = up(data, short_cuts[1], 64)
with scope("decode4"):
data = up(data, short_cuts[0], 64)
return data
def get_logit(data, num_classes):
    # Set the final conv output channels by the number of classes
param_attr = fluid.ParamAttr(
name='weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope("logit"):
data = conv(
data, num_classes, 3, stride=1, padding=1, param_attr=param_attr)
return data
def unet(input, num_classes):
    # U-Net: symmetric encoder-decoder
encode_data, short_cuts = encode(input)
decode_data = decode(encode_data, short_cuts)
logit = get_logit(decode_data, num_classes)
return logit
if __name__ == '__main__':
image_shape = [3, 320, 320]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
logit = unet(image, 4)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
import data_aug as aug
from utils.config import cfg
from data_utils import GeneratorEnqueuer
from models.model_builder import ModelPhase
def cv2_imread(file_path, flag=cv2.IMREAD_COLOR):
    # Resolve cv2.imread failing to open file paths that contain Chinese characters on Windows.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
class SegDataset(object):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN):
self.mode = mode
self.shuffle = shuffle
self.data_dir = data_dir
        # NOTE: please ensure the file list is saved in UTF-8 encoding
with codecs.open(file_list, 'r', 'utf-8') as flist:
self.lines = [line.strip() for line in flist]
if shuffle:
np.random.shuffle(self.lines)
def generator(self):
if self.shuffle:
np.random.shuffle(self.lines)
for line in self.lines:
yield self.process_image(line, self.data_dir, self.mode)
def sharding_generator(self, pid=0, num_processes=1):
"""
Use line id as shard key for multiprocess io
It's a normal generator if pid=0, num_processes=1
"""
for index, line in enumerate(self.lines):
# Use index and pid to shard file list
if index % num_processes == pid:
yield self.process_image(line, self.data_dir, self.mode)
def batch_reader(self, batch_size):
        # self.generator yields single samples, which self.batch groups into batches
        br = self.batch(self.generator, batch_size)
for batch in br:
yield batch[0], batch[1], batch[2]
def multiprocess_generator(self, max_queue_size=32, num_processes=8):
# Re-shuffle file list
if self.shuffle:
np.random.shuffle(self.lines)
# Create multiple sharding generators according to num_processes for multiple processes
generators = []
for pid in range(num_processes):
generators.append(self.sharding_generator(pid, num_processes))
try:
enqueuer = GeneratorEnqueuer(generators)
enqueuer.start(max_queue_size=max_queue_size, workers=num_processes)
while True:
generator_out = None
while enqueuer.is_running():
if not enqueuer.queue.empty():
generator_out = enqueuer.queue.get(timeout=5)
break
else:
time.sleep(0.01)
if generator_out is None:
break
yield generator_out
finally:
if enqueuer is not None:
enqueuer.stop()
def batch(self, reader, batch_size, is_test=False, drop_last=False):
def batch_reader(is_test=False, drop_last=drop_last):
if is_test:
imgs, img_names, valid_shapes, org_shapes = [], [], [], []
for img, img_name, valid_shape, org_shape in reader():
imgs.append(img)
img_names.append(img_name)
valid_shapes.append(valid_shape)
org_shapes.append(org_shape)
if len(imgs) == batch_size:
yield np.array(imgs), img_names, np.array(
valid_shapes), np.array(org_shapes)
imgs, img_names, valid_shapes, org_shapes = [], [], [], []
if not drop_last and len(imgs) > 0:
yield np.array(imgs), img_names, np.array(
valid_shapes), np.array(org_shapes)
else:
imgs, labs, ignore = [], [], []
bs = 0
for img, lab, ig in reader():
imgs.append(img)
labs.append(lab)
ignore.append(ig)
bs += 1
if bs == batch_size:
yield np.array(imgs), np.array(labs), np.array(ignore)
bs = 0
imgs, labs, ignore = [], [], []
if not drop_last and bs > 0:
yield np.array(imgs), np.array(labs), np.array(ignore)
return batch_reader(is_test, drop_last)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
            # If the 4-channel rgba IMAGE_TYPE is used, read with the IMREAD_UNCHANGED
            # flag to preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
grt_path = os.path.join(src_dir, grt_name)
img = cv2_imread(img_path, cv2_imread_flag)
grt = cv2_imread(grt_path, cv2.IMREAD_GRAYSCALE)
if img is None or grt is None:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
img_height = img.shape[0]
img_width = img.shape[1]
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
raise Exception(
"source img and label img must has the same size")
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img_channels = img.shape[2]
if img_channels < 3:
raise Exception(
"PaddleSeg only supports gray, rgb or rgba image")
if img_channels != cfg.DATASET.DATA_DIM:
raise Exception(
"Input image channel({}) is not match cfg.DATASET.DATA_DIM({}), img_name={}"
.format(img_channels, cfg.DATASET.DATADIM, img_name))
if img_channels != len(cfg.MEAN):
raise Exception(
"img name {}, img chns {} mean size {}, size unequal".
format(img_name, img_channels, len(cfg.MEAN)))
if img_channels != len(cfg.STD):
raise Exception(
"img name {}, img chns {} std size {}, size unequal".format(
img_name, img_channels, len(cfg.STD)))
# visualization mode
elif mode == ModelPhase.VISUAL:
if cfg.DATASET.SEPARATOR in line:
parts = line.strip().split(cfg.DATASET.SEPARATOR)
img_name = parts[0]
else:
img_name = line.strip()
img_path = os.path.join(src_dir, img_name)
img = cv2_imread(img_path, cv2_imread_flag)
if img is None:
raise Exception("empty image, src_dir:{}, img: {}".format(
src_dir, img_name))
# Convert grayscale image to BGR 3 channel image
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img_height = img.shape[0]
img_width = img.shape[1]
img_channels = img.shape[2]
if img_channels < 3:
raise Exception("this repo only recept gray, rgb or rgba image")
if img_channels != cfg.DATASET.DATA_DIM:
raise Exception("data dim must equal to image channels")
if img_channels != len(cfg.MEAN):
raise Exception(
"img name {}, img chns {} mean size {}, size unequal".
format(img_name, img_channels, len(cfg.MEAN)))
if img_channels != len(cfg.STD):
raise Exception(
"img name {}, img chns {} std size {}, size unequal".format(
img_name, img_channels, len(cfg.STD)))
grt = None
grt_name = None
else:
raise ValueError("mode error: {}".format(mode))
return img, grt, img_name, grt_name
def normalize_image(self, img):
""" 像素归一化后减均值除方差 """
img = img.transpose((2, 0, 1)).astype('float32') / 255.0
img_mean = np.array(cfg.MEAN).reshape((len(cfg.MEAN), 1, 1))
img_std = np.array(cfg.STD).reshape((len(cfg.STD), 1, 1))
img -= img_mean
img /= img_std
return img
def process_image(self, line, data_dir, mode):
""" process_image """
img, grt, img_name, grt_name = self.load_image(
line, data_dir, mode=mode)
if mode == ModelPhase.TRAIN:
img, grt = aug.resize(img, grt, mode)
if cfg.AUG.RICH_CROP.ENABLE:
if cfg.AUG.RICH_CROP.BLUR:
if cfg.AUG.RICH_CROP.BLUR_RATIO <= 0:
n = 0
elif cfg.AUG.RICH_CROP.BLUR_RATIO >= 1:
n = 1
else:
n = int(1.0 / cfg.AUG.RICH_CROP.BLUR_RATIO)
if n > 0:
if np.random.randint(0, n) == 0:
radius = np.random.randint(3, 10)
if radius % 2 != 1:
radius = radius + 1
if radius > 9:
radius = 9
img = cv2.GaussianBlur(img, (radius, radius), 0, 0)
img, grt = aug.random_rotation(
img,
grt,
rich_crop_max_rotation=cfg.AUG.RICH_CROP.MAX_ROTATION,
mean_value=cfg.MEAN)
img, grt = aug.rand_scale_aspect(
img,
grt,
rich_crop_min_scale=cfg.AUG.RICH_CROP.MIN_AREA_RATIO,
rich_crop_aspect_ratio=cfg.AUG.RICH_CROP.ASPECT_RATIO)
img = aug.hsv_color_jitter(
img,
brightness_jitter_ratio=cfg.AUG.RICH_CROP.
BRIGHTNESS_JITTER_RATIO,
saturation_jitter_ratio=cfg.AUG.RICH_CROP.
SATURATION_JITTER_RATIO,
contrast_jitter_ratio=cfg.AUG.RICH_CROP.
CONTRAST_JITTER_RATIO)
if cfg.AUG.RICH_CROP.FLIP:
if cfg.AUG.RICH_CROP.FLIP_RATIO <= 0:
n = 0
elif cfg.AUG.RICH_CROP.FLIP_RATIO >= 1:
n = 1
else:
n = int(1.0 / cfg.AUG.RICH_CROP.FLIP_RATIO)
if n > 0:
if np.random.randint(0, n) == 0:
img = img[::-1, :, :]
grt = grt[::-1, :]
if cfg.AUG.MIRROR:
if np.random.randint(0, 2) == 1:
img = img[:, ::-1, :]
grt = grt[:, ::-1]
img, grt = aug.rand_crop(img, grt, mode=mode)
elif ModelPhase.is_eval(mode):
img, grt = aug.resize(img, grt, mode=mode)
img, grt = aug.rand_crop(img, grt, mode=mode)
elif ModelPhase.is_visual(mode):
org_shape = [img.shape[0], img.shape[1]]
img, grt = aug.resize(img, grt, mode=mode)
valid_shape = [img.shape[0], img.shape[1]]
img, grt = aug.rand_crop(img, grt, mode=mode)
else:
raise ValueError("Dataset mode={} Error!".format(mode))
# Normalize image
img = self.normalize_image(img)
if ModelPhase.is_train(mode) or ModelPhase.is_eval(mode):
grt = np.expand_dims(np.array(grt).astype('int32'), axis=0)
ignore = (grt != cfg.DATASET.IGNORE_INDEX).astype('int32')
if ModelPhase.is_train(mode):
return (img, grt, ignore)
elif ModelPhase.is_eval(mode):
return (img, grt, ignore)
elif ModelPhase.is_visual(mode):
return (img, img_name, valid_shape, org_shape)
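# Illustrative usage sketch (not part of the original file): iterate over a training
# file list with SegDataset and inspect one sample. The file-list and data-dir paths
# match the defaults in utils/config.py but are placeholders here.
def _seg_dataset_example():
    dataset = SegDataset(
        file_list='./dataset/cityscapes/train.list',
        data_dir='./dataset/cityscapes/',
        shuffle=True,
        mode=ModelPhase.TRAIN)
    for img, grt, ignore in dataset.generator():
        # img is a CHW float32 array normalized with cfg.MEAN / cfg.STD,
        # grt is the 1xHxW int32 label, ignore masks out IGNORE_INDEX pixels
        print(img.shape, grt.shape, ignore.shape)
        break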
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from utils.config import cfg
from paddle.fluid.contrib.mixed_precision.fp16_utils import create_master_params_grads, master_param_to_train_param
class Solver(object):
def __init__(self, main_prog, start_prog):
total_images = cfg.DATASET.TRAIN_TOTAL_IMAGES
self.weight_decay = cfg.SOLVER.WEIGHT_DECAY
self.momentum = cfg.SOLVER.MOMENTUM
self.momentum2 = cfg.SOLVER.MOMENTUM2
self.step_per_epoch = total_images // cfg.BATCH_SIZE
if total_images % cfg.BATCH_SIZE != 0:
self.step_per_epoch += 1
self.total_step = cfg.SOLVER.NUM_EPOCHS * self.step_per_epoch
self.main_prog = main_prog
self.start_prog = start_prog
def piecewise_decay(self):
gamma = cfg.SOLVER.GAMMA
bd = [self.step_per_epoch * e for e in cfg.SOLVER.DECAY_EPOCH]
lr = [cfg.SOLVER.LR * (gamma**i) for i in range(len(bd) + 1)]
decayed_lr = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
return decayed_lr
def poly_decay(self):
power = cfg.SOLVER.POWER
decayed_lr = fluid.layers.polynomial_decay(
cfg.SOLVER.LR, self.total_step, end_learning_rate=0, power=power)
return decayed_lr
def cosine_decay(self):
decayed_lr = fluid.layers.cosine_decay(
cfg.SOLVER.LR, self.step_per_epoch, cfg.SOLVER.NUM_EPOCHS)
return decayed_lr
def get_lr(self, lr_policy):
if lr_policy.lower() == 'poly':
decayed_lr = self.poly_decay()
elif lr_policy.lower() == 'piecewise':
decayed_lr = self.piecewise_decay()
elif lr_policy.lower() == 'cosine':
decayed_lr = self.cosine_decay()
else:
raise Exception(
"unsupport learning decay policy! only support poly,piecewise,cosine"
)
return decayed_lr
def sgd_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Momentum(
learning_rate=decayed_lr,
momentum=self.momentum,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
if cfg.MODEL.FP16:
params_grads = optimizer.backward(loss, self.start_prog)
master_params_grads = create_master_params_grads(
params_grads, self.main_prog, self.start_prog,
cfg.MODEL.SCALE_LOSS)
optimizer.apply_gradients(master_params_grads)
master_param_to_train_param(master_params_grads, params_grads,
self.main_prog)
else:
optimizer.minimize(loss)
return decayed_lr
def adam_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Adam(
learning_rate=decayed_lr,
beta1=self.momentum,
beta2=self.momentum2,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
optimizer.minimize(loss)
return decayed_lr
def optimise(self, loss):
lr_policy = cfg.SOLVER.LR_POLICY
opt = cfg.SOLVER.OPTIMIZER
if opt.lower() == 'adam':
return self.adam_optimizer(lr_policy, loss)
elif opt.lower() == 'sgd':
return self.sgd_optimizer(lr_policy, loss)
else:
raise Exception(
"unsupport optimizer solver, only support adam and sgd")
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import argparse
import pprint
import shutil
import functools
import paddle
import numpy as np
import paddle.fluid as fluid
from utils.config import cfg
from utils.timer import Timer, calculate_eta
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
from models.model_builder import parse_shape_from_file
from eval import evaluate
from vis import visualize
from utils.fp16_utils import load_fp16_vars
def parse_args():
parser = argparse.ArgumentParser(description='PaddleSeg training')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess I/O or not',
action='store_true',
default=False)
parser.add_argument(
'--log_steps',
dest='log_steps',
help='Display logging information at every log_steps',
default=10,
type=int)
parser.add_argument(
'--debug',
dest='debug',
help='debug mode, display detail information of training',
action='store_true')
parser.add_argument(
'--use_tb',
dest='use_tb',
help='whether to record the data during training to Tensorboard',
action='store_true')
parser.add_argument(
'--tb_log_dir',
dest='tb_log_dir',
help='Tensorboard logging directory',
default=None,
type=str)
parser.add_argument(
'--do_eval',
dest='do_eval',
help='Evaluation models result on every new checkpoint',
action='store_true')
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
return parser.parse_args()
def save_vars(executor, dirname, program=None, vars=None):
"""
    Temporary workaround for Windows save-variables compatibility.
Will fix in PaddlePaddle v1.5.2
"""
save_program = fluid.Program()
save_block = save_program.global_block()
for each_var in vars:
# NOTE: don't save the variable which type is RAW
if each_var.type == fluid.core.VarDesc.VarType.RAW:
continue
new_var = save_block.create_var(
name=each_var.name,
shape=each_var.shape,
dtype=each_var.dtype,
type=each_var.type,
lod_level=each_var.lod_level,
persistable=True)
file_path = os.path.join(dirname, new_var.name)
file_path = os.path.normpath(file_path)
save_block.append_op(
type='save',
inputs={'X': [new_var]},
outputs={},
attrs={'file_path': file_path})
executor.run(save_program)
def save_checkpoint(exe, program, ckpt_name):
"""
Save checkpoint for evaluation or resume training
"""
ckpt_dir = os.path.join(cfg.TRAIN.MODEL_SAVE_DIR, str(ckpt_name))
print("Save model checkpoint to {}".format(ckpt_dir))
if not os.path.isdir(ckpt_dir):
os.makedirs(ckpt_dir)
save_vars(
exe,
ckpt_dir,
program,
vars=list(filter(fluid.io.is_persistable, program.list_vars())))
return ckpt_dir
def load_checkpoint(exe, program):
"""
    Load checkpoint from the pretrained model directory to resume training
"""
print('Resume model training from:', cfg.TRAIN.PRETRAINED_MODEL)
if not os.path.exists(cfg.TRAIN.PRETRAINED_MODEL):
raise ValueError("TRAIN.PRETRAIN_MODEL {} not exist!".format(
cfg.TRAIN.PRETRAINED_MODEL))
fluid.io.load_persistables(
exe, cfg.TRAIN.PRETRAINED_MODEL, main_program=program)
model_path = cfg.TRAIN.PRETRAINED_MODEL
    # Check whether the path ends with a path separator
if model_path[-1] == os.sep:
model_path = model_path[0:-1]
epoch_name = os.path.basename(model_path)
# If resume model is final model
if epoch_name == 'final':
begin_epoch = cfg.SOLVER.NUM_EPOCHS
    # If the resume model path ends with digits, restore the epoch number
elif epoch_name.isdigit():
epoch = int(epoch_name)
begin_epoch = epoch + 1
else:
raise ValueError("Resume model path is not valid!")
print("Model checkpoint loaded successfully!")
return begin_epoch
def train(cfg):
startup_prog = fluid.Program()
train_prog = fluid.Program()
drop_last = True
dataset = SegDataset(
file_list=cfg.DATASET.TRAIN_FILE_LIST,
mode=ModelPhase.TRAIN,
shuffle=True,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
if args.use_mpio:
print("Use multiprocess reader")
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
print("Use multi-thread reader")
data_gen = dataset.generator()
batch_data = []
for b in data_gen:
batch_data.append(b)
if len(batch_data) == cfg.BATCH_SIZE:
for item in batch_data:
yield item[0], item[1], item[2]
batch_data = []
        # If the sync batch norm strategy is used, drop the last batch when the number
        # of samples in batch_data is less than cfg.BATCH_SIZE to avoid NCCL hang issues
if not cfg.TRAIN.SYNC_BATCH_NORM:
for item in batch_data:
yield item[0], item[1], item[2]
# Get device environment
places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
place = places[0]
# Get number of GPU
dev_count = len(places)
print("#GPU-Devices: {}".format(dev_count))
    # Make sure BATCH_SIZE is divisible by the number of GPU cards
assert cfg.BATCH_SIZE % dev_count == 0, (
        'BATCH_SIZE:{} not divisible by number of GPUs:{}'.format(
cfg.BATCH_SIZE, dev_count))
    # In multi-GPU training mode, batch data is allocated to each GPU evenly
batch_size_per_dev = cfg.BATCH_SIZE // dev_count
print("batch_size_per_dev: {}".format(batch_size_per_dev))
py_reader, avg_loss, lr, pred, grts, masks = build_model(
train_prog, startup_prog, phase=ModelPhase.TRAIN)
py_reader.decorate_sample_generator(
data_generator, batch_size=batch_size_per_dev, drop_last=drop_last)
exe = fluid.Executor(place)
exe.run(startup_prog)
exec_strategy = fluid.ExecutionStrategy()
    # Clear temporary variables every 100 iterations
if args.use_gpu:
exec_strategy.num_threads = fluid.core.get_cuda_device_count()
exec_strategy.num_iteration_per_drop_scope = 100
build_strategy = fluid.BuildStrategy()
if cfg.TRAIN.SYNC_BATCH_NORM and args.use_gpu:
if dev_count > 1:
# Apply sync batch norm strategy
print("Sync BatchNorm strategy is effective.")
build_strategy.sync_batch_norm = True
else:
print("Sync BatchNorm strategy will not be effective if GPU device"
" count <= 1")
compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
loss_name=avg_loss.name,
exec_strategy=exec_strategy,
build_strategy=build_strategy)
# Resume training
begin_epoch = cfg.SOLVER.BEGIN_EPOCH
if cfg.TRAIN.RESUME:
begin_epoch = load_checkpoint(exe, train_prog)
# Load pretrained model
elif os.path.exists(cfg.TRAIN.PRETRAINED_MODEL):
print('Pretrained model dir:', cfg.TRAIN.PRETRAINED_MODEL)
load_vars = []
def var_shape_matched(var, shape):
"""
            Check whether the persistable variable shape matches the current network
"""
var_exist = os.path.exists(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL, var.name))
if var_exist:
var_shape = parse_shape_from_file(
os.path.join(cfg.TRAIN.PRETRAINED_MODEL, var.name))
if var_shape == shape:
return True
else:
print(
"Variable[{}] shape does not match current network, skip"
" to load it.".format(var.name))
return False
for x in train_prog.list_vars():
if isinstance(x, fluid.framework.Parameter):
shape = tuple(fluid.global_scope().find_var(
x.name).get_tensor().shape())
if var_shape_matched(x, shape):
load_vars.append(x)
if cfg.MODEL.FP16:
            # In FP16 training mode, load FP16 variables separately
load_fp16_vars(exe, cfg.TRAIN.PRETRAINED_MODEL, train_prog)
else:
fluid.io.load_vars(
exe, dirname=cfg.TRAIN.PRETRAINED_MODEL, vars=load_vars)
print("Pretrained model loaded successfully!")
else:
        print('Pretrained model dir {} does not exist, training from scratch...'.
format(cfg.TRAIN.PRETRAINED_MODEL))
fetch_list = [avg_loss.name, lr.name]
if args.debug:
# Fetch more variable info and use streaming confusion matrix to
# calculate IoU results if in debug mode
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
fetch_list.extend([pred.name, grts.name, masks.name])
cm = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
if args.use_tb:
if not args.tb_log_dir:
print("Please specify the log directory by --tb_log_dir.")
exit(1)
from tb_paddle import SummaryWriter
if os.path.exists(args.tb_log_dir):
shutil.rmtree(args.tb_log_dir)
log_writer = SummaryWriter(args.tb_log_dir)
global_step = 0
all_step = cfg.DATASET.TRAIN_TOTAL_IMAGES // cfg.BATCH_SIZE
if cfg.DATASET.TRAIN_TOTAL_IMAGES % cfg.BATCH_SIZE and drop_last != True:
all_step += 1
all_step *= (cfg.SOLVER.NUM_EPOCHS - begin_epoch + 1)
avg_loss = 0.0
timer = Timer()
timer.start()
if begin_epoch > cfg.SOLVER.NUM_EPOCHS:
raise ValueError(
("begin epoch[{}] is larger than cfg.SOLVER.NUM_EPOCHS[{}]").format(
begin_epoch, cfg.SOLVER.NUM_EPOCHS))
for epoch in range(begin_epoch, cfg.SOLVER.NUM_EPOCHS + 1):
py_reader.start()
while True:
try:
if args.debug:
# Print category IoU and accuracy to check whether the
                    # training process behaves as expected
loss, lr, pred, grts, masks = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
cm.calculate(pred, grts, masks)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
speed = args.log_steps / timer.elapsed_time()
avg_loss /= args.log_steps
category_acc, mean_acc = cm.accuracy()
category_iou, mean_iou = cm.mean_iou()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} acc={:.5f} mIoU={:.5f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, mean_acc,
mean_iou, speed,
calculate_eta(all_step - global_step, speed)))
print("Category IoU:", category_iou)
print("Category Acc:", category_acc)
if args.use_tb:
log_writer.add_scalar('Train/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Train/mean_acc', mean_acc,
global_step)
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/step/sec', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
cm.zero_matrix()
timer.restart()
else:
                    # If not in debug mode, avoid unnecessary logging and calculation
loss, lr = exe.run(
program=compiled_train_prog,
fetch_list=fetch_list,
return_numpy=True)
avg_loss += np.mean(np.array(loss))
global_step += 1
if global_step % args.log_steps == 0:
avg_loss /= args.log_steps
speed = args.log_steps / timer.elapsed_time()
print((
"epoch={} step={} lr={:.5f} loss={:.4f} step/sec={:.3f} | ETA {}"
).format(epoch, global_step, lr[0], avg_loss, speed,
calculate_eta(all_step - global_step, speed)))
if args.use_tb:
log_writer.add_scalar('Train/loss', avg_loss,
global_step)
log_writer.add_scalar('Train/lr', lr[0],
global_step)
log_writer.add_scalar('Train/speed', speed,
global_step)
sys.stdout.flush()
avg_loss = 0.0
timer.restart()
except fluid.core.EOFException:
py_reader.reset()
break
except Exception as e:
print(e)
if epoch % cfg.TRAIN.SNAPSHOT_EPOCH == 0:
ckpt_dir = save_checkpoint(exe, train_prog, epoch)
if args.do_eval:
print("Evaluation start")
_, mean_iou, _, mean_acc = evaluate(
cfg=cfg,
ckpt_dir=ckpt_dir,
use_gpu=args.use_gpu,
use_mpio=args.use_mpio)
if args.use_tb:
log_writer.add_scalar('Evaluate/mean_iou', mean_iou,
global_step)
log_writer.add_scalar('Evaluate/mean_acc', mean_acc,
global_step)
# Use Tensorboard to visualize results
if args.use_tb and cfg.DATASET.VIS_FILE_LIST is not None:
visualize(
cfg=cfg,
use_gpu=args.use_gpu,
vis_file_list=cfg.DATASET.VIS_FILE_LIST,
vis_dir="visual",
ckpt_dir=ckpt_dir,
log_writer=log_writer)
# save final model
save_checkpoint(exe, train_prog, 'final')
def main(args):
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer(reset_dataset=True)
print(pprint.pformat(cfg))
train(cfg)
if __name__ == '__main__':
args = parse_args()
    if args.use_gpu and not fluid.core.is_compiled_with_cuda():
print(
"You can not set use_gpu = True in the model because you are using paddlepaddle-cpu."
)
print(
"Please: 1. Install paddlepaddle-gpu to run your models on GPU or 2. Set use_gpu=False to run models on CPU."
)
sys.exit(1)
main(args)
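# Example invocation (illustrative; the config path is a placeholder):
#   python train.py --cfg configs/deeplabv3p_xception65_cityscapes.yaml \
#       --use_gpu --use_mpio --do_eval \
#       SOLVER.LR 0.001 BATCH_SIZE 4
# Trailing KEY VALUE pairs are consumed by cfg.update_from_list through the `opts`
# argument defined in parse_args above.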
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import codecs
from ast import literal_eval
import yaml
import six
class SegConfig(dict):
def __init__(self, *args, **kwargs):
super(SegConfig, self).__init__(*args, **kwargs)
self.immutable = False
def __setattr__(self, key, value, create_if_not_exist=True):
if key in ["immutable"]:
self.__dict__[key] = value
return
t = self
keylist = key.split(".")
for k in keylist[:-1]:
t = t.__getattr__(k, create_if_not_exist)
t.__getattr__(keylist[-1], create_if_not_exist)
t[keylist[-1]] = value
def __getattr__(self, key, create_if_not_exist=True):
if key in ["immutable"]:
return self.__dict__[key]
if not key in self:
if not create_if_not_exist:
raise KeyError
self[key] = SegConfig()
return self[key]
def __setitem__(self, key, value):
#
if self.immutable:
raise AttributeError(
'Attempted to set "{}" to "{}", but SegConfig is immutable'.
format(key, value))
#
if isinstance(value, six.string_types):
try:
value = literal_eval(value)
except ValueError:
pass
except SyntaxError:
pass
super(SegConfig, self).__setitem__(key, value)
def update_from_segconfig(self, other):
if isinstance(other, dict):
other = SegConfig(other)
assert isinstance(other, SegConfig)
diclist = [("", other)]
while len(diclist):
prefix, tdic = diclist[0]
diclist = diclist[1:]
for key, value in tdic.items():
key = "{}.{}".format(prefix, key) if prefix else key
if isinstance(value, dict):
diclist.append((key, value))
continue
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def check_and_infer(self, reset_dataset=False):
if self.DATASET.IMAGE_TYPE in ['rgb', 'gray']:
self.DATASET.DATA_DIM = 3
elif self.DATASET.IMAGE_TYPE in ['rgba']:
self.DATASET.DATA_DIM = 4
else:
raise KeyError(
'DATASET.IMAGE_TYPE config error, only support `rgb`, `gray` and `rgba`'
)
if reset_dataset:
            # Ensure the file lists use UTF-8 encoding
train_sets = codecs.open(self.DATASET.TRAIN_FILE_LIST, 'r',
'utf-8').readlines()
val_sets = codecs.open(self.DATASET.VAL_FILE_LIST, 'r',
'utf-8').readlines()
test_sets = codecs.open(self.DATASET.TEST_FILE_LIST, 'r',
'utf-8').readlines()
self.DATASET.TRAIN_TOTAL_IMAGES = len(train_sets)
self.DATASET.VAL_TOTAL_IMAGES = len(val_sets)
self.DATASET.TEST_TOTAL_IMAGES = len(test_sets)
if self.MODEL.MODEL_NAME == 'icnet' and \
len(self.MODEL.MULTI_LOSS_WEIGHT) != 3:
self.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4, 0.16]
def update_from_list(self, config_list):
if len(config_list) % 2 != 0:
raise ValueError(
"Command line options config format error! Please check it: {}".
format(config_list))
for key, value in zip(config_list[0::2], config_list[1::2]):
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def update_from_file(self, config_file):
with codecs.open(config_file, 'r', 'utf-8') as file:
dic = yaml.load(file)
self.update_from_segconfig(dic)
def set_immutable(self, immutable):
self.immutable = immutable
for value in self.values():
if isinstance(value, SegConfig):
value.set_immutable(immutable)
def is_immutable(self):
return self.immutable
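# Illustrative usage sketch (not part of the original file): nested sections are
# created on attribute access and can be overridden from dotted-key pairs, which is
# how command-line `opts` reach the configuration.
def _seg_config_example():
    conf = SegConfig()
    conf.SOLVER.LR = 0.1          # nested section created on first attribute access
    conf.BATCH_SIZE = 1
    conf.update_from_list(['SOLVER.LR', '0.01', 'BATCH_SIZE', '8'])
    assert conf.SOLVER.LR == 0.01 and conf.BATCH_SIZE == 8  # strings literal_eval'ed
    conf.set_immutable(True)      # further writes now raise AttributeError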
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import unicode_literals
from utils.collect import SegConfig
import numpy as np
cfg = SegConfig()
########################## Basic configuration ################################
# Mean values subtracted from the image during preprocessing
cfg.MEAN = [104.008, 116.669, 122.675]
# Standard deviation the image is divided by during preprocessing
cfg.STD = [1.000, 1.000, 1.000]
# Batch size
cfg.BATCH_SIZE = 1
# Image crop size (width, height) during evaluation
cfg.EVAL_CROP_SIZE = tuple()
# Image crop size (width, height) during training
cfg.TRAIN_CROP_SIZE = tuple()
########################## Data loader configuration ##########################
# Number of concurrent workers for data loading, recommended value 8
cfg.DATALOADER.NUM_WORKERS = 8
# Buffer queue size for data loading, recommended value 256
cfg.DATALOADER.BUF_SIZE = 256
########################## Dataset configuration ##############################
# Dataset root directory
cfg.DATASET.DATA_DIR = './dataset/cityscapes/'
# Training set file list
cfg.DATASET.TRAIN_FILE_LIST = './dataset/cityscapes/train.list'
# Number of training images
cfg.DATASET.TRAIN_TOTAL_IMAGES = 2975
# Validation set file list
cfg.DATASET.VAL_FILE_LIST = './dataset/cityscapes/val.list'
# Number of validation images
cfg.DATASET.VAL_TOTAL_IMAGES = 500
# Test set file list
cfg.DATASET.TEST_FILE_LIST = './dataset/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 500
# File list of images visualized with Tensorboard
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (the background class included)
cfg.DATASET.NUM_CLASSES = 19
# Input image type: 3-channel 'rgb', 4-channel 'rgba' or single-channel 'gray'
cfg.DATASET.IMAGE_TYPE = 'rgb'
# Number of input image channels
cfg.DATASET.DATA_DIM = 3
# Separator used in the file lists, a space by default
cfg.DATASET.SEPARATOR = ' '
# Label value to be ignored, 255 by default; usually no need to change
cfg.DATASET.IGNORE_INDEX = 255
########################### Data augmentation configuration ###################
# Horizontal (left-right) mirror flip
cfg.AUG.MIRROR = True
# Fixed resize size (width, height) for unpadding, non-negative
cfg.AUG.FIX_RESIZE_SIZE = tuple()
# Three resize modes are supported:
# unpadding (fixed size), stepscaling (scale by a random factor), rangescaling (resize the long edge)
cfg.AUG.AUG_METHOD = 'rangescaling'
# Minimum scale factor for stepscaling, non-negative
cfg.AUG.MIN_SCALE_FACTOR = 0.5
# Maximum scale factor for stepscaling, not smaller than MIN_SCALE_FACTOR
cfg.AUG.MAX_SCALE_FACTOR = 2.0
# Scale step size for stepscaling, non-negative
cfg.AUG.SCALE_STEP_SIZE = 0.25
# Minimum long-edge size during training for rangescaling, non-negative
cfg.AUG.MIN_RESIZE_VALUE = 400
# Maximum long-edge size during training for rangescaling,
# not smaller than MIN_RESIZE_VALUE
cfg.AUG.MAX_RESIZE_VALUE = 600
# Long-edge size used by rangescaling in eval/visualization mode,
# within [MIN_RESIZE_VALUE, MAX_RESIZE_VALUE]
cfg.AUG.INF_RESIZE_VALUE = 500
# RichCrop augmentation switch, used to improve model robustness
cfg.AUG.RICH_CROP.ENABLE = False
# Maximum rotation angle, 0-90
cfg.AUG.RICH_CROP.MAX_ROTATION = 15
# Minimum area ratio of the crop to the original image, 0-1
cfg.AUG.RICH_CROP.MIN_AREA_RATIO = 0.5
# Aspect ratio range of the crop, non-negative
cfg.AUG.RICH_CROP.ASPECT_RATIO = 0.33
# Brightness jitter range, 0-1
cfg.AUG.RICH_CROP.BRIGHTNESS_JITTER_RATIO = 0.5
# Saturation jitter range, 0-1
cfg.AUG.RICH_CROP.SATURATION_JITTER_RATIO = 0.5
# Contrast jitter range, 0-1
cfg.AUG.RICH_CROP.CONTRAST_JITTER_RATIO = 0.5
# Gaussian blur switch, True/False
cfg.AUG.RICH_CROP.BLUR = False
# Ratio of images to which blur is applied, 0-1
cfg.AUG.RICH_CROP.BLUR_RATIO = 0.1
# Vertical (up-down) flip switch, True/False
cfg.AUG.RICH_CROP.FLIP = False
# Ratio of images to which the vertical flip is applied, 0-1
cfg.AUG.RICH_CROP.FLIP_RATIO = 0.2
########################### Training configuration ############################
# Directory for saving model checkpoints
cfg.TRAIN.MODEL_SAVE_DIR = ''
# Path to the pretrained model
cfg.TRAIN.PRETRAINED_MODEL = ''
# Whether to resume training from a checkpoint
cfg.TRAIN.RESUME = False
# Whether to synchronize BatchNorm mean and variance across GPUs
cfg.TRAIN.SYNC_BATCH_NORM = False
# Epoch interval for saving checkpoints, which can also be used to resume interrupted training
cfg.TRAIN.SNAPSHOT_EPOCH = 10
########################### Optimizer configuration ###########################
# Initial learning rate
cfg.SOLVER.LR = 0.1
# Learning rate decay policy, supports poly, piecewise and cosine
cfg.SOLVER.LR_POLICY = "poly"
# Optimization algorithm, supports SGD and Adam
cfg.SOLVER.OPTIMIZER = "sgd"
# Momentum
cfg.SOLVER.MOMENTUM = 0.9
# Exponential decay rate of the second moment estimate (Adam)
cfg.SOLVER.MOMENTUM2 = 0.999
# Power of the poly learning rate decay
cfg.SOLVER.POWER = 0.9
# Decay factor of the piecewise (step) decay
cfg.SOLVER.GAMMA = 0.1
# Epochs at which the piecewise decay is applied
cfg.SOLVER.DECAY_EPOCH = [10, 20]
# Weight decay, 0-1
cfg.SOLVER.WEIGHT_DECAY = 0.00004
# Starting epoch, 1 by default
cfg.SOLVER.BEGIN_EPOCH = 1
# Number of training epochs, a positive integer
cfg.SOLVER.NUM_EPOCHS = 30
########################## Test configuration #################################
# Path of the model used for testing
cfg.TEST.TEST_MODEL = ''
########################## Common model configuration #########################
# Model name, supports deeplab, unet and icnet
cfg.MODEL.MODEL_NAME = ''
# Normalization type: bn or gn (group_norm)
cfg.MODEL.DEFAULT_NORM_TYPE = 'bn'
# Weights of the multi-branch losses
cfg.MODEL.MULTI_LOSS_WEIGHT = [1.0]
# Number of groups when DEFAULT_NORM_TYPE is gn
cfg.MODEL.DEFAULT_GROUP_NUMBER = 32
# Small epsilon that prevents division by zero; usually no need to change
cfg.MODEL.DEFAULT_EPSILON = 1e-5
# BatchNorm momentum; usually no need to change
cfg.MODEL.BN_MOMENTUM = 0.99
# Whether to train with FP16
cfg.MODEL.FP16 = False
# Loss scaling used by FP16; 8.0 is a typical setting for FP16 training
cfg.MODEL.SCALE_LOSS = 1.0
########################## DeepLab model configuration ########################
# DeepLab backbone, one of xception_65, mobilenetv2
cfg.MODEL.DEEPLAB.BACKBONE = "xception_65"
# DeepLab output stride
cfg.MODEL.DEEPLAB.OUTPUT_STRIDE = 16
# MobileNet backbone scale (depth multiplier)
cfg.MODEL.DEEPLAB.DEPTH_MULTIPLIER = 1.0
# Whether the encoder uses the ASPP module
cfg.MODEL.DEEPLAB.ENCODER_WITH_ASPP = True
# Whether the decoder is enabled
cfg.MODEL.DEEPLAB.ENABLE_DECODER = True
# Whether ASPP uses separable convolutions
cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV = True
# Whether the decoder uses separable convolutions
cfg.MODEL.DEEPLAB.DECODER_USE_SEP_CONV = True
########################## UNET model configuration ###########################
# Upsampling mode, bilinear interpolation by default
cfg.MODEL.UNET.UPSAMPLE_MODE = 'bilinear'
########################## ICNET model configuration ##########################
# ResNet backbone scale (depth multiplier)
cfg.MODEL.ICNET.DEPTH_MULTIPLIER = 0.5
# Number of ResNet layers
cfg.MODEL.ICNET.LAYERS = 50
########################## Inference deployment configuration #################
# Filename of the exported inference model
cfg.FREEZE.MODEL_FILENAME = '__model__'
# Filename of the exported inference parameters
cfg.FREEZE.PARAMS_FILENAME = '__params__'
# Directory where the exported inference model is saved
cfg.FREEZE.SAVE_DIR = 'freeze_model'
import os
from paddle import fluid
def load_fp16_vars(executor, dirname, program):
load_dirname = os.path.normpath(dirname)
def _if_exist(var):
name = var.name[:-7] if var.name.endswith('.master') else var.name
b = os.path.exists(os.path.join(load_dirname, name))
if not b and isinstance(var, fluid.framework.Parameter):
print("===== {} not found ====".format(var.name))
return b
load_prog = fluid.Program()
load_block = load_prog.global_block()
vars = list(filter(_if_exist, program.list_vars()))
for var in vars:
new_var = fluid.io._clone_var_in_block_(load_block, var)
name = var.name[:-7] if var.name.endswith('.master') else var.name
file_path = os.path.join(load_dirname, name)
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [new_var]},
attrs={
'file_path': file_path,
'load_as_fp16': var.dtype == fluid.core.VarDesc.VarType.FP16
})
executor.run(load_prog)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
def calculate_eta(remaining_step, speed):
if remaining_step < 0:
remaining_step = 0
remaining_time = int(remaining_step / speed)
result = "{:0>2}:{:0>2}:{:0>2}"
arr = []
for i in range(2, -1, -1):
arr.append(int(remaining_time / 60**i))
remaining_time %= 60**i
return result.format(*arr)
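# Worked example (illustrative, not part of the original file):
#   calculate_eta(3600, 2.0) -> "00:30:00"   (1800 seconds remaining)
#   calculate_eta(90, 1.0)   -> "00:01:30"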
class Timer(object):
""" Simple timer class for measuring time consuming """
def __init__(self):
self._start_time = 0.0
self._end_time = 0.0
self._elapsed_time = 0.0
self._is_running = False
def start(self):
self._is_running = True
self._start_time = time.time()
def restart(self):
self.start()
def stop(self):
self._is_running = False
self._end_time = time.time()
    def elapsed_time(self):
        # 计时器未运行时返回0.0; 运行中返回距start()的耗时(秒)
        if not self.is_running:
            return 0.0
        self._end_time = time.time()
        self._elapsed_time = self._end_time - self._start_time
        return self._elapsed_time
@property
def is_running(self):
return self._is_running
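下面是Timer与calculate_eta的一个使用示意(仅为示意,step数与耗时均为假设值):

```python
# 仅为示意: 统计训练step的平均速度, 并估算剩余时间
timer = Timer()
timer.start()
total_step = 100
for step in range(1, total_step + 1):
    # 假设此处执行一个训练step
    speed = step / max(timer.elapsed_time(), 1e-6)  # 平均速度: steps/秒
    eta = calculate_eta(total_step - step, speed)   # 形如 "00:00:05"
    if step % 20 == 0:
        print("step={} speed={:.2f} steps/s ETA={}".format(step, speed, eta))
timer.stop()
```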
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
from PIL import Image as PILImage
from utils.config import cfg
from metrics import ConfusionMatrix
from reader import SegDataset
from models.model_builder import build_model
from models.model_builder import ModelPhase
def parse_args():
    parser = argparse.ArgumentParser(description='PaddleSeg visualization tools')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu', dest='use_gpu', help='Use gpu or cpu', action='store_true')
parser.add_argument(
'--vis_dir',
dest='vis_dir',
help='visual save dir',
type=str,
default='visual')
parser.add_argument(
'--also_save_raw_results',
dest='also_save_raw_results',
help='whether to save raw result',
action='store_true')
parser.add_argument(
'--local_test',
dest='local_test',
help='if in local test mode, only visualize 5 images for testing',
action='store_true')
parser.add_argument(
'opts',
help='See config.py for all options',
default=None,
nargs=argparse.REMAINDER)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def makedirs(directory):
if not os.path.exists(directory):
os.makedirs(directory)
def get_color_map(num_classes):
""" Returns the color map for visualizing the segmentation mask,
which can support arbitrary number of classes.
Args:
num_classes: Number of classes
Returns:
The color map
"""
#color_map = num_classes * 3 * [0]
color_map = num_classes * [[0, 0, 0]]
for i in range(0, num_classes):
j = 0
color_map[i] = [0, 0, 0]
lab = i
while lab:
color_map[i][0] |= (((lab >> 0) & 1) << (7 - j))
color_map[i][1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i][2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
return color_map
def colorize(image, shape, color_map):
"""
Convert segment result to color image.
"""
color_map = np.array(color_map).astype("uint8")
# Use OpenCV LUT for color mapping
c1 = cv2.LUT(image, color_map[:, 0])
c2 = cv2.LUT(image, color_map[:, 1])
c3 = cv2.LUT(image, color_map[:, 2])
color_res = np.dstack((c1, c2, c3))
return color_res
def to_png_fn(fn):
"""
Append png as filename postfix
"""
directory, filename = os.path.split(fn)
basename, ext = os.path.splitext(filename)
return basename + ".png"
def visualize(cfg,
vis_file_list=None,
use_gpu=False,
vis_dir="visual",
also_save_raw_results=False,
ckpt_dir=None,
log_writer=None,
local_test=False,
**kwargs):
if vis_file_list is None:
vis_file_list = cfg.DATASET.TEST_FILE_LIST
dataset = SegDataset(
file_list=vis_file_list,
mode=ModelPhase.VISUAL,
data_dir=cfg.DATASET.DATA_DIR)
startup_prog = fluid.Program()
test_prog = fluid.Program()
pred, logit = build_model(test_prog, startup_prog, phase=ModelPhase.VISUAL)
# Clone forward graph
test_prog = test_prog.clone(for_test=True)
    # Generate full colormap for maximum 256 classes
color_map = get_color_map(256)
# Get device environment
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog)
save_dir = os.path.join(vis_dir, 'visual_results')
makedirs(save_dir)
if also_save_raw_results:
raw_save_dir = os.path.join(vis_dir, 'raw_results')
makedirs(raw_save_dir)
fetch_list = [pred.name]
test_reader = dataset.batch(dataset.generator, batch_size=1, is_test=True)
img_cnt = 0
for imgs, img_names, valid_shapes, org_shapes in test_reader:
pred_shape = (imgs.shape[2], imgs.shape[3])
pred, = exe.run(
program=test_prog,
feed={'image': imgs},
fetch_list=fetch_list,
return_numpy=True)
num_imgs = pred.shape[0]
# TODO: use multi-thread to write images
for i in range(num_imgs):
# Add more comments
res_map = np.squeeze(pred[i, :, :, :]).astype(np.uint8)
img_name = img_names[i]
res_shape = (res_map.shape[0], res_map.shape[1])
if res_shape[0] != pred_shape[0] or res_shape[1] != pred_shape[1]:
                res_map = cv2.resize(
                    res_map, (pred_shape[1], pred_shape[0]),
                    interpolation=cv2.INTER_NEAREST)
valid_shape = (valid_shapes[i, 0], valid_shapes[i, 1])
res_map = res_map[0:valid_shape[0], 0:valid_shape[1]]
org_shape = (org_shapes[i, 0], org_shapes[i, 1])
res_map = cv2.resize(
res_map, (org_shape[1], org_shape[0]),
interpolation=cv2.INTER_NEAREST)
png_fn = to_png_fn(img_names[i])
if also_save_raw_results:
raw_fn = os.path.join(raw_save_dir, png_fn)
dirname = os.path.dirname(raw_save_dir)
makedirs(dirname)
cv2.imwrite(raw_fn, res_map)
# colorful segment result visualization
vis_fn = os.path.join(save_dir, png_fn)
dirname = os.path.dirname(vis_fn)
makedirs(dirname)
pred_mask = colorize(res_map, org_shapes[i], color_map)
cv2.imwrite(vis_fn, pred_mask)
img_cnt += 1
print("#{} visualize image path: {}".format(img_cnt, vis_fn))
# Use Tensorboard to visualize image
if log_writer is not None:
                # Calculate epoch from ckpt_dir folder name
epoch = int(ckpt_dir.split(os.path.sep)[-1])
print("Tensorboard visualization epoch", epoch)
log_writer.add_image(
"Predict/{}".format(img_names[i]),
pred_mask[..., ::-1],
epoch,
dataformats='HWC')
# Original image
# BGR->RGB
img = cv2.imread(
os.path.join(cfg.DATASET.DATA_DIR, img_names[i]))[..., ::-1]
log_writer.add_image(
"Images/{}".format(img_names[i]),
img,
epoch,
dataformats='HWC')
#TODO: add ground truth (label) images
# If in local_test mode, only visualize 5 images just for testing
# procedure
if local_test and img_cnt >= 5:
break
if __name__ == '__main__':
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts is not None:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
visualize(cfg, **args.__dict__)
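除了作为脚本运行,visualize也可以在Python代码中直接调用。下面是一个最小调用示意(仅为示意,配置文件与模型目录为示例路径):

```python
# 仅为示意: 在Python中直接调用visualize进行可视化预测
from utils.config import cfg

cfg.update_from_file("configs/unet_pet.yaml")        # 配置文件路径仅为示例
cfg.check_and_infer()
visualize(
    cfg,
    use_gpu=False,
    vis_dir="visual",                                 # 可视化结果输出目录
    ckpt_dir="./test/saved_model/unet_pet/final/",    # 模型目录为示例路径
    local_test=True)                                  # 仅可视化5张图片便于验证
```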
# 源码编译安装及搭建服务流程
本文将介绍源码编译安装以及服务搭建流程。
## 1. 系统依赖项
依赖项 | 验证过的版本
-- | --
Linux | Centos 6.10 / 7
CMake | 3.0+
GCC | 4.8.2/5.4.0
Python| 2.7
GO编译器| 1.9.2
openssl| 1.0.1+
bzip2 | 1.0.6+
如果需要使用GPU预测,还需安装以下几个依赖库
GPU库 | 验证过的版本
-- | --
CUDA | 9.2
cuDNN | 7.1.4
nccl | 2.4.7
## 2. 安装依赖项
以下流程在百度云CentOS7.5+CUDA9.2环境下进行。
### 2.1. 安装openssl、Go编译器以及bzip2
```bash
yum -y install openssl openssl-devel golang bzip2-libs bzip2-devel
```
### 2.2. 安装GPU预测的依赖项(如果需要使用GPU预测,必须执行此步骤)
#### 2.2.1. 安装配置CUDA9.2以及cuDNN 7.1.4
该百度云机器已经安装CUDA以及cuDNN,仅需复制相关头文件与链接库
```bash
# 看情况确定是否需要安装 cudnn
# 进入 cudnn 根目录
cd /home/work/cudnn/cudnn7.1.4
# 拷贝头文件
cp include/cudnn.h /usr/local/cuda/include/
# 拷贝链接库
cp lib64/libcudnn* /usr/local/cuda/lib64/
# 修改头文件、链接库访问权限
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
#### 2.2.2. 安装nccl库
```bash
# 下载文件 nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
wget -c https://paddlehub.bj.bcebos.com/serving/nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
# 安装nccl的repo
rpm -i nccl-repo-rhel7-2.4.7-ga-cuda9.2-1-1.x86_64.rpm
# 更新索引
yum -y update
# 安装包
yum -y install libnccl-2.4.7-1+cuda9.2 libnccl-devel-2.4.7-1+cuda9.2 libnccl-static-2.4.7-1+cuda9.2
```
### 2.3. 安装 cmake 3.15
如果机器没有安装cmake或者已安装cmake的版本低于3.0,请执行以下步骤
```bash
# 如果原来的已经安装低于3.0版本的cmake,请先卸载原有低版本 cmake
yum -y remove cmake
# 下载源代码并解压
wget -c https://github.com/Kitware/CMake/releases/download/v3.15.0/cmake-3.15.0.tar.gz
tar xvfz cmake-3.15.0.tar.gz
# 编译cmake
cd cmake-3.15.0
./configure
make -j4
# 安装并检查cmake版本
make install
cmake --version
# 在cmake-3.15.0目录中,将相应的头文件目录(curl目录,为PaddleServing的依赖头文件目录)拷贝到系统include目录下
cp -r Utilities/cmcurl/include/curl/ /usr/include/
```
### 2.4. 为依赖库增加相应的软连接
现在Linux系统中大部分链接库的名称都以版本号作为后缀,如libcurl.so.4.3.0。这种命名方式最大的问题是,CMakeLists.txt中的find_library命令无法识别这种命名方式的链接库,会导致CMake出错。由于本项目使用CMake构建,务必保证相应的链接库以 .so 或 .a 为后缀命名。解决这个问题最简单的方式是创建一个软连接指向相应的链接库。在百度云的机器中,只有curl库的命名方式有问题,对应命令如下(如果是其他库,解决方法也类似):
```bash
ln -s /usr/lib64/libcurl.so.4.3.0 /usr/lib64/libcurl.so
```
### 2.5. 编译安装PaddleServing
下列步骤介绍CPU版本以及GPU版本的PaddleServing编译安装过程。
```bash
# Step 1. 在~目录下下载paddle-serving代码
cd ~
git clone https://github.com/PaddlePaddle/serving.git
# Step 2. 进入serving目录,创建build目录编译、安装
cd serving
mkdir build
cd build
# Step 3. 以下为生成GPU版本的makefile,生成CPU版本的makefile执行 cmake -DWITH_GPU=OFF ..
cmake -DWITH_GPU=ON -DCUDNN_ROOT=/usr/local/cuda/lib64 ..
# Step 4. nproc 可以输出当前机器的核心数,利用多核进行编译。如果make时候报错退出,可以多执行几次make解决
make -j$(nproc)
# Step 5. 安装
make install
# Step 6. 安装后可以看到PaddleServing的目录结构如下
serving
├── build
├── cmake
├── CMakeLists.txt
├── configure
├── CONTRIBUTING.md
├── cube
├── demo-client
├── demo-serving
│ ├── CMakeLists.txt
│ ├── conf # demo-serving 的配置文件目录
│ ├── data # 模型文件以及参数文件的目录
│ ├── op # 数据处理的源文件目录
│ ├── proto # 数据传输的proto文件目录
│ └── scripts
├── doc
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── kvdb
├── LICENSE
├── pdcodegen
├── predictor
├── README.md
├── sdk-cpp
└── tools
```
### 2.6. 安装PaddleSegServing
```bash
# Step 1. 在~目录下下载PaddleSeg代码
git clone http://gitlab.baidu.com/Paddle/PaddleSeg.git
# Step 2. 进入PaddleSeg的serving目录(注意区分PaddleServing的serving目录),并将seg-serving目录复制到PaddleServing的serving目录下
cd PaddleSeg/serving
cp -r seg-serving ~/serving
# 复制后PaddleServing的目录结构如下
serving
├── build
├── cmake
├── CMakeLists.txt
├── configure
├── CONTRIBUTING.md
├── cube
├── demo-client
├── demo-serving
├── doc
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── kvdb
├── LICENSE
├── pdcodegen
├── predictor
├── README.md
├── sdk-cpp
├── seg-serving # 此为新增的目录
└── tools
# Step 3. 修改PaddleServing的serving目录下的CMakeLists.txt
cd ~/serving
vim CMakeLists.txt
# Step 4. 倒数第二行加入代码,使得seg-serving下的代码可与PaddleServing一起编译
add_subdirectory(seg-serving)
# Step 5. 进入PaddleServing的build目录,编译安装PaddleSegServing
cd ~/serving/build
make -j$(nproc)
make install
# Step 6. 完成安装后,可以看到执行文件的目录结构如下
build
├── boost_dummy.c
├── CMakeCache.txt
├── CMakeFiles
├── cmake_install.cmake
├── configure
├── demo-client
├── error
├── human-seg-serving
├── inferencer-fluid-cpu
├── inferencer-fluid-gpu
├── info
├── install_manifest.txt
├── kvdb
├── libboost.a
├── log
├── Makefile
├── output # 所有服务端的执行文件、配置文件、数据文件均安装到此目录下
│ ├── bin
│ ├── demo
│ │ ├── client
│ │ ├── db_func
│ │ ├── db_thread
│ │ ├── seg-serving
│ │ │ └── bin
│ │ │ ├── conf # 配置文件目录
│ │ │ ├── data # 数据模型文件、参数文件目录
│ │ │ ├── seg-serving #可执行文件
│ │ │ ├── kvdb
│ │ │ ├── libiomp5.so
│ │ │ ├── libmklml_gnu.so
│ │ │ ├── libmklml_intel.so
│ │ │ └── log
│ │ ├── kvdb_test
│ │ └── serving
│ ├── include
│ └── lib
├── Paddle
├── pdcodegen
├── predictor
├── sdk-cpp
├── seg-serving
└── third_party
```
## 3. 运行PaddleSegServing
### 3.1. 搭建人脸分割服务
搭建人脸分割服务只需完成一些配置文件的编写即可。与预编译版本的搭建大致相同,但模型文件、参数文件放置的目录略有不同。
#### 3.1.1. 下载人脸分割模型文件,并将其复制到PaddleSegServing相应目录。
可参考[预编译安装流程](./README.md)中2.2.1.1节。模型文件放置的目录在
~/serving/seg-serving/data/model/paddle/fluid/。
#### 3.1.2. 配置参数文件。
可参考[预编译安装流程](./README.md)中2.2.1.2节。配置文件的目录在~/serving/seg-serving/conf。
### 3.2 安装模型文件、配置文件。
```bash
cd ~/serving/build
make install
```
### 3.3 运行服务端程序
可参考[预编译安装流程](./README.md)中2.2.2节。可执行文件在该目录下:~/serving/build/output/demo/seg-serving/bin/。
### 3.4 运行客户端程序进行测试。
可参考[预编译安装流程](./README.md)中2.2.3节。
# PaddleSegServing
## 1.简介
PaddleSegServing是基于PaddleSeg开发的实时图像分割服务的企业级解决方案。用户仅需关注模型本身,无需理解模型的加载、预测以及GPU/CPU资源的并发调度等细节操作,通过设置不同的参数配置,即可根据自身的业务需求定制不同的图像分割服务。目前,PaddleSegServing支持人脸分割、城市道路分割、宠物外形分割模型。本文将通过一个人脸分割服务的搭建示例,展示PaddleSeg服务通用的搭建流程。
## 2.预编译版本安装及搭建服务流程
### 2.1. 下载预编译的PaddleSegServing
预编译版本在CentOS 7.6系统下编译。如果想快速体验PaddleSegServing,可在此系统下下载预编译版本进行安装。预编译版本有两个:一个是GPU版本,适用于有GPU的机器(推荐安装);另一个是CPU版本,适用于无GPU的机器。
#### 2.1.1. 下载并解压GPU版本PaddleSegServing
```bash
cd ~
wget -c XXXX/PaddleSegServing.centos7.6_cuda9.2_gpu.tar.gz
tar xvfz PaddleSegServing.centos7.6_cuda9.2_gpu.tar.gz
```
#### 2.1.2. 下载并解压CPU版本PaddleSegServing
```bash
cd ~
wget -c XXXX/PaddleSegServing.centos7.6_cuda9.2_cpu.tar.gz
tar xvfz PaddleSegServing.centos7.6_cuda9.2_cpu.tar.gz
```
解压后的PaddleSegServing目录如下。
```bash
├── seg-serving
└── bin
├── conf # 配置文件目录
├── data # 数据模型文件、参数文件目录
├── seg-serving #可执行文件
├── kvdb
├── libiomp5.so
├── libmklml_gnu.so
├── libmklml_intel.so
└── log
```
### 2.2. 运行PaddleSegServing
本节将介绍如何运行以及测试PaddleSegServing。
#### 2.2.1. 搭建人脸分割服务
搭建人脸分割服务只需完成一些配置文件的编写即可,其他分割服务的搭建流程类似。
##### 2.2.1.1. 下载人脸分割模型文件,并将其复制到相应目录。
```bash
# 下载人脸分割模型
wget -c https://paddleseg.bj.bcebos.com/inference_model/deeplabv3p_xception65_humanseg.tgz
tar xvfz deeplabv3p_xception65_humanseg.tgz
# 安装模型
cp -r deeplabv3p_xception65_humanseg seg-serving/bin/data/model/paddle/fluid
```
##### 2.2.1.2. 配置参数文件
参数文件如下所示。PaddleSegServing仅新增一个配置文件seg_conf.yaml,用来指定具体分割模型的一些参数,如均值、方差、图像尺寸等。该配置文件可在gflags.conf中通过--seg_conf_file指定。
其他配置文件的字段解释可参考以下链接:https://github.com/PaddlePaddle/Serving/blob/develop/doc/SERVING_CONFIGURE.md (TODO:介绍seg_conf.yaml中每个字段的含义)
```bash
conf/
├── gflags.conf
├── model_toolkit.prototxt
├── resource.prototxt
├── seg_conf.yaml
├── service.prototxt
└── workflow.prototxt
```
#### 2.2.2 运行服务端程序
```bash
# 1. 设置环境变量
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib64:$LD_LIBRARY_PATH
# 2. 切换到bin目录,运行服务端程序
cd ~/serving/build/output/demo/seg-serving/bin/
./seg-serving
```
#### 2.2.3. 运行客户端程序进行测试(建议在Windows、Mac上测试,可直接查看分割后的图像)
客户端程序是用Python3编写的,代码简洁易懂,可以通过运行客户端验证服务的正确性以及性能表现。
```bash
# 使用Python3.6,需要安装opencv-python、requests、numpy包(建议安装anaconda)
cd tools
vim image_seg_client.py (修改IMAGE_SEG_URL变量,改成服务端的ip地址)
python3.6 image_seg_client.py
# 当前目录下可以看到生成出分割结果的图片。
```
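如果不使用自带的客户端脚本,也可以按以下方式直接构造HTTP请求(仅为示意,服务地址与图片路径为假设值,请求与返回格式与image_seg_client.py一致):

```python
# 仅为示意: 向ImageSegService发送一张图片并保存返回的mask
import base64
import json

import requests

IMAGE_SEG_URL = 'http://127.0.0.1:8010/ImageSegService/inference'  # 服务地址为假设值

with open('test.jpg', 'rb') as fp:                                  # 图片路径为假设值
    img_b64 = str(base64.b64encode(fp.read()), 'utf-8')

data = {"instances": [{"image_length": len(img_b64), "image_binary": img_b64}]}
resp = json.loads(requests.post(IMAGE_SEG_URL, data=json.dumps(data)).text)

# 返回的mask为base64编码的png图片
mask_b64 = json.loads(resp["prediction"][0]["info"])["mask"]
with open('mask.png', 'wb') as fp:
    fp.write(base64.b64decode(mask_b64))
```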
## 3. 源码编译安装及搭建服务流程 (可选)
源码编译安装时间较长,一般推荐在centos7.6下安装预编译版本进行使用。如果您系统版本非centos7.6或者您想进行二次开发,请点击以下链接查看[源码编译安装流程](./COMPILE_GUIDE.md)
opencv-python
requests
numpy
# TODO:下载data数据
#if (NOT EXISTS
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm)
# execute_process(COMMAND wget
# --no-check-certificate https://paddle-serving.bj.bcebos.com/data/text_classification/text_classification_lstm.tar.gz
# --output-document
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm.tar.gz)
# execute_process(COMMAND ${CMAKE_COMMAND} -E tar xzf
# "${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid/text_classification_lstm.tar.gz"
# WORKING_DIRECTORY
# ${CMAKE_CURRENT_LIST_DIR}/data/model/paddle/fluid
# )
# endif()
include_directories(SYSTEM ${CMAKE_CURRENT_LIST_DIR}/../kvdb/include)
find_library(MKLML_LIBS NAMES libmklml_intel.so libiomp5.so)
include(op/CMakeLists.txt)
include(proto/CMakeLists.txt)
add_executable(seg-serving ${serving_srcs})
add_dependencies(seg-serving pdcodegen fluid_cpu_engine pdserving paddle_fluid
opencv_imgcodecs)
if (WITH_GPU)
add_dependencies(seg-serving fluid_gpu_engine)
endif()
target_include_directories(seg-serving PUBLIC
${CMAKE_CURRENT_BINARY_DIR}/../predictor
)
if(WITH_GPU)
target_link_libraries(seg-serving -Wl,--whole-archive fluid_gpu_engine
-Wl,--no-whole-archive)
endif()
target_link_libraries(seg-serving -Wl,--whole-archive fluid_cpu_engine
-Wl,--no-whole-archive)
target_link_libraries(seg-serving paddle_fluid ${paddle_depend_libs})
target_link_libraries(seg-serving opencv_imgcodecs
${opencv_depend_libs})
target_link_libraries(seg-serving pdserving)
target_link_libraries(seg-serving cube-api)
target_link_libraries(seg-serving kvdb rocksdb)
if(WITH_GPU)
target_link_libraries(seg-serving ${CUDA_LIBRARIES})
endif()
target_link_libraries(seg-serving ${MKLML_LIB} ${MKLML_IOMP_LIB} -lpthread
-lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
install(TARGETS seg-serving
RUNTIME DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
install(DIRECTORY ${CMAKE_CURRENT_LIST_DIR}/conf DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
install(DIRECTORY ${CMAKE_CURRENT_LIST_DIR}/data DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
FILE(GLOB inc ${CMAKE_CURRENT_BINARY_DIR}/*.pb.h)
install(FILES ${inc}
DESTINATION ${PADDLE_SERVING_INSTALL_DIR}/include/seg-serving)
if (${WITH_MKL})
install(FILES ${THIRD_PARTY_PATH}/install/mklml/lib/libmklml_intel.so
${THIRD_PARTY_PATH}/install/mklml/lib/libmklml_gnu.so
${THIRD_PARTY_PATH}/install/mklml/lib/libiomp5.so DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/seg-serving/bin)
endif()
--enable_model_toolkit
--seg_conf_file=./conf/seg_conf.yaml
engines {
name: "human_segmentation"
type: "FLUID_GPU_NATIVE"
reloadable_meta: "./data/model/paddle/fluid_time_file"
reloadable_type: "timestamp_ne"
model_data_path: "./data/model/paddle/fluid/deeplabv3p_xception65_humanseg"
runtime_thread_num: 0
batch_infer_size: 0
enable_batch_align: 0
}
model_toolkit_path: "./conf/"
model_toolkit_file: "model_toolkit.prototxt"
%YAML:1.0
SIZE: [513, 513] # (width, height) crop size for test eval
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
CHANNELS: 3
CLASS_NUM: 2
MODEL_NAME: "human_segmentation"
%YAML:1.0
SIZE: [500, 500] # (width, height) crop size for test eval
MEAN: [127.5, 127.5, 127.5, 127.5]
STD: [1.0, 1.0, 1.0, 1.0]
CHANNELS: 4
CLASS_NUM: 2
MODEL_NAME: "image_segmentation"
services {
name: "ImageSegService"
workflows: "workflow1"
}
workflows {
name: "workflow1"
workflow_type: "Sequence"
nodes {
name: "image_reader_op"
type: "ReaderOp"
}
nodes {
name: "image_seg_op"
type: "ImageSegOp"
dependencies {
name: "image_reader_op"
mode: "RO"
}
}
nodes {
name: "image_writer_op"
type: "WriteJsonOp"
dependencies {
name: "image_seg_op"
mode: "RO"
}
}
}
FILE(GLOB op_srcs ${CMAKE_CURRENT_LIST_DIR}/*.cpp)
LIST(APPEND serving_srcs ${op_srcs})
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <vector>
#include "predictor/framework/infer.h"
#include "predictor/framework/memory.h"
#include "seg-serving/op/image_seg_op.h"
#include "seg-serving/op/reader_op.h"
#include "seg-serving/op/seg_conf.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::image_segmentation::ImageSegResItem;
using baidu::paddle_serving::image_segmentation::ImageSegResponse;
using baidu::paddle_serving::predictor::InferManager;
using baidu::utils::seg_conf::SegConf;
int ImageSegOp::inference() {
const ReaderOutput* reader_out =
get_depend_argument<ReaderOutput>("image_reader_op");
if (!reader_out) {
LOG(ERROR) << "Failed mutable depended argument, op:"
<< "reader_op";
return -1;
}
const TensorVector* in = &reader_out->tensors;
const std::vector<int> *width_vec = &reader_out->width_vec;
const std::vector<int> *height_vec = &reader_out->height_vec;
//debug
for(int i = 0; i < width_vec->size(); ++i){
LOG(INFO) << "width = " << (*width_vec)[i] << ", height = " << (*height_vec)[i];
}
TensorVector* out = butil::get_object<TensorVector>();
if (!out) {
    LOG(ERROR) << "Failed to get tls output object";
return -1;
}
if (in->size() != 1) {
LOG(ERROR) << "Samples should have been packed into a single tensor";
return -1;
}
int batch_size = in->at(0).shape[0];
static const SegConf *sc_ptr = SegConf::instance();
// call paddle fluid model for inferencing
std::string model_name;
sc_ptr->get_model_name(model_name);
LOG(INFO) << "model name = " << model_name;
int ret;
if ((ret = InferManager::instance().infer(
model_name.c_str(), in, out, batch_size))) {
LOG(ERROR) << "Failed do infer in fluid model: "
<< model_name;
return -1;
}
LOG(INFO) << "ret = " << ret;
if (out->size() != in->size()) {
    LOG(ERROR) << "Output size not eq input size: " << in->size()
               << " vs " << out->size();
return -1;
}
// copy output tensor into response
ImageSegResponse* res = mutable_data<ImageSegResponse>();
const paddle::PaddleTensor& out_tensor = (*out)[0];
int sample_size = out_tensor.shape[0];
uint32_t total_size = 1;
for (int i = 0; i < out_tensor.shape.size(); ++i) {
total_size *= out_tensor.shape[i];
}
LOG(INFO) << "total_size = " << total_size;
uint32_t item_size = total_size / sample_size;
for (uint32_t si = 0; si < sample_size; si++) {
ImageSegResItem* ins = res->add_item();
// res->add_width((*width_vec)[si]);
// res->add_height((*height_vec)[si]);
if (!ins) {
LOG(ERROR) << "Failed append new out tensor";
return -1;
}
// assign output data
float* data = reinterpret_cast<float*>(out_tensor.data.data() +
si * sizeof(float) * item_size);
std::vector<int> size_vec;
sc_ptr->get_size_vector(size_vec);
int width = size_vec[0];
int height = size_vec[1];
int class_num;
sc_ptr->get_class_num(class_num);
LOG(INFO) << "width = " << width << ", height = " << height << ", class_num = " << class_num;
uint32_t out_size = width * height;
mask_raw.clear();
mask_raw.resize(out_size);
for (uint32_t di = 0; di < out_size; ++di) {
float max_value = -1;
int label = 0;
for (int j = 0; j < class_num; ++j) {
int index = di + j * out_size;
if (index >= class_num * width * height) {
break;
}
float value = data[index];
if (value > max_value){
max_value = value;
label = j;
}
}
if (label == 0) max_value = 0;
mask_raw[di] = label;
}
cv::Mat mask_mat = cv::Mat(height, width, CV_8UC1);
mask_mat.data = mask_raw.data();
cv::Mat mask_temp_mat((*height_vec)[si], (*width_vec)[si], mask_mat.type());
//Size(cols, rows)
cv::resize(mask_mat, mask_temp_mat, mask_temp_mat.size());
// cv::resize(mask_mat, mask_temp_mat, cv::Size((*width_vec)[si], (*height_vec)[si]));
std::vector<uchar> mat_buff;
cv::imencode(".png", mask_temp_mat, mat_buff);
ins->set_mask(mat_buff.data(), mat_buff.size());
}
// release out tensor object resource
size_t out_size = out->size();
for (size_t oi = 0; oi < out_size; ++oi) {
(*out)[oi].shape.clear();
}
out->clear();
butil::return_object<TensorVector>(out);
return 0;
}
DEFINE_OP(ImageSegOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <vector>
#include "paddle/fluid/inference/paddle_inference_api.h"
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
// rename
static const char* IMAGE_CLASSIFICATION_MODEL_NAME =
"image_seg_deeplabv3p";
class ImageSegOp : public baidu::paddle_serving::predictor::OpWithChannel<
baidu::paddle_serving::image_segmentation::
ImageSegResponse> {
public:
typedef std::vector<paddle::PaddleTensor> TensorVector;
DECLARE_OP(ImageSegOp);
int inference();
private:
std::vector<unsigned char> mask_raw;
};
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <algorithm>
#include "predictor/framework/memory.h"
#include "seg-serving/op/reader_op.h"
#include "seg-serving/op/seg_conf.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::image_segmentation::Request;
using baidu::paddle_serving::image_segmentation::ImageSegReqItem;
using baidu::utils::seg_conf::SegConf;
int ReaderOp::inference() {
const Request* req = dynamic_cast<const Request*>(get_request_message());
// LOG(INFO) << "Receive request in dense service:" << req->ShortDebugString();
ReaderOutput* res = mutable_data<ReaderOutput>();
if (!res) {
LOG(ERROR) << "Failed get op tls reader object output";
return -1;
}
TensorVector* in = &res->tensors;
uint32_t batch_size = req->instances_size();
if (batch_size <= 0) {
LOG(WARNING) << "No instances need to inference!";
return -1;
}
static const SegConf *sc_ptr = SegConf::instance();
std::vector<double> pmean;
if(sc_ptr->get_mean_vector(pmean) != 0) {
LOG(ERROR) << "Can't load the mean items";
return -1;
}
std::vector<double> scale;
if(sc_ptr->get_std_vector(scale) != 0) {
LOG(ERROR) << "Can't load the scale items";
return -1;
}
std::vector<int> iresize;
if(sc_ptr->get_size_vector(iresize) != 0) {
LOG(ERROR) << "Can't load size vector";
return -1;
}
int channels;
if(sc_ptr->get_channels(channels) != 0) {
LOG(ERROR) << "Can't load channels";
return -1;
}
//bool enable_crop = SegConf._enable_crop;
cv::Size resize;
resize.height = iresize[1];
resize.width = iresize[0];
paddle::PaddleTensor in_tensor;
in_tensor.name = "image";
in_tensor.dtype = paddle::FLOAT32;
// shape assignment
in_tensor.shape.push_back(batch_size); // batch_size
in_tensor.shape.push_back(channels);
in_tensor.shape.push_back(resize.height);
in_tensor.shape.push_back(resize.width);
// tls resource assignment
size_t dense_capacity = channels * resize.width * resize.height;
size_t len = dense_capacity * sizeof(float) * batch_size;
// Allocate buffer in PaddleTensor, so that buffer will be managed by the
// Tensor
in_tensor.data.Resize(len);
float* data = reinterpret_cast<float*>(in_tensor.data.data());
if (in_tensor.data.data() == NULL) {
LOG(ERROR) << "Failed create temp float array, "
<< "size=" << dense_capacity * batch_size * sizeof(float);
return -1;
}
std::vector<int> *in_width_vec = &res->width_vec;
std::vector<int> *in_height_vec = &res->height_vec;
for (uint32_t si = 0; si < batch_size; si++) {
// parse image object from x-image
const ImageSegReqItem& ins = req->instances(si);
// read dense image from request bytes
const char* binary = ins.image_binary().c_str();
size_t length = ins.image_length();
if (length == 0) {
LOG(ERROR) << "Empty image, length is 0";
return -1;
}
_image_vec_tmp.clear();
_image_vec_tmp.assign(binary, binary + length);
_image_8u_tmp = cv::imdecode(cv::Mat(_image_vec_tmp),
CV_LOAD_IMAGE_UNCHANGED);
if (_image_8u_tmp.data == NULL) {
LOG(ERROR) << "Image decode failed!";
return -1;
}
// accumulate length
const int HH = _image_8u_tmp.rows;
const int WW = _image_8u_tmp.cols;
const int CC = _image_8u_tmp.channels();
    // 记录原图尺寸: WW为宽(cols), HH为高(rows)
    in_width_vec->push_back(WW);
    in_height_vec->push_back(HH);
// resize/crop
if (_image_8u_tmp.cols != resize.width ||
_image_8u_tmp.rows != resize.height) {
// int short_egde = std::min<int>(_image_8u_tmp.cols, _image_8u_tmp.rows);
// int yy = static_cast<int>((_image_8u_tmp.rows - short_egde) / 2);
// int xx = static_cast<int>((_image_8u_tmp.cols - short_egde) / 2);
// _image_8u_tmp =
// cv::Mat(_image_8u_tmp, cv::Rect(xx, yy, short_egde, short_egde));
// if (_image_8u_tmp.cols != resize.width ||
// _image_8u_tmp.rows != resize.height) {
cv::Mat resize_image;
// cv::resize(_image_8u_tmp, resize_image, resize);
// _image_8u_tmp = resize_image;
// }
//
cv::resize(_image_8u_tmp, resize_image, resize);
_image_8u_tmp = resize_image;
LOG(INFO) << "Succ crop one image[CHW=" << _image_8u_tmp.channels()
<< ", " << _image_8u_tmp.cols << ", " << _image_8u_tmp.rows
<< "]"
<< " from image[CHW=" << CC << ", " << HH << ", " << WW << "]";
}
// BGR->RGB transformer
//cv::cvtColor(_image_8u_tmp, _image_8u_rgb, cv::COLOR_GRAY2BGR);
_image_8u_rgb = _image_8u_tmp;
const int H = _image_8u_rgb.rows;
const int W = _image_8u_rgb.cols;
const int C = _image_8u_rgb.channels();
if (H != resize.height || W != resize.width || C != channels) {
      LOG(ERROR) << "Image " << si << " has incompatible size (" << H << ", " << W << ", " << C << ")";
return -1;
}
LOG(INFO) << "Succ read one image, C: " << C << ", W: " << W
<< ", H: " << H;
float* data_ptr = data + dense_capacity * si;
for (int h = 0; h < H; h++) {
// p points to a new line
unsigned char* p = _image_8u_rgb.ptr<unsigned char>(h);
for (int w = 0; w < W; w++) {
for (int c = 0; c < C; c++) {
          // HWC(row, column, channel) -> CHW
data_ptr[W * H * c + W * h + w] =
(p[C * w + c] - pmean[c]) / scale[c];
          // HWC -> CWH (未启用)
//data_ptr[W * H * c + w * H + h] =
// (p[C * w + c] - pmean[c]) / scale[c];
}
}
}
}
in->push_back(in_tensor);
return 0;
}
DEFINE_OP(ReaderOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
// stl
#include <string>
#include <vector>
// paddle inference
#include "paddle/fluid/inference/paddle_inference_api.h"
#include "predictor/builtin_format.pb.h"
#include "predictor/common/inner_common.h"
#include "predictor/framework/channel.h"
#include "predictor/framework/op_repository.h"
#include "predictor/op/op.h"
// opencv
#include "opencv/cv.h"
#include "opencv/cv.hpp"
#include "opencv/cxcore.h"
#include "opencv/highgui.h"
// project related
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
struct ReaderOutput {
std::vector<paddle::PaddleTensor> tensors;
std::vector<int> width_vec;
std::vector<int> height_vec;
void Clear() {
size_t tensor_count = tensors.size();
for (size_t ti = 0; ti < tensor_count; ++ti) {
tensors[ti].shape.clear();
}
tensors.clear();
width_vec.clear();
height_vec.clear();
}
std::string ShortDebugString() const { return "Not implemented!"; }
};
class ReaderOp
: public baidu::paddle_serving::predictor::OpWithChannel<ReaderOutput> {
public:
typedef std::vector<paddle::PaddleTensor> TensorVector;
DECLARE_OP(ReaderOp);
int inference();
private:
cv::Mat _image_8u_tmp;
cv::Mat _image_8u_rgb;
std::vector<char> _image_vec_tmp;
};
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
#include "seg_conf.h"
DEFINE_string(seg_conf_file, "conf/seg_conf.yaml", "seg configuration filename");
namespace baidu{
namespace utils{
namespace seg_conf{
SegConf::SegConf(const std::string &configuration_filename) {
std::cout << "filename: " << configuration_filename << std::endl;
try{
if(!_seg_conf_file.open(configuration_filename, cv::FileStorage::READ)){
std::cout << "Configuration file open error!" << std::endl;
}
} catch(...){
std::cout << "error" << std::endl;
}
}
SegConf::~SegConf(){
_seg_conf_file.release();
}
bool SegConf::get_item_by_name(const std::string &conf_node_name, cv::FileNode &return_file_node) const{
return_file_node = _seg_conf_file[conf_node_name];
if(return_file_node.isNone()) {
        std::cout << "You haven't configured this item" << std::endl;
return false;
}
return true;
}
int SegConf::get_mean_vector(std::vector<double> &mean_vec) const {
return get_array_from_file_node("MEAN", mean_vec);
}
int SegConf::get_std_vector(std::vector<double> &std_vec) const{
return get_array_from_file_node("STD", std_vec);
}
int SegConf::get_size_vector(std::vector<int> &size_vec) const{
return get_array_from_file_node("SIZE", size_vec);
}
int SegConf::get_channels(int &channels) const{
return get_scalar_from_file_node("CHANNELS", channels);
}
int SegConf::get_class_num(int &class_num) const {
return get_scalar_from_file_node("CLASS_NUM", class_num);
}
int SegConf::get_model_name(std::string &name) const {
return get_scalar_from_file_node("MODEL_NAME", name);
}
const SegConf* SegConf::instance() {
//lock
static const SegConf s_seg_conf_instance(FLAGS_seg_conf_file);
return &s_seg_conf_instance;
}
//SegConf SegConf::s_seg_conf_instance(FLAGS_seg_conf_file);
} //seg_conf
} //utils
} //baidu
#ifndef SRC_SEG_CONF_H
#define SRC_SEG_CONF_H
#include <string>
#include <vector>
#include "opencv2/opencv.hpp"
#include "gflags/gflags.h"
DECLARE_string(seg_conf_file);
namespace baidu{
namespace utils{
namespace seg_conf{
class SegConf{
private:
explicit SegConf(const std::string &configuration_filename);
public:
static const SegConf *instance();
bool get_item_by_name(const std::string &conf_node_name, cv::FileNode &return_file_node) const;
int get_mean_vector(std::vector<double> &mean_vec) const;
int get_std_vector(std::vector<double> &std_vec) const;
int get_size_vector(std::vector<int> &size_vec) const;
int get_channels(int &channels) const;
int get_class_num(int &class_num) const;
int get_model_name(std::string &name) const;
~SegConf();
private:
cv::FileStorage _seg_conf_file;
//static SegConf s_seg_conf_instance;
template <typename T>
int get_array_from_file_node(const std::string &conf_node_name, std::vector<T> &vec) const{
cv::FileNode node;
        if(!get_item_by_name(conf_node_name, node) || !node.isSeq()) {
return -1;
}
//node >> vec;
cv::FileNodeIterator start_file_node_iter = node.begin();
cv::FileNodeIterator end_file_node_iter = node.end();
for(cv::FileNodeIterator it = start_file_node_iter; it != end_file_node_iter; ++it) {
vec.push_back(static_cast<T>(*it));
}
return 0;
}
template<typename T>
int get_scalar_from_file_node(const std::string &conf_node_name, T &scalar) const{
cv::FileNode node;
        if(!get_item_by_name(conf_node_name, node) || !(node.isReal() || node.isInt() || node.isString())) {
return -1;
}
node >> scalar;
return 0;
}
};
} //seg_conf
} //utils
} //baidu
#endif
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <string>
#include <google/protobuf/text_format.h>
#include "predictor/framework/memory.h"
#include "json2pb/pb_to_json.h"
#include "seg-serving/op/write_json_op.h"
namespace baidu {
namespace paddle_serving {
namespace predictor {
using json2pb::ProtoMessageToJson;
using baidu::paddle_serving::image_segmentation::ImageSegResponse;
using baidu::paddle_serving::image_segmentation::ResponseItem;
using baidu::paddle_serving::image_segmentation::Response;
int WriteJsonOp::inference() {
const ImageSegResponse* seg_out =
get_depend_argument<ImageSegResponse>("image_seg_op");
if (!seg_out) {
LOG(ERROR) << "Failed mutable depended argument, op:"
<< "image_seg_op";
return -1;
}
Response* res = mutable_data<Response>();
if (!res) {
LOG(ERROR) << "Failed mutable output response in op:"
<< "WriteJsonOp";
return -1;
}
// transfer seg output message into json format
std::string err_string;
uint32_t batch_size = seg_out->item_size();
LOG(INFO) << "batch_size = " << batch_size;
LOG(INFO) << seg_out->ShortDebugString();
for (uint32_t si = 0; si < batch_size; si++) {
ResponseItem* ins = res->add_prediction();
//LOG(INFO) << "Original image width = " << seg_out->width(si) << ", height = " << seg_out->height(si);
if (!ins) {
LOG(ERROR) << "Failed add one prediction ins";
return -1;
}
std::string* text = ins->mutable_info();
if (!ProtoMessageToJson(seg_out->item(si), text, &err_string)) {
LOG(ERROR) << "Failed convert message["
<< seg_out->item(si).ShortDebugString()
<< "], err: " << err_string;
return -1;
}
}
return 0;
}
DEFINE_OP(WriteJsonOp);
} // namespace predictor
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "predictor/common/inner_common.h"
#include "predictor/framework/channel.h"
#include "predictor/framework/op_repository.h"
#include "predictor/op/op.h"
#include "seg-serving/image_seg.pb.h"
namespace baidu {
namespace paddle_serving {
namespace predictor {
class WriteJsonOp
: public OpWithChannel<
baidu::paddle_serving::image_segmentation::Response> {
public:
DECLARE_OP(WriteJsonOp);
int inference();
};
} // namespace predictor
} // namespace paddle_serving
}  // namespace baidu
LIST(APPEND protofiles
${CMAKE_CURRENT_LIST_DIR}/image_seg.proto
)
PROTOBUF_GENERATE_SERVING_CPP(TRUE PROTO_SRCS PROTO_HDRS ${protofiles})
LIST(APPEND serving_srcs ${PROTO_SRCS})
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
import "pds_option.proto";
package baidu.paddle_serving.image_segmentation;
option cc_generic_services = true;
message ImageSegReqItem {
required bytes image_binary = 1;
required uint32 image_length = 2;
};
message ImageSegResItem {
required bytes mask = 1;
};
message ImageSegResponse {
repeated ImageSegResItem item = 1;
// repeated int32 width = 2;
// repeated int32 height = 3;
};
message Request {
repeated ImageSegReqItem instances = 1;
};
message ResponseItem {
required string info = 1;
};
message Response {
repeated ResponseItem prediction = 1;
};
service ImageSegService {
rpc inference(Request) returns (Response);
rpc debug(Request) returns (Response);
option (pds.options).generate_impl = true;
};
/home/work/image-class/bin/image_class --workflow_path=/home/work/image-class/conf/ --inferservice_path=/home/work/image-class/conf/ --logger_path=/home/work/image-class/conf/ --resource_path=/home/work/image-class/conf/
# coding: utf-8
import sys
import cv2
import requests
import json
import base64
import numpy as np
import time
import threading
#分割服务的地址
#IMAGE_SEG_URL = 'http://yq01-gpu-151-23-00.epc:8010/ImageSegService/inference'
#IMAGE_SEG_URL = 'http://106.12.25.202:8010/ImageSegService/inference'
IMAGE_SEG_URL = 'http://180.76.118.53:8010/ImageSegService/inference'
# 请求预测服务
# input_img 要预测的图片列表
def get_item_json(input_img):
with open(input_img, mode="rb") as fp:
# 使用 http 协议请求服务时, 请使用 base64 编码发送图片
item_binary_b64 = str(base64.b64encode(fp.read()), 'utf-8')
item_size = len(item_binary_b64)
item_json = {
"image_length": item_size,
"image_binary": item_binary_b64
}
return item_json
def request_predictor_server(input_img_list, dir_name):
data = {"instances" : [get_item_json(dir_name + input_img) for input_img in input_img_list]}
response = requests.post(IMAGE_SEG_URL, data=json.dumps(data))
try:
response = json.loads(response.text)
prediction_list = response["prediction"]
mask_response_list = [mask_response["info"] for mask_response in prediction_list]
mask_raw_list = [json.loads(mask_response)["mask"] for mask_response in mask_response_list]
except Exception as err:
print ("Exception[%s], server_message[%s]" % (str(err), response.text))
return None
# 使用 json 协议回复的包也是 base64 编码过的
mask_binary_list = [base64.b64decode(mask_raw) for mask_raw in mask_raw_list]
m = [np.fromstring(mask_binary, np.uint8) for mask_binary in mask_binary_list]
return m
# 对预测结果进行可视化
# input_raw_mask 是server返回的预测结果
# output_img 是可视化结果存储路径
def visualization(mask_mat, output_img):
# ColorMap for visualization more clearly
color_map = [[128, 64, 128],
[244, 35, 231],
[69, 69, 69],
[102, 102, 156],
[190, 153, 153],
[153, 153, 153],
[250, 170, 29],
[219, 219, 0],
[106, 142, 35],
[152, 250, 152],
[69, 129, 180],
[219, 19, 60],
[255, 0, 0],
[0, 0, 142],
[0, 0, 69],
[0, 60, 100],
[0, 79, 100],
[0, 0, 230],
[119, 10, 32]]
    im = cv2.imdecode(mask_mat, 1)
    # im.shape为(行数, 列数, 通道数), 即(h, w, c)
    h, w, c = im.shape
    for i in range(0, h):
        for j in range(0, w):
            im[i, j] = color_map[im[i, j, 0]]
cv2.imwrite(output_img, im)
#benchmark test
def benchmark_test(batch_size, img_list):
start = time.time()
total_size = len(img_list)
for i in range(0, total_size, batch_size):
mask_mat_list = request_predictor_server(img_list[i : np.min([i + batch_size, total_size])], "images/")
# 将获得的mask matrix转换成可视化图像,并在当前目录下保存为图像文件
# 如果进行压测,可以把这句话注释掉
# for j in range(len(mask_mat_list)):
# visualization(mask_mat_list[j], img_list[j + i])
latency = time.time() - start
print("batch size = %d, total latency = %f s" % (batch_size, latency))
class ClientThread(threading.Thread):
def __init__(self, thread_id, batch_size):
threading.Thread.__init__(self)
self.__thread_id = thread_id
self.__batch_size = batch_size
def run(self):
self.request_image_seg_service(3)
def request_image_seg_service(self, imgs_num):
total_size = imgs_num
img_list = [str(i + 1) + ".jpg" for i in range(total_size)]
# batch_size_list = [2**i for i in range(0, 4)]
# 持续发送150个请求
batch_size_list = [self.__batch_size] * 150
i = 1
for batch_size in batch_size_list:
print("Epoch %d, thread %d" % (i, self.__thread_id))
i += 1
benchmark_test(batch_size, img_list)
def create_thread_pool(thread_num, batch_size):
return [ClientThread(i + 1, batch_size) for i in range(thread_num)]
def run_threads(thread_pool):
for thread in thread_pool:
thread.start()
for thread in thread_pool:
thread.join()
if __name__ == "__main__":
thread_pool = create_thread_pool(thread_num=2, batch_size=1)
run_threads(thread_pool)
EVAL_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (1536, 576) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (1536, 576) # (width, height), for unpadding
INF_RESIZE_VALUE: 1280 # for rangescaling
MAX_RESIZE_VALUE: 1024 # for rangescaling
MIN_RESIZE_VALUE: 1536 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 1
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./dataset/line/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: "./dataset/line/test_list.txt"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "line_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "mobilenet"
TEST:
TEST_MODEL: "./test/models/line/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/line_v4/"
PRETRAINED_MODEL: "./models/deeplabv3p_mobilenetv2_init/"
RESUME: False
SNAPSHOT_EPOCH: 40
SOLVER:
LR: 0.01
LR_POLICY: "poly"
OPTIMIZER: "sgd"
SNAPSHOT: 10
EVAL_CROP_SIZE: (2049, 1025) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (769, 769) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 4
MEAN: [0.5, 0.5, 0.5]
STD: [0.5, 0.5, 0.5]
DATASET:
DATA_DIR: "./dataset/cityscapes/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 19
TEST_FILE_LIST: "dataset/cityscapes/val.list"
TRAIN_FILE_LIST: "dataset/cityscapes/train.list"
VAL_FILE_LIST: "dataset/cityscapes/val.list"
VIS_FILE_LIST: "dataset/cityscapes/vis.list"
SEPARATOR: " "
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
DEFAULT_NORM_TYPE: "gn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
ASPP_WITH_SEP_CONV: True
DECODER_USE_SEP_CONV: True
TEST:
TEST_MODEL: "snapshots/cityscape_v5/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/cityscape_v5/"
PRETRAINED_MODEL: "pretrain/deeplabv3plus_gn_init"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.001
LR_POLICY: "poly"
OPTIMIZER: "sgd"
NUM_EPOCHS: 700
TRAIN_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (513, 513) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (513, 513) # (width, height), for unpadding
INF_RESIZE_VALUE: 513 # for rangescaling
MAX_RESIZE_VALUE: 400 # for rangescaling
MIN_RESIZE_VALUE: 513 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: True
ASPECT_RATIO: 0
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 45
MIN_AREA_RATIO: 0
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 24
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: u"./data/humanseg/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 2
TEST_FILE_LIST: u"data/humanseg/list/val.txt"
TRAIN_FILE_LIST: u"data/humanseg/list/train.txt"
VAL_FILE_LIST: u"data/humanseg/list/val.txt"
IGNORE_INDEX: 255
SEPARATOR: "|"
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
SAVE_DIR: "human_freeze_model"
MODEL:
DEFAULT_NORM_TYPE: u"bn"
MODEL_NAME: "deeplabv3p"
DEEPLAB:
BACKBONE: "xception_65"
TEST:
TEST_MODEL: "snapshots/humanseg/aic_v2/final/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/humanseg/aic_v2/"
PRETRAINED_MODEL: "pretrain/xception65_pretrained/"
RESUME: False
SNAPSHOT_EPOCH: 5
SOLVER:
LR: 0.1
NUM_EPOCHS: 40
LR_POLICY: "poly"
OPTIMIZER: "sgd"
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: u"stepscaling" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (640, 640) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 2.0 # for stepscaling
MIN_SCALE_FACTOR: 0.5 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 10
#MEAN: [104.008, 116.669, 122.675]
#STD: [1.0, 1.0, 1.0]
MEAN: [127.5, 127.5, 127.5]
STD: [127.5, 127.5, 127.5]
DATASET:
DATA_DIR: "./data/COCO2014/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 21
TEST_FILE_LIST: "data/COCO2014/ImageSets/val.txt"
TRAIN_FILE_LIST: "data/COCO2014/ImageSets/train.txt"
VAL_FILE_LIST: "data/COCO2014/ImageSets/val.txt"
SEPARATOR: "|"
IGNORE_INDEX: 255
FREEZE:
MODEL_FILENAME: "model"
PARAMS_FILENAME: "params"
MODEL:
DEFAULT_NORM_TYPE: "bn"
MODEL_NAME: "unet"
UNET:
UPSAMPLE_MODE: "bilinear"
TEST:
TEST_MODEL: "snapshots/coco_v1/"
TRAIN:
MODEL_SAVE_DIR: "snapshots/coco_v1/"
PRETRAINED_MODEL: ""
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
LR: 0.025
WEIGHT_DECAY: 0.00004
NUM_EPOCHS: 50
LR_POLICY: "piecewise"
OPTIMIZER: "Adam"
DECAY_EPOCH: "20,35,45"
TRAIN_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
EVAL_CROP_SIZE: (512, 512) # (width, height), for unpadding rangescaling and stepscaling
AUG:
AUG_METHOD: "unpadding" # choice unpadding rangescaling and stepscaling
FIX_RESIZE_SIZE: (512, 512) # (width, height), for unpadding
INF_RESIZE_VALUE: 500 # for rangescaling
MAX_RESIZE_VALUE: 600 # for rangescaling
MIN_RESIZE_VALUE: 400 # for rangescaling
MAX_SCALE_FACTOR: 1.25 # for stepscaling
MIN_SCALE_FACTOR: 0.75 # for stepscaling
SCALE_STEP_SIZE: 0.25 # for stepscaling
MIRROR: True
RICH_CROP:
ENABLE: False
ASPECT_RATIO: 0.33
BLUR: True
BLUR_RATIO: 0.1
FLIP: True
FLIP_RATIO: 0.2
MAX_ROTATION: 15
MIN_AREA_RATIO: 0.5
BRIGHTNESS_JITTER_RATIO: 0.5
CONTRAST_JITTER_RATIO: 0.5
SATURATION_JITTER_RATIO: 0.5
BATCH_SIZE: 6
MEAN: [104.008, 116.669, 122.675]
STD: [1.0, 1.0, 1.0]
DATASET:
DATA_DIR: "./dataset/pet/"
IMAGE_TYPE: "rgb" # choice rgb or rgba
NUM_CLASSES: 4 # including ignore
TEST_FILE_LIST: "./dataset/pet/test_list.txt"
TRAIN_FILE_LIST: "./dataset/pet/train_list.txt"
VAL_FILE_LIST: "./dataset/pet/val_list.txt"
VIS_FILE_LIST: "./dataset/pet/val_list.txt"
IGNORE_INDEX: 255
SEPARATOR: " "
FREEZE:
MODEL_FILENAME: "__model__"
PARAMS_FILENAME: "__params__"
MODEL:
MODEL_NAME: "unet"
DEFAULT_NORM_TYPE: "bn"
TEST:
TEST_MODEL: "./test/saved_model/unet_pet/final/"
TRAIN:
MODEL_SAVE_DIR: "./test/saved_models/unet_pet/"
PRETRAINED_MODEL: "./test/models/unet_coco/"
RESUME: False
SNAPSHOT_EPOCH: 10
SOLVER:
NUM_EPOCHS: 500
LR: 0.005
LR_POLICY: "poly"
OPTIMIZER: "adam"
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from test_utils import download_file_and_uncompress, train, eval, vis, export_model
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
DATASET_PATH = os.path.join(LOCAL_PATH, "..", "dataset")
MODEL_PATH = os.path.join(LOCAL_PATH, "models")
def download_cityscapes_dataset(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
def download_deeplabv3p_xception65_cityscapes_model(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_cityscapes.tgz"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
if __name__ == "__main__":
download_cityscapes_dataset(".", DATASET_PATH)
download_deeplabv3p_xception65_cityscapes_model(".", MODEL_PATH)
model_name = "deeplabv3p_xception65_cityscapes"
test_model = os.path.join(LOCAL_PATH, "models", model_name)
cfg = os.path.join(LOCAL_PATH, "configs", "{}.yaml".format(model_name))
freeze_save_dir = os.path.join(LOCAL_PATH, "inference_model", model_name)
vis_dir = os.path.join(LOCAL_PATH, "visual", model_name)
saved_model = os.path.join(LOCAL_PATH, "saved_model", model_name)
devices = ['0']
export_model(
flags=["--cfg", cfg],
options=[
"TEST.TEST_MODEL", test_model, "FREEZE.SAVE_DIR", freeze_save_dir
],
devices=devices)
# Final eval results should be #image=500 acc=0.9615 IoU=0.7804
eval(
flags=["--cfg", cfg, "--use_gpu"],
options=["TEST.TEST_MODEL", test_model],
devices=devices)
vis(flags=["--cfg", cfg, "--use_gpu", "--local_test", "--vis_dir", vis_dir],
options=["TEST.TEST_MODEL", test_model],
devices=devices)
train(
flags=["--cfg", cfg, "--use_gpu", "--log_steps", "10"],
options=[
"SOLVER.NUM_EPOCHS", "1", "TRAIN.PRETRAINED_MODEL", test_model,
"TRAIN.MODEL_SAVE_DIR", saved_model
],
devices=devices)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from test_utils import download_file_and_uncompress, train, eval, vis, export_model
import os
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
DATASET_PATH = os.path.join(LOCAL_PATH, "..", "dataset")
MODEL_PATH = os.path.join(LOCAL_PATH, "models")
def download_pet_dataset(savepath, extrapath):
url = "https://paddleseg.bj.bcebos.com/dataset/mini_pet.zip"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
def download_unet_coco_model(savepath, extrapath):
url = "https://bj.bcebos.com/v1/paddleseg/models/unet_coco_init.tgz"
download_file_and_uncompress(
url=url, savepath=savepath, extrapath=extrapath)
if __name__ == "__main__":
download_pet_dataset(LOCAL_PATH, DATASET_PATH)
download_unet_coco_model(LOCAL_PATH, MODEL_PATH)
model_name = "unet_pet"
test_model = os.path.join(LOCAL_PATH, "models", "unet_coco_init")
cfg = os.path.join(LOCAL_PATH, "..", "configs",
"{}.yaml".format(model_name))
freeze_save_dir = os.path.join(LOCAL_PATH, "inference_model", model_name)
vis_dir = os.path.join(LOCAL_PATH, "visual", model_name)
saved_model = os.path.join(LOCAL_PATH, "saved_model", model_name)
devices = ['0']
train(
flags=["--cfg", cfg, "--use_gpu", "--log_steps", "10"],
options=[
"SOLVER.NUM_EPOCHS", "1", "TRAIN.PRETRAINED_MODEL", test_model,
"TRAIN.MODEL_SAVE_DIR", saved_model, "DATASET.TRAIN_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"train_list.txt"), "DATASET.VAL_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"val_list.txt"), "DATASET.TEST_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"test_list.txt"), "DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet"), "BATCH_SIZE", "1"
],
devices=devices)
eval(
flags=["--cfg", cfg, "--use_gpu"],
options=[
"TEST.TEST_MODEL",
os.path.join(saved_model, "final"), "DATASET.VAL_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list", "val_list.txt"),
"DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet")
],
devices=devices)
vis(flags=["--cfg", cfg, "--use_gpu", "--local_test", "--vis_dir", vis_dir],
options=[
"DATASET.TEST_FILE_LIST",
os.path.join(DATASET_PATH, "mini_pet", "file_list",
"test_list.txt"), "DATASET.DATA_DIR",
os.path.join(DATASET_PATH, "mini_pet"), "TEST.TEST_MODEL",
os.path.join(saved_model, "final")
],
devices=devices)
export_model(
flags=["--cfg", cfg],
options=[
"TEST.TEST_MODEL",
os.path.join(saved_model, "final"), "FREEZE.SAVE_DIR",
freeze_save_dir
],
devices=devices)
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import platform
import shutil
import sys
import tarfile
import time
import zipfile

import requests
lasttime = time.time()
FLUSH_INTERVAL = 0.1
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
PDSEG_PATH = os.path.join(LOCAL_PATH, "..", "pdseg")
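# Shared helpers for the local smoke tests: download/unpack test assets and
# shell out to the pdseg command-line scripts (train/eval/vis/export_model)
# located under PDSEG_PATH.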
def get_platform():
return platform.platform()
def is_windows():
return get_platform().lower().startswith("windows")
def progress(msg, end=False):
    # Print an in-place progress message, throttled to one refresh per
    # FLUSH_INTERVAL seconds; end=True appends a newline and resets the timer.
    global lasttime
    if end:
        msg += "\n"
        lasttime = 0
    if time.time() - lasttime >= FLUSH_INTERVAL:
        sys.stdout.write("\r%s" % msg)
        lasttime = time.time()
        sys.stdout.flush()
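# Stream `url` to `savepath` in 4 KB chunks, drawing a 50-character progress bar
# when the server reports a content-length.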
def _download_file(url, savepath, print_progress):
r = requests.get(url, stream=True)
total_length = r.headers.get('content-length')
if total_length is None:
with open(savepath, 'wb') as f:
shutil.copyfileobj(r.raw, f)
else:
with open(savepath, 'wb') as f:
dl = 0
total_length = int(total_length)
starttime = time.time()
if print_progress:
print("Downloading %s" % os.path.basename(savepath))
for data in r.iter_content(chunk_size=4096):
dl += len(data)
f.write(data)
if print_progress:
done = int(50 * dl / total_length)
progress("[%-50s] %.2f%%" %
('=' * done, float(dl / total_length * 100)))
if print_progress:
progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
def _uncompress_file(filepath, extrapath, delete_file, print_progress):
if print_progress:
print("Uncompress %s" % os.path.basename(filepath))
if filepath.endswith("zip"):
handler = _uncompress_file_zip
else:
handler = _uncompress_file_tar
for total_num, index in handler(filepath, extrapath):
if print_progress:
done = int(50 * float(index) / total_num)
progress(
"[%-50s] %.2f%%" % ('=' * done, float(index / total_num * 100)))
if print_progress:
progress("[%-50s] %.2f%%" % ('=' * 50, 100), end=True)
if delete_file:
os.remove(filepath)
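# Generators that extract archive members one by one and yield
# (total_num, index) pairs so the caller can render a progress bar.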
def _uncompress_file_zip(filepath, extrapath):
files = zipfile.ZipFile(filepath, 'r')
filelist = files.namelist()
total_num = len(filelist)
for index, file in enumerate(filelist):
files.extract(file, extrapath)
yield total_num, index
files.close()
yield total_num, index
def _uncompress_file_tar(filepath, extrapath):
files = tarfile.open(filepath, "r:gz")
filelist = files.getnames()
total_num = len(filelist)
for index, file in enumerate(filelist):
files.extract(file, extrapath)
yield total_num, index
files.close()
yield total_num, index
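# Public helper: download an archive and unpack it. The download and extraction
# are skipped if the extracted directory already exists, unless cover=True
# forces a fresh copy.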
def download_file_and_uncompress(url,
savepath=None,
extrapath=None,
print_progress=True,
cover=False,
delete_file=True):
if savepath is None:
savepath = "."
if extrapath is None:
extrapath = "."
savename = url.split("/")[-1]
savepath = os.path.join(savepath, savename)
extraname = ".".join(savename.split(".")[:-1])
extraname = os.path.join(extrapath, extraname)
    if cover:
        if os.path.exists(savepath):
            # savepath points at the downloaded archive file, so remove it with
            # os.remove rather than shutil.rmtree (which only works on directories).
            os.remove(savepath)
        if os.path.exists(extraname):
            shutil.rmtree(extraname)
if not os.path.exists(extraname):
if not os.path.exists(savepath):
_download_file(url, savepath, print_progress)
_uncompress_file(savepath, extrapath, delete_file, print_progress)
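# Build and run one pdseg command-line call. The resulting command looks like
# (illustrative example):
#   export CUDA_VISIBLE_DEVICES=0 && python <PDSEG_PATH>/train.py --cfg xxx.yaml --use_gpu KEY VALUE ...
# On Windows `set` is used instead of `export`. `flags` are passed through
# verbatim and `options` are flattened KEY VALUE pairs.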
def _pdseg(command, flags, options, devices):
script = "{}{}{}.py".format(PDSEG_PATH, os.sep, command)
flags = " ".join(flags)
options = " ".join(options)
if is_windows():
set_cuda_command = "set CUDA_VISIBLE_DEVICES={}".format(
",".join(devices))
else:
set_cuda_command = "export CUDA_VISIBLE_DEVICES={}".format(
",".join(devices))
cmd = "{} && python {} {} {}".format(set_cuda_command, script, flags,
options)
print(cmd)
os.system(cmd)
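# Thin wrappers that invoke the corresponding pdseg command-line entry point.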
def train(flags, options, devices):
_pdseg("train", flags, options, devices)
def eval(flags, options, devices):
_pdseg("eval", flags, options, devices)
def vis(flags, options, devices):
_pdseg("vis", flags, options, devices)
def export_model(flags, options, devices):
_pdseg("export_model", flags, options, devices)