Unverified commit eb60742f, authored by Rosun, committed by GitHub

add SemSegPaddle (#4192)

* add SemSegPaddle

* update README.md

Parent: 2b8f904e
# SemSegPaddle: A Paddle-based Framework for Deep Learning in Semantic Segmentation
This is a Paddle implementation of semantic segmentation models on multiple datasets, including Cityscapes, Pascal Context, and ADE20K.
## Updates
- [**2020/01/08**] We release ***PSPNet-ResNet101*** and ***GloRe-ResNet101*** models on Pascal Context and Cityscapes datasets.
## Highlights
Synchronized Batch Normalization is important for segmentation.
- The implementation is easy to use: it is pure Python, with no extra C++ extension libraries.
- Paddle provides sync_batch_norm (see the sketch below).
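In the Paddle 1.6 static-graph API, synchronized BN is switched on through the build strategy rather than per layer. A minimal sketch, assuming `train_prog` is your `fluid.Program` and `avg_loss` the loss variable built inside it (placeholder names, not this repo's actual variables):
```
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True  # synchronize BN statistics across GPUs

compiled_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
    loss_name=avg_loss.name, build_strategy=build_strategy)
```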
## Supported models
We split each model into a backbone and a decoder network, where the backbone networks are transferred from classification networks. A sketch of how the two parts are wired together follows the lists below.
Backbone:
- ResNet
- ResNeXt
- HRNet
- EfficientNet
Decoder:
- PSPNet: [Pyramid Scene Parsing Network](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
- DeepLabv3: [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587)
- GloRe: [Graph-Based Global Reasoning Networks](http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Graph-Based_Global_Reasoning_Networks_CVPR_2019_paper.pdf)
- GINet: [GINet: Graph Interaction Network for Scene Parsing]()
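The configuration files below select the pair through `MODEL.BACKBONE` and `MODEL.MODEL_NAME`. A minimal runnable sketch of that dispatch, using stand-in functions (the real wiring lives in `src/models/model_builder.py`; all names here are hypothetical):
```
# Stand-ins so the sketch runs on its own; the repo's real backbones and
# decoder heads are full Paddle networks.
def resnet_backbone(image):
    return ("features", image)

def pspnet_head(features, num_classes):
    return ("logits", features, num_classes)

BACKBONES = {"resnet": resnet_backbone}
DECODERS = {"pspnet": pspnet_head}

def build_segmentor(backbone_name, decoder_name, image, num_classes):
    feats = BACKBONES[backbone_name](image)            # classification-style feature extractor
    return DECODERS[decoder_name](feats, num_classes)  # segmentation decoder

print(build_segmentor("resnet", "pspnet", "img", 19))
```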
## Performance
- Performance on the Cityscapes validation set.
**Method** | **Backbone** | **lr** | **BatchSize** | **epoch** | **mean IoU (Single-scale)** | **Trained weights** |
------------|:------------:|:----------:|:--------------:|:------------:|:---------------------------:|------------------------|
PSPNet | resnet101 | 0.01 | 8 | 80 | 78.1 | [pspnet_resnet_cityscapes_epoch_80.pdparams](https://pan.baidu.com/s/1adfvtq2JnLKRv_j7lOmW1A)|
GloRe | resnet101 | 0.01 | 8 | 80 | 78.4 | [pspnet_resnet_pascalcontext_epoch_80.pdparams](https://pan.baidu.com/s/1r4SbrYKbVk38c0dXZLAi9w) |
- Performance on the Pascal-Context validation set.
**Method** | **Backbone** | **lr** | **BatchSize** | **epoch** | **mean IoU (Single-scale)** | **Trained weights** |
------------|:------------:|:----------:|:--------------:|:------------:|:---------------------------:|:----------------------:|
PSPNet | resnet101 | 0.005 | 16 | 80 | 48.9 | [glore_resnet_cityscapes_epoch_80.pdparams](https://pan.baidu.com/s/1l7-sqt2DsUunD9l4YivgQw) |
GloRe | resnet101 | 0.005 | 16 | 80 | 48.4 | [glore_resnet_pascalcontext_epoch_80.pdparams](https://pan.baidu.com/s/1rVuk7OfSj-AXR3ZCFGNmKg) |
## Environment
This repo is developed under the following configurations:
- Hardware: 4 GPUs for training, 1 GPU for testing
- Software: CentOS 6.10, ***CUDA>=9.2, Python>=3.6, Paddle>=1.6*** (see the quick check below)
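A quick way to confirm the Paddle side of this environment (a sketch; `paddle.__version__` and `fluid.is_compiled_with_cuda()` are standard Paddle 1.x attributes):
```
import paddle
import paddle.fluid as fluid

print(paddle.__version__)             # expect >= 1.6
print(fluid.is_compiled_with_cuda())  # expect True for GPU training
```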
## Quick start: training and testing models
### 1. Preparing data
Download the [Cityscapes](https://www.cityscapes-dataset.com/) dataset. It should have this basic structure:
```
cityscapes/
├── cityscapes_list
│   ├── test.lst
│   ├── train.lst
│   ├── train+.lst
│   ├── train++.lst
│   ├── trainval.lst
│   └── val.lst
├── gtFine
│   ├── test
│   ├── train
│   └── val
├── leftImg8bit
│   ├── test
│   ├── train
│   └── val
├── license.txt
└── README
```
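Each `.lst` file pairs one image path with one label path per line, relative to `DATA_DIR` (see `TRAIN_FILE_LIST`/`VAL_FILE_LIST` and `DATASET.SEPARATOR` in the configs below). A minimal sketch for regenerating a Cityscapes list; the function name, the tab separator, and the `labelIds` choice are assumptions, so match them to your config and annotation variant:
```
import glob
import os

def make_cityscapes_list(root="data/cityscapes", split="train", sep="\t"):
    # Pair each left image with its gtFine labelIds map (POSIX paths assumed).
    pattern = os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png")
    lines = []
    for img in sorted(glob.glob(pattern)):
        lab = img.replace("/leftImg8bit/", "/gtFine/").replace(
            "_leftImg8bit.png", "_gtFine_labelIds.png")
        lines.append(os.path.relpath(img, root) + sep + os.path.relpath(lab, root))
    with open(os.path.join(root, "cityscapes_list", split + ".lst"), "w") as f:
        f.write("\n".join(lines))
```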
Download the Pascal-Context dataset. It should have this basic structure:
```
pascalContext/
├── GroundTruth_trainval_mat
├── GroundTruth_trainval_png
├── JPEGImages
├── pascal_context_train.txt
├── pascal_context_val.txt
├── README.md
└── VOCdevkit
```
Then, create symlinks for the Cityscapes and Pascal-Context datasets:
```
cd SemSegPaddle/data
ln -s $cityscapes ./
ln -s $pascalContext ./
```
### 2. Download pretrained weights
Download the pretrained [resnet-101](https://pan.baidu.com/s/1niXBDZnLlUIulB7FY068DQ) weights file and put it into the directory ***./pretrained_model***.
Then, run the following command:
```
tar -zxvf ./pretrained_model/resnet101_v2.tgz -C pretrained_model
```
### 3. Training
Select the configuration file for training according to the DECODER\_NAME, BACKBONE\_NAME, and DATASET\_NAME.
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch train.py --use_gpu --use_mpio \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
### 4. Testing
Select the configuration file for testing according to the DECODER\_NAME, BACKBONE\_NAME, and DATASET\_NAME.
Single-scale testing:
```
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--use_mpio \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
Multi-scale testing:
```
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--use_mpio \
--multi_scales \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
## Contact
If you have any questions regarding the repo, please create an issue.
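# ---- separate config file; likely configs/deeplabv3_res101_cityscapes.yaml (name taken from the run scripts below) ----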
DATAAUG:
    RAND_SCALE_MIN: 0.75
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "deeplabv3"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    DEEPLABv3:
        DEPTH_MULTIPLIER: 1
        ASPP_WITH_SEP_CONV: True
        AuxHead: True
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    MODEL_SAVE_DIR: "./snapshots/deeplabv3_resnet_cityscapes/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "./snapshots/deeplabv3_resnet_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
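# ---- separate config file; likely configs/deeplabv3_res101_pascalcontext.yaml (per the run scripts below) ----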
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "deeplabv3"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    DEEPLABv3:
        DEPTH_MULTIPLIER: 1
        ASPP_WITH_SEP_CONV: True
        AuxHead: True
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    MODEL_SAVE_DIR: "./snapshots/deeplabv3_resnet_pascalcontext/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "./snapshots/deeplabv3_resnet_pascalcontext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
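# ---- separate config file; likely configs/glore_res101_cityscapes.yaml (per the run scripts below) ----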
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "glore"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    GLORE:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/glore_res101_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/glore_res101_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
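# ---- separate config file; likely configs/glore_res101_pascalcontext.yaml (per the run scripts below) ----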
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "glore"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    GLORE:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/glore_res101_pascalContext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/glore_res101_pascalContext/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
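# ---- separate config file; likely configs/pspnet_hrnet_cityscapes.yaml (name inferred from MODEL_SAVE_DIR; not referenced by the run scripts) ----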
DATAAUG:
    RAND_SCALE_MIN: 0.75
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
TRAIN_BATCH_SIZE_PER_GPU: 2
EVAL_BATCH_SIZE: 1
NUM_TRAINERS: 4
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0]
    BACKBONE: "hrnet"
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: False
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_hrnet_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/HRNet_W40_C_pretrained/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/pspnet_hrnet_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 240
    LOSS: "['softmax_loss']"
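# ---- separate config file; likely configs/pspnet_res101_ade.yaml (name inferred from MODEL_SAVE_DIR; not referenced by the run scripts) ----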
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "ade"
    DATA_DIR: "./data/ade/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 150
    TEST_FILE_LIST: "./data/ade/ade_val.lst"
    TRAIN_FILE_LIST: "./data/ade/ade_train.lst"
    VAL_FILE_LIST: "./data/ade/ade_val.lst"
    IGNORE_INDEX: -1
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_ade/"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_ade/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 10
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 120
    LOSS: "['softmax_loss']"
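# ---- separate config file; likely configs/pspnet_res101_cityscapes.yaml (per the run scripts below) ----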
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
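# ---- separate config file; likely configs/pspnet_res101_pascalcontext.yaml (per the run scripts below) ----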
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_pascalContext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_pascalContext/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
Please create symlinks for the datasets here.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
import math
from src.utils.config import cfg
from src.utils.timer import Timer, calculate_eta
from src.models.model_builder import build_model
from src.models.model_builder import ModelPhase
from src.datasets import build_dataset
from src.utils.metrics import ConfusionMatrix
def parse_args():
parser = argparse.ArgumentParser(description='SemSegPaddle')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--multi_scales',
dest='multi_scales',
help='Use multi_scales for eval',
action='store_true',
default=False)
parser.add_argument(
'--flip',
dest='flip',
help='flip the image or not',
action='store_true',
default=False)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, multi_scales=False, flip=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
num_classes = cfg.DATASET.NUM_CLASSES
base_size = cfg.TEST.BASE_SIZE
crop_size = cfg.TEST.CROP_SIZE
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = build_dataset(cfg.DATASET.DATASET_NAME,
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, out, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.EVAL_BATCH_SIZE, places=fluid.cuda_places())
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if ckpt_dir is not None:
filename= '{}_{}_{}_epoch_{}.pdparams'.format(str(cfg.MODEL.MODEL_NAME),
str(cfg.MODEL.BACKBONE), str(cfg.DATASET.DATASET_NAME), cfg.SOLVER.NUM_EPOCHS)
print("loading testing model file: {}/{}".format(ckpt_dir, filename))
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog, filename=filename)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
#fetch_list: return of the model
fetch_list = [avg_loss.name, out.name]
num_images = 0
step = 0
all_step = cfg.DATASET.VAL_TOTAL_IMAGES // cfg.EVAL_BATCH_SIZE
timer = Timer()
timer.start()
for data in py_reader():
mask = np.array(data[0]['mask'])
label = np.array(data[0]['label'])
image_org = np.array(data[0]['image'])
image = np.transpose(image_org, (0, 2, 3, 1)) # BCHW->BHWC
image = np.squeeze(image)
if cfg.TEST.SLIDE_WINDOW:
if not multi_scales:
scales = [1.0]
else:
scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25] if cfg.DATASET.DATASET_NAME == 'cityscapes' else [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
#scales = [0.75, 1.0, 1.25] # fast multi-scale testing
# sliding-window stride; the original note ranks ratios 1/3 > 2/3 > 1/2 for 769 x 769 inputs
stride = int(crop_size * 1.0 / 3)
h, w = image.shape[0:2]
scores = np.zeros(shape=[num_classes, h, w], dtype='float32')
for scale in scales:
long_size = int(math.ceil(base_size * scale))
if h > w:
height = long_size
width = int(1.0 * w * long_size / h + 0.5)
short_size = width
else:
width = long_size
height = int(1.0 * h * long_size / w + 0.5)
short_size = height
# print('org_img_size: {}x{}, rescale_img_size: {}x{}'.format(h, w, height, width))
cur_img = image_resize(image, height, width)
# padding
if long_size <= crop_size:
pad_img = pad_single_image(cur_img, crop_size)
label_feed, mask_feed = get_feed(pad_img)
pad_img = mapper_image(pad_img)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.array(pred1)
outputs = pred1[:, :, :height, :width]
if flip:
pad_img_flip = flip_left_right_image(cur_img)
pad_img_flip = pad_single_image(pad_img_flip, crop_size)
label_feed, mask_feed = get_feed(pad_img_flip)
pad_img_flip = mapper_image(pad_img_flip)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img_flip, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.flip(pred1, 3)
outputs += pred1[:, :, :height, :width]
else:
if short_size < crop_size:
pad_img = pad_single_image(cur_img, crop_size)
else:
pad_img = cur_img
ph, pw = pad_img.shape[0:2]
# sliding window
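# Each axis gets ceil((padded_size - crop_size) / stride) + 1 window positions so the
# crops tile the whole padded image; overlapping predictions are averaged via count_norm.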
h_grids = int(math.ceil(1.0 * (ph - crop_size) / stride)) + 1
w_grids = int(math.ceil(1.0 * (pw - crop_size) / stride)) + 1
outputs = np.zeros(shape=[1, num_classes, ph, pw], dtype='float32')
count_norm = np.zeros(shape=[1, 1, ph, pw], dtype='int32')
for idh in range(h_grids):
for idw in range(w_grids):
h0 = idh * stride
w0 = idw * stride
h1 = min(h0 + crop_size, ph)
w1 = min(w0 + crop_size, pw)
#print('(h0,w0,h1,w1):({},{},{},{})'.format(h0, w0, h1, w1))
crop_img = crop_image(pad_img, h0, w0, h1, w1)
pad_crop_img = pad_single_image(crop_img, crop_size)
label_feed, mask_feed = get_feed(pad_crop_img)
pad_crop_img = mapper_image(pad_crop_img)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_crop_img, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.array(pred1)
outputs[:, :, h0:h1, w0:w1] += pred1[:, :, 0:h1-h0, 0:w1-w0]
count_norm[:, :, h0:h1, w0:w1] += 1
if flip:
pad_img_flip = flip_left_right_image(crop_img)
pad_img_flip = pad_single_image(pad_img_flip, crop_size)
label_feed, mask_feed = get_feed(pad_img_flip)
pad_img_flip = mapper_image(pad_img_flip)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img_flip, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy = True)
pred1 = np.flip(pred1, 3)
outputs[:, :, h0:h1, w0:w1] += pred1[:, :, 0:h1-h0, 0:w1-w0]
count_norm[:, :, h0:h1, w0:w1] += 1
outputs = 1.0 * outputs / count_norm
outputs = outputs[:, :, :height, :width]
with fluid.dygraph.guard():
outputs = fluid.dygraph.to_variable(outputs)
outputs = fluid.layers.resize_bilinear(outputs, out_shape=[h, w])
score = outputs.numpy()[0]
scores += score
else:
# taking the original image as the model input
loss, pred = exe.run(
test_prog,
feed={'image':image_org, 'label':label, 'mask':mask},
fetch_list = fetch_list,
return_numpy = True)
scores = pred[0]
# computing IoU with all scale result
pred = np.argmax(scores, axis=0).astype('int64')
pred = pred[np.newaxis, :, :, np.newaxis]
step += 1
num_images += pred.shape[0]
conf_mat.calculate(pred, label, mask)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
print("[EVAL] step={}/{} acc={:.4f} IoU={:.4f}".format(step, all_step, acc, iou))
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL] #image={} acc={:.4f} IoU={:.4f}".format(num_images, avg_acc, avg_iou))
print("[EVAL] Category IoU:", category_iou)
print("[EVAL] Category Acc:", category_acc)
print("[EVAL] Kappa:{:.4f}".format(conf_mat.kappa()))
print("flip = ", flip)
print("scales = ", scales)
return category_iou, avg_iou, category_acc, avg_acc
def image_resize(image, height, width):
if image.shape[0] == 3:
image = np.transpose(image, (1, 2, 0))
image = cv2.resize(image, (width, height), interpolation=cv2.INTER_LINEAR)
return image
def pad_single_image(image, crop_size):
h, w = image.shape[0:2]
pad_h = crop_size - h if h < crop_size else 0
pad_w = crop_size - w if w < crop_size else 0
image = cv2.copyMakeBorder(image, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT,value=0)
return image
def mapper_image(image):
# HxWx3 -> 3xHxW -> 1x3xHxW
image_array = np.transpose(image, (2, 0, 1))
image_array = image_array.astype('float32')
image_array = image_array[np.newaxis, :]
return image_array
def flip_left_right_image(image):
return cv2.flip(image, 1)
def get_feed(image):
h, w = image.shape[0:2]
return np.zeros([1, 1, h, w], dtype='int32'), np.zeros([1, 1, h, w], dtype='int32')
def crop_image(image, h0, w0, h1, w1):
return image[h0:h1, w0:w1, :]
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
#!/bin/bash
# Deeplabv3_Res101_Cityscapes
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml
#!/bin/bash
# Deeplabv3_Res101_PascalContext
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml
#!/bin/bash
# GloRe_Res101_Cityscapes
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/glore_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/glore_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/glore_res101_cityscapes.yaml
#!/bin/bash
# GloRe_Res101_PascalContext
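# NOTE: the ':<<!' ... '!' pair feeds the enclosed lines to the no-op ':' builtin as a
# heredoc, i.e. it block-comments them, so only the single-scale testing step runs here.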
:<<!
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/glore_res101_pascalcontext.yaml | tee -a train.log 2>&1
!
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/glore_res101_pascalcontext.yaml
:<<!
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/glore_res101_pascalcontext.yaml
!
#!/bin/bash
#PSPNet_Res101_Cityscapes
# 1.1 training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--cfg ./configs/pspnet_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/pspnet_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/pspnet_res101_cityscapes.yaml
#!/bin/bash
#PSPNet_Res101_PascalContext
# 1.1 training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--cfg ./configs/pspnet_res101_pascalcontext.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/pspnet_res101_pascalcontext.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/pspnet_res101_pascalcontext.yaml
Please put the pretrained backbone weights here.
from . import datasets, models, utils
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cityscapes import CityscapesSeg
from .pascal_context import PascalContextSeg
from .ade import AdeSeg
datasets = {
'cityscapes': CityscapesSeg,
'pascalcontext': PascalContextSeg,
'adechallengedata2016': AdeSeg,
}
def build_dataset(name, **kwargs):
return datasets[name.lower()](**kwargs)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class AdeSeg(BaseSeg):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN, base_size=520, crop_size=520, rand_scale=True):
super(AdeSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def _mask_transform(self, mask):
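# ADE20K annotations label classes 1..150 and use 0 for ignore; subtracting 1 maps
# classes to 0..149 and ignore to -1, matching IGNORE_INDEX: -1 in the ADE config.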
target = np.array(mask).astype('int32') - 1
return target
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
#print("line: ", line)
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
if img is None:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
img_height = img.shape[0]
img_width = img.shape[1]
#print('img.shape',img.shape)
if grt is not None:
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
raise Exception(
"source img and label img must has the same size")
else:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
grt = self._mask_transform(grt)
return img, grt, img_name, grt_name
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
import copy
from PIL import Image, ImageOps, ImageFilter, ImageEnhance
from src.models.model_builder import ModelPhase
from src.utils.config import cfg
from .data_utils import GeneratorEnqueuer
class BaseSeg(object):
def __init__(self, file_list, data_dir, shuffle=False, mode=ModelPhase.TRAIN, base_size=1024, crop_size=769, rand_scale=True):
self.mode = mode
self.shuffle = shuffle
self.data_dir = data_dir
self.shuffle_seed = 0
self.crop_size = crop_size
self.base_size = base_size # short edge when training
self.rand_scale = rand_scale
# NOTE: Please ensure the file list was saved in UTF-8 encoding
with codecs.open(file_list, 'r', 'utf-8') as flist:
self.lines = [line.strip() for line in flist]
self.all_lines = copy.deepcopy(self.lines)
if shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
elif shuffle:
np.random.shuffle(self.lines)
self.num_trainers= cfg.NUM_TRAINERS
self.trainer_id=cfg.TRAINER_ID
def generator(self):
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
num_lines = len(self.all_lines) // cfg.NUM_TRAINERS
self.lines = self.all_lines[num_lines * cfg.TRAINER_ID: num_lines * (cfg.TRAINER_ID + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
for line in self.lines:
yield self.process_image(line, self.data_dir, self.mode)
def sharding_generator(self, pid=0, num_processes=1):
"""
Use line id as shard key for multiprocess io
It's a normal generator if pid=0, num_processes=1
"""
for index, line in enumerate(self.lines):
# Use index and pid to shard file list
if index % num_processes == pid:
yield self.process_image(line, self.data_dir, self.mode)
def batch_reader(self, batch_size):
br = self.batch(self.reader, batch_size)
for batch in br:
yield batch[0], batch[1], batch[2]
def multiprocess_generator(self, max_queue_size=32, num_processes=8):
# Re-shuffle file list
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
num_lines = len(self.all_lines) // self.num_trainers
self.lines = self.all_lines[num_lines * self.trainer_id: num_lines * (self.trainer_id + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
# Create multiple sharding generators according to num_processes for multiple processes
generators = []
for pid in range(num_processes):
generators.append(self.sharding_generator(pid, num_processes))
try:
enqueuer = GeneratorEnqueuer(generators)
enqueuer.start(max_queue_size=max_queue_size, workers=num_processes)
while True:
generator_out = None
while enqueuer.is_running():
if not enqueuer.queue.empty():
generator_out = enqueuer.queue.get(timeout=5)
break
else:
time.sleep(0.01)
if generator_out is None:
break
yield generator_out
finally:
if enqueuer is not None:
enqueuer.stop()
def batch(self, reader, batch_size, is_test=False, drop_last=False):
def batch_reader(is_test=False, drop_last=drop_last):
if is_test:
imgs, grts, img_names, valid_shapes, org_shapes = [], [], [], [], []
for img, grt, img_name, valid_shape, org_shape in reader():
imgs.append(img)
grts.append(grt)
img_names.append(img_name)
valid_shapes.append(valid_shape)
org_shapes.append(org_shape)
if len(imgs) == batch_size:
yield np.array(imgs), np.array(
grts), img_names, np.array(valid_shapes), np.array(
org_shapes)
imgs, grts, img_names, valid_shapes, org_shapes = [], [], [], [], []
if not drop_last and len(imgs) > 0:
yield np.array(imgs), np.array(grts), img_names, np.array(
valid_shapes), np.array(org_shapes)
else:
imgs, labs, ignore = [], [], []
bs = 0
for img, lab, ig in reader():
imgs.append(img)
labs.append(lab)
ignore.append(ig)
bs += 1
if bs == batch_size:
yield np.array(imgs), np.array(labs), np.array(ignore)
bs = 0
imgs, labs, ignore = [], [], []
if not drop_last and bs > 0:
yield np.array(imgs), np.array(labs), np.array(ignore)
return batch_reader(is_test, drop_last)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
raise NotImplementedError
def pil_imread(self, file_path):
"""read pseudo-color label"""
im = Image.open(file_path)
return np.asarray(im)
def cv2_imread(self, file_path, flag=cv2.IMREAD_COLOR):
# Work around cv2.imread failing on non-ASCII (e.g. Chinese) file paths on Windows.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
def normalize_image(self, img):
img = img.transpose((2, 0, 1)).astype('float32') / 255.0
img_mean = np.array(cfg.MEAN).reshape((len(cfg.MEAN), 1, 1))
img_std = np.array(cfg.STD).reshape((len(cfg.STD), 1, 1))
img -= img_mean
img /= img_std
return img
def process_image(self, line, data_dir, mode):
""" process_image """
img, grt, img_name, grt_name = self.load_image( line, data_dir, mode=mode) # img.type: numpy.array, grt.type: numpy.array
if mode == ModelPhase.TRAIN:
# numpy.array convert to PIL.Image
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
grt = Image.fromarray(grt.astype('uint8')).convert('L')
crop_size = self.crop_size
# random scale
if self.rand_scale:
short_size = random.randint(int(self.base_size * cfg.DATAAUG.RAND_SCALE_MIN), int(self.base_size * cfg.DATAAUG.RAND_SCALE_MAX))
else:
short_size = self.base_size
w, h = img.size
if h > w:
out_w = short_size
out_h = int(1.0 * h / w * out_w)
else:
out_h = short_size
out_w = int(1.0 * w / h * out_h)
img = img.resize((out_w, out_h), Image.BILINEAR)
grt = grt.resize((out_w, out_h), Image.NEAREST)
# rand flip
if random.random() > 0.5:
img = img.transpose(Image.FLIP_LEFT_RIGHT)
grt = grt.transpose(Image.FLIP_LEFT_RIGHT)
# padding
if short_size < crop_size:
pad_h = crop_size - out_h if out_h < crop_size else 0
pad_w = crop_size - out_w if out_w < crop_size else 0
img = ImageOps.expand(img, border=(pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2), fill=0)
grt = ImageOps.expand(grt, border=(pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2), fill=cfg.DATASET.IGNORE_INDEX)
# random crop
w, h = img.size
x = random.randint(0, w - crop_size)
y = random.randint(0, h - crop_size)
img = img.crop((x, y, x + crop_size, y + crop_size))
grt = grt.crop((x, y, x + crop_size, y + crop_size))
# gaussian blur
if cfg.DATAAUG_EXTRA:
if random.random() > 0.7:
img = img.filter(ImageFilter.GaussianBlur(radius=random.random()))
# PIL.Image -> cv2
img = cv2.cvtColor(np.asarray(img),cv2.COLOR_RGB2BGR)
grt = np.array(grt)
elif ModelPhase.is_eval(mode):
org_shape = [img.shape[0], img.shape[1]] # 1024 x 2048 for cityscapes
elif ModelPhase.is_visual(mode):
org_shape = [img.shape[0], img.shape[1]]
#img, grt = resize(img, grt, mode=mode)
valid_shape = [img.shape[0], img.shape[1]]
#img, grt = rand_crop(img, grt, mode=mode)
else:
raise ValueError("Dataset mode={} Error!".format(mode))
# Normalize image
img = self.normalize_image(img)
if ModelPhase.is_train(mode) or ModelPhase.is_eval(mode):
grt = np.expand_dims(np.array(grt).astype('int32'), axis=0)
ignore = (grt != cfg.DATASET.IGNORE_INDEX).astype('int32')
if ModelPhase.is_train(mode):
return (img, grt, ignore)
elif ModelPhase.is_eval(mode):
return (img, grt, ignore)
elif ModelPhase.is_visual(mode):
return (img, grt, img_name, valid_shape, org_shape)
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class CityscapesSeg(BaseSeg):
def __init__(self, file_list, data_dir, shuffle=False, mode=ModelPhase.TRAIN, base_size=1024, crop_size=769, rand_scale=True):
super(CityscapesSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be image_name {} label_name\\n".format(cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
img_height = img.shape[0]
img_width = img.shape[1]
if grt is not None:
grt_height = grt.shape[0]
grt_width = grt.shape[1]
id_to_trainid = [255, 255, 255, 255, 255,
255, 255, 255, 0, 1,
255, 255, 2, 3, 4,
255, 255, 255, 5, 255,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
255, 255, 16, 17, 18]
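# Maps raw Cityscapes label IDs to the 19 train IDs (255 = ignore); the +1 in the
# lookup below shifts the raw ID range -1..33 to list indices 0..34.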
grt_ = np.zeros([grt_height, grt_width])
for h in range(grt_height):
for w in range(grt_width):
grt_[h][w] = id_to_trainid[int(grt[h][w])+1]
if img_height != grt_height or img_width != grt_width:
raise Exception("source img and label img must has the same size")
else:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("Empty image, src_dir: {}, img: {} & lab: {}".format(src_dir, img_path, grt_path))
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
return img, grt_, img_name, grt_name
"""
This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py
"""
import time
import numpy as np
import threading
import multiprocessing
try:
import queue
except ImportError:
import Queue as queue
class GeneratorEnqueuer(object):
"""
Multiple generators
Args:
generators (list): generators that the worker processes draw data from.
wait_time (float): time to sleep in-between calls to `put()`.
"""
def __init__(self, generators, wait_time=0.05):
self.wait_time = wait_time
self._generators = generators
self._threads = []
self._stop_events = []
self.queue = None
self._manager = None
self.workers = 1
def start(self, workers=1, max_queue_size=16):
"""
Start worker threads which add data from the generator into the queue.
Args:
workers (int): number of worker threads
max_queue_size (int): queue size
(when full, threads could block on `put()`)
"""
self.workers = workers
def data_generator_task(pid):
"""
Data generator task.
"""
def task(pid):
if (self.queue is not None
and self.queue.qsize() < max_queue_size):
generator_output = next(self._generators[pid])
self.queue.put((generator_output))
else:
time.sleep(self.wait_time)
while not self._stop_events[pid].is_set():
try:
task(pid)
except Exception:
self._stop_events[pid].set()
break
try:
self._manager = multiprocessing.Manager()
self.queue = self._manager.Queue(maxsize=max_queue_size)
for pid in range(self.workers):
self._stop_events.append(multiprocessing.Event())
thread = multiprocessing.Process(
target=data_generator_task, args=(pid, ))
thread.daemon = True
self._threads.append(thread)
thread.start()
except:
self.stop()
raise
def is_running(self):
"""
Returns:
bool: Whether the worker threads are running.
"""
# If the queue is not empty, the enqueuer is still running; wait for the consumer
if not self.queue.empty():
return True
for pid in range(self.workers):
if not self._stop_events[pid].is_set():
return True
return False
def stop(self, timeout=None):
"""
Stops running threads and wait for them to exit, if necessary.
Should be called by the same thread which called `start()`.
Args:
timeout(int|None): maximum time to wait on `thread.join()`.
"""
if self.is_running():
for pid in range(self.workers):
self._stop_events[pid].set()
for thread in self._threads:
if thread.is_alive():
thread.join(timeout)
if self._manager:
self._manager.shutdown()
self._threads = []
self._stop_events = []
self.queue = None
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class PascalContextSeg(BaseSeg):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN, base_size=520, crop_size=520, rand_scale=True):
super(PascalContextSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def _mask_transform(self, mask):
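# Pascal-Context annotations use 0 for background/ignore; subtracting 1 maps it to -1
# (matching IGNORE_INDEX: -1 in the config) and the 59 classes to 0..58.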
target = np.array(mask).astype('int32') - 1
return target
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
grt = self._mask_transform(grt)
return img, grt, img_name, grt_name
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#import models.modeling
#import models.libs
#import models.backbone
from . import modeling, libs, backbone
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from src.utils.config import cfg
class HRNet():
"""
Reference:
Sun, Ke, et al. "Deep High-Resolution Representation Learning for Human Pose Estimation.", In CVPR 2019
"""
def __init__(self, stride=4, seg_flag=False):
self.stride= stride
self.seg_flag=seg_flag
def conv_bn_layer(self, input, filter_size, num_filters, stride=1, padding=1, num_groups=1, if_act=True, name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(input=conv,
param_attr=ParamAttr(name=bn_name + "_scale",
initializer=fluid.initializer.Constant(1.0)),
bias_attr=ParamAttr(name=bn_name + "_offset",
initializer=fluid.initializer.Constant(0.0)),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
bn = fluid.layers.relu(bn)
return bn
def basic_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input, filter_size=3, num_filters=num_filters, stride=stride, name=name + '_conv1')
conv = self.conv_bn_layer(input=conv, filter_size=3, num_filters=num_filters, if_act=False, name=name + '_conv2')
if downsample:
residual = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters, if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def bottleneck_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters, name=name + '_conv1')
conv = self.conv_bn_layer(input=conv, filter_size=3, num_filters=num_filters, stride=stride, name=name + '_conv2')
conv = self.conv_bn_layer(input=conv, filter_size=1, num_filters=num_filters * 4, if_act=False,
name=name + '_conv3')
if downsample:
residual = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters * 4, if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def fuse_layers(self, x, channels, multi_scale_output=True, name=None):
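# For each output branch i, fuse all branches: lower-resolution inputs (j > i) are
# channel-matched with a 1x1 conv and upsampled bilinearly, higher-resolution inputs
# (j < i) are downsampled with stride-2 3x3 convs, then everything is summed and ReLU'd.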
out = []
for i in range(len(channels) if multi_scale_output else 1):
residual = x[i]
shape = residual.shape
width = shape[-1]
height = shape[-2]
for j in range(len(channels)):
if j > i:
y = self.conv_bn_layer(x[j], filter_size=1, num_filters=channels[i], if_act=False,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1))
y = fluid.layers.resize_bilinear(input=y, out_shape=[height, width])
residual = fluid.layers.elementwise_add(x=residual, y=y, act=None)
elif j < i:
y = x[j]
for k in range(i - j):
if k == i - j - 1:
y = self.conv_bn_layer(y, filter_size=3, num_filters=channels[i], stride=2, if_act=False,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1) + '_' + str(k + 1))
else:
y = self.conv_bn_layer(y, filter_size=3, num_filters=channels[j], stride=2,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1) + '_' + str(k + 1))
residual = fluid.layers.elementwise_add(x=residual, y=y, act=None)
residual = fluid.layers.relu(residual)
out.append(residual)
return out
def branches(self, x, block_num, channels, name=None):
out = []
for i in range(len(channels)):
residual = x[i]
for j in range(block_num):
residual = self.basic_block(residual, channels[i],
name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))
out.append(residual)
return out
def high_resolution_module(self, x, channels, multi_scale_output=True, name=None):
residual = self.branches(x, 4, channels, name=name)
out = self.fuse_layers(residual, channels, multi_scale_output=multi_scale_output, name=name)
return out
def transition_layer(self, x, in_channels, out_channels, name=None):
num_in = len(in_channels)
num_out = len(out_channels)
out = []
for i in range(num_out):
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.conv_bn_layer(x[i], filter_size=3, num_filters=out_channels[i],
name=name + '_layer_' + str(i + 1))
out.append(residual)
else:
out.append(x[i])
else:
residual = self.conv_bn_layer(x[-1], filter_size=3, num_filters=out_channels[i], stride=2,
name=name + '_layer_' + str(i + 1))
out.append(residual)
return out
def stage(self, x, num_modules, channels, multi_scale_output=True, name=None):
out = x
for i in range(num_modules):
if i == num_modules - 1 and multi_scale_output == False:
out = self.high_resolution_module(out, channels, multi_scale_output=False, name=name + '_' + str(i + 1))
else:
out = self.high_resolution_module(out, channels, name=name + '_' + str(i + 1))
return out
def layer1(self, input, name=None):
conv = input
for i in range(4):
conv = self.bottleneck_block(conv, num_filters=64, downsample=True if i == 0 else False,
name=name + '_' + str(i + 1))
return conv
#def highResolutionNet(input, num_classes):
def net(self, input, num_classes=1000):
channels_2 = cfg.MODEL.HRNET.STAGE2.NUM_CHANNELS
channels_3 = cfg.MODEL.HRNET.STAGE3.NUM_CHANNELS
channels_4 = cfg.MODEL.HRNET.STAGE4.NUM_CHANNELS
num_modules_2 = cfg.MODEL.HRNET.STAGE2.NUM_MODULES
num_modules_3 = cfg.MODEL.HRNET.STAGE3.NUM_MODULES
num_modules_4 = cfg.MODEL.HRNET.STAGE4.NUM_MODULES
x = self.conv_bn_layer(input=input, filter_size=3, num_filters=64, stride=2, if_act=True, name='layer1_1')
x = self.conv_bn_layer(input=x, filter_size=3, num_filters=64, stride=2, if_act=True, name='layer1_2')
la1 = self.layer1(x, name='layer2')
tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
st2 = self.stage(tr1, num_modules_2, channels_2, name='st2')
tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
st3 = self.stage(tr2, num_modules_3, channels_3, name='st3')
tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
st4 = self.stage(tr3, num_modules_4, channels_4, name='st4')
# upsample
shape = st4[0].shape
height, width = shape[-2], shape[-1]
st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=[height, width])
st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=[height, width])
st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=[height, width])
out = fluid.layers.concat(st4, axis=1)
if self.seg_flag and self.stride==4:
return out
last_channels = sum(channels_4)
out = self.conv_bn_layer(input=out, filter_size=1, num_filters=last_channels, stride=1, if_act=True, name='conv-2')
out= fluid.layers.conv2d(
input=out,
num_filters=num_classes,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name='conv-1_weights'),
bias_attr=False)
out = fluid.layers.resize_bilinear(out, input.shape[2:])
return out
def hrnet():
model = HRNet(stride=4, seg_flag=True)
return model
if __name__ == '__main__':
image_shape = [3, 769, 769]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = hrnet()
logit = model.net(image, num_classes=19)  # hrnet() returns a model; call .net on it (num_classes here is illustrative)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
__all__ = [
'MobileNetV2', 'MobileNetV2_x0_25', 'MobileNetV2_x0_5', 'MobileNetV2_x1_0',
'MobileNetV2_x1_5', 'MobileNetV2_x2_0', 'MobileNetV2_scale'
]
class MobileNetV2():
def __init__(self, scale=1.0, change_depth=False, output_stride=None):
self.scale = scale
self.change_depth = change_depth
self.bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
] if change_depth == False else [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 5, 2),
(6, 64, 7, 2),
(6, 96, 5, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.modify_bottle_params(output_stride)
def modify_bottle_params(self, output_stride=None):
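# Force stride 1 on every stage once the cumulative stride reaches output_stride,
# so the backbone's final feature map keeps the requested spatial resolution.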
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
def net(self, input, class_dim=1000, end_points=None, decode_points=None):
scale = self.scale
change_depth = self.change_depth
# if change_depth is True, the network is roughly 1.4x as deep as before
bottleneck_params_list = self.bottleneck_params_list
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
#last_conv
input = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
input = fluid.layers.pool2d(
input=input,
pool_size=7,
pool_stride=1,
pool_type='avg',
global_pooling=True)
output = fluid.layers.fc(
input=input,
size=class_dim,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
def MobileNetV2_x0_25():
model = MobileNetV2(scale=0.25)
return model
def MobileNetV2_x0_5():
model = MobileNetV2(scale=0.5)
return model
def MobileNetV2_x1_0():
model = MobileNetV2(scale=1.0)
return model
def MobileNetV2_x1_5():
model = MobileNetV2(scale=1.5)
return model
def MobileNetV2_x2_0():
model = MobileNetV2(scale=2.0)
return model
def MobileNetV2_scale():
model = MobileNetV2(scale=1.2, change_depth=True)
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = MobileNetV2_x1_0()
    # without end_points/decode_points, net() returns a single tensor, so
    # request them explicitly to obtain (feature, decode_ends)
    logit, decode_ends = model.net(image, end_points=18, decode_points=4)
#print("logit:", logit.shape)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.utils.config import cfg
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
class ResNet():
def __init__(self, layers=50, scale=1.0):
self.layers = layers
self.scale = scale
def net(self,
input,
class_dim=1000,
end_points=None,
decode_points=None,
resize_points=None,
dilation_dict=None):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
def get_dilated_rate(dilation_dict, idx):
if dilation_dict is None or idx not in dilation_dict:
return 1
else:
return dilation_dict[idx]
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
# stage_1: 3 3x3_Conv
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=3,
stride=2,
act='relu',
name="conv1_1")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(64 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_2")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(128 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_3")
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
layer_count = 1
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]): #depth = [3, 4, 23, 3]
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = get_dilated_rate(dilation_dict, block)
                    # added by Rosun: employ multi-grid dilation rates in the last stage
                    if cfg.MODEL.BACKBONE_MULTI_GRID == True and block == 3:
                        if i == 0:
                            dilation_rate = dilation_rate * (i + 1)
                        else:
                            dilation_rate = dilation_rate * (2 * i)  # x2, x4
                        print("employ multi-grid for the resnet backbone: dilation_rate={}\n".format(dilation_rate))
conv = self.bottleneck_block(
input=conv,
num_filters=int(num_filters[block] * self.scale),
stride=2
if i == 0 and block != 0 and dilation_rate == 1 else 1,
name=conv_name,
dilation=dilation_rate)
layer_count += 3
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if check_points(layer_count, resize_points):
conv = self.interp(
conv,
np.ceil(
np.array(conv.shape[2:]).astype('int32') / 2))
pool = fluid.layers.pool2d(input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name)
layer_count += 2
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
def zero_padding(self, input, padding):
return fluid.layers.pad(
input, [0, 0, 0, 0, padding, padding, padding, padding])
def interp(self, input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
dilation=1,
groups=1,
act=None,
name=None):
bias_attr=False
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=bias_attr,
name=name + '.conv2d.output.1')
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance', )
def shortcut(self, input, ch_out, stride, is_first, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1 or is_first == True:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, dilation=1):
if self.layers == 101:
strides = [1, stride]
else:
strides = [stride, 1]
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
dilation=1,
stride=strides[0],
act='relu',
name=name + "_branch2a")
if dilation > 1:
conv0 = self.zero_padding(conv0, dilation)
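            # padding the feature map by `dilation` pixels and running the 3x3
            # dilated conv below with padding=0 is equivalent to 'same' padding,
            # since a 3x3 kernel with dilation d has an effective extent of 2d + 1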
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
dilation=dilation,
stride=strides[1],
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
dilation=1,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self.shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def ResNet18():
model = ResNet(layers=18)
return model
def ResNet34():
model = ResNet(layers=34)
return model
def ResNet50():
model = ResNet(layers=50)
return model
def ResNet101():
model = ResNet(layers=101)
return model
def ResNet152():
model = ResNet(layers=152)
return model
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import math
import paddle.fluid as fluid
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import bn, bn_relu, relu
from src.models.libs.model_libs import conv
from src.models.libs.model_libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def check_stride(s, os):
if s <= os:
return True
else:
return False
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
class Xception():
def __init__(self, backbone="xception_65"):
self.bottleneck_params = self.gen_bottleneck_params(backbone)
self.backbone = backbone
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def net(self,
input,
output_stride=32,
num_classes=1000,
end_points=None,
decode_points=None):
self.stride = 2
self.block_point = 0
self.output_stride = output_stride
self.decode_points = decode_points
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data, 32, 3, stride=2, padding=1,
param_attr=param_attr))
with scope("conv2"):
data = bn_relu(
conv(
data, 64, 3, stride=1, padding=1,
param_attr=param_attr))
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
                    data, short_cuts = self.xception_block(
                        data, chns[i], [1, 1, stride], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
assert (block_num == 2)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr))
else:
skip = input
return data + skip, results
def xception_65():
model = Xception("xception_65")
return model
def xception_41():
model = Xception("xception_41")
return model
def xception_71():
model = Xception("xception_71")
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = xception_65()
logit = model.net(image)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle
import paddle.fluid as fluid
from src.utils.config import cfg
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
# print "group can not divide channle:", C, G
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
# print "use group size:", G
break
assert C % G == 0
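    # e.g. C=720 with G=32: 32 does not divide 720, so the search above tries
    # 33/31, then 34/30, and settles on G=30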
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
def bn(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif cfg.MODEL.DEFAULT_NORM_TYPE == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
cfg.MODEL.DEFAULT_GROUP_NUMBER,
eps=cfg.MODEL.DEFAULT_EPSILON,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + cfg.MODEL.DEFAULT_NORM_TYPE)
def bn_zero(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNormZeroInit'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer,
initializer=fluid.initializer.ConstantInitializer(value=0.0)),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer,
initializer=fluid.initializer.ConstantInitializer(value=0.0)),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
def bn_relu(data):
return fluid.layers.relu(bn(data))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input, channel, stride, filter, dilation=1, act=None):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=False,
param_attr=param_attr)
input = bn(input)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input)
if act: input = act(input)
return input
def FCNHead(input, mid_feat_channel, num_classes, output_shape):
# Arch: Conv_3x3 + BN + ReLU + Dropout + Conv_1x1
# Conv_3x3 + BN + ReLU
aux_seg_name= "Aux_layer1"
with scope(aux_seg_name):
conv_feat= conv(input, mid_feat_channel, filter_size=3, padding=1, bias_attr=False, name=aux_seg_name + '_conv')
bn_feat = bn(conv_feat, act='relu')
# Dropout
dropout_out = fluid.layers.dropout(bn_feat, dropout_prob=0.1, name="Aux_dropout")
# Conv_1x1 + bilinear_upsample
aux_seg_name= "Aux_layer2"
with scope(aux_seg_name):
aux_logit = conv(dropout_out, num_classes, filter_size=1, bias_attr=True, name= aux_seg_name + '_logit_conv')
aux_logit_interp = fluid.layers.resize_bilinear(aux_logit, out_shape=output_shape, name= aux_seg_name + '_logit_interp')
return aux_logit_interp
def conv1d(x, output_channels, name_scope, bias_attr=False):
    '''
    x: a 3-D tensor of shape (B, C, N)
    reshape to 4-D --> conv2d --> reshape back to 3-D
    '''
B, C, N = x.shape
with scope(name_scope):
x = fluid.layers.reshape(x, shape=[B, C, N, 1])
if bias_attr:
x = conv(x, output_channels, filter_size=1, name=name_scope, bias_attr=bias_attr)
else:
x = conv(x, output_channels, filter_size=1, name=name_scope)
        # the channel dim after the 1x1 conv is output_channels, not the input C
        x = fluid.layers.reshape(x, shape=[B, output_channels, N])
return x
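# Example (illustrative): conv1d(x, 64, 'proj') maps a (B, C, N) tensor to
# (B, 64, N) by treating the point axis as a 1-pixel-wide spatial dim for conv2d.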
import sys
import struct
import importlib
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
from src.utils import solver
from src.utils.config import cfg
from src.utils.loss import multi_softmax_with_loss, multi_dice_loss, multi_bce_loss
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def map_model_name(model_name):
name_dict = {
"deeplabv3": "deeplabv3.deeplabv3",
"pspnet": "pspnet.pspnet",
"glore": "glore.glore",
}
if model_name in name_dict.keys():
return name_dict[model_name]
else:
        raise Exception(
            "unknown model name, only support deeplabv3, pspnet, glore")
def get_func(func_name):
"""Helper to return a function object by name. func_name must identify a
function in this module or the path to a function relative to the base
'modeling' module.
"""
print("func_name: ", func_name)
if func_name == '':
return None
try:
parts = func_name.split('.')
# Refers to a function in this module
if len(parts) == 1:
return globals()[parts[0]]
# Otherwise, assume we're referencing a module under modeling
module_name = 'src.models.' + '.'.join(parts[:-1])
print("module_name: ", module_name)
# method 1
#from src.models.modeling import pspnet
# method 2
module = importlib.import_module(module_name)
return getattr(module, parts[-1])
    except Exception:
        print('Failed to find function: {}'.format(func_name))
        raise
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.DATAAUG.CROP_SIZE
height = cfg.DATAAUG.CROP_SIZE
else:
width = cfg.TEST.CROP_SIZE
height = cfg.TEST.CROP_SIZE
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
            # when exporting the model, add image-normalization preprocessing so that
            # the deployed predictor only needs to add a batch_size dim to the input
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(name='image',
shape=[ -1, 1, 1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image = fluid.layers.transpose(origin_image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image/255 - mean)/std
image = fluid.layers.resize_bilinear(image,
out_shape=[height, width], align_corners=False, align_mode=0)
else:
image = fluid.layers.data( name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data( name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data( name='mask', shape=grt_shape, dtype='int32')
            # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
iterable = True if ModelPhase.is_eval(phase) else False
print("iterable: ", iterable)
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=iterable,
use_double_buffer=True,
return_list=False)
model_name = map_model_name(cfg.MODEL.MODEL_NAME)
model_func = get_func("modeling." + model_name)
loss_type = cfg.SOLVER.LOSS
            if not isinstance(loss_type, list):
                loss_type = [loss_type]
            # dice_loss and bce_loss only apply to binary segmentation
            if class_num > 2 and (("dice_loss" in loss_type) or ("bce_loss" in loss_type)):
                raise Exception("dice loss and bce loss are only applicable to binary classification")
            # for binary segmentation, when dice_loss or bce_loss is selected,
            # the final logit output has a single channel
            if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
                class_num = 1
                if "softmax_loss" in loss_type:
                    raise Exception("softmax loss can not be combined with dice loss or bce loss")
logits = model_func(image, class_num)
            # compute the losses corresponding to the selected loss functions
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
avg_loss_list.append(multi_softmax_with_loss(logits,
label, mask,class_num))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception("SOLVER.LOSS: {} is set wrong. it should "
"include one of (softmax_loss, bce_loss, dice_loss) at least"
" example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']".format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print("Warning: the loss {} you set is invalid. it will not be included in loss computed.".format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
#get pred result in original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
                # in binary segmentation, dice_loss/bce_loss yields a one-channel logit; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
logit = fluid.layers.resize_bilinear(logit, out_shape=origin_shape, align_corners=False, align_mode=0)
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.argmax(logit, axis=3)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
out = fluid.layers.transpose(out, [0, 3, 1, 2]) #unnormalized probability
#return py_reader, avg_loss, pred, label, mask
return py_reader, avg_loss, out, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
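# A minimal usage sketch (assumed driver code; the real train/eval scripts
# also configure executors and feed the reader):
#   train_prog, startup_prog = fluid.Program(), fluid.Program()
#   py_reader, avg_loss, lr, pred, label, mask = build_model(
#       train_prog, startup_prog, phase=ModelPhase.TRAIN)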
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
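# On-disk layout consumed by parse_shape_from_file (Paddle's saved-variable
# format, as read below): uint32 version, uint64 lod_level, lod_level x
# (uint64 size + raw bytes), uint32 version again, int32 length of the
# serialized VarType.TensorDesc, whose `dims` field carries the shape.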
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from src.utils.config import cfg
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import bn, bn_relu, relu, FCNHead
from src.models.libs.model_libs import conv
from src.models.libs.model_libs import separate_conv
from src.models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from src.models.backbone.xception import Xception as xception_backbone
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.models.backbone.hrnet import HRNet as hrnet_backbone
def ASPPHead(input, mid_channel, num_classes, output_shape):
    # Arch of the Atrous Spatial Pyramid Pooling Module:
    #
    #      |----> ImagePool + Conv_1x1 + BN + ReLU + bilinear_interp -->|————————|
    #      |                                                            |        |
    #      |----> Conv_1x1 + BN + ReLU -------------------------------->|        |
    #      |                                                            |        |
    # x -->|----> AtrousConv_3x3 + BN + ReLU -------------------------->| concat |----> Conv_1x1 + BN + ReLU --> Dropout --> Conv_1x1
    #      |                                                            |        |
    #      |----> AtrousConv_3x3 + BN + ReLU -------------------------->|        |
    #      |                                                            |        |
    #      |----> AtrousConv_3x3 + BN + ReLU -------------------------->|________|
    #
    #
if cfg.MODEL.BACKBONE_OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.BACKBONE_OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only support stride 8 or 16")
param_attr = fluid.ParamAttr(name=name_scope + 'weights', regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('ASPPHead'):
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean( input, [2, 3], keep_dim=True)
image_avg = bn_relu( conv( image_avg, mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
with scope("aspp0"):
aspp0 = bn_relu( conv( input, mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[0],
padding=aspp_ratios[0], param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[1],
padding=aspp_ratios[1], param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[2],
padding=aspp_ratios[2], param_attr=param_attr))
with scope("concat"):
feat = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
feat = bn_relu( conv( feat, 2*mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
feat = fluid.layers.dropout(feat, 0.1)
# Conv_1x1 + bilinear_upsample
seg_name = "logit"
with scope(seg_name):
param_attr = fluid.ParamAttr( name= seg_name+'_weights',
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
logit = conv(feat, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=seg_name+'_conv')
logit_interp = fluid.layers.resize_bilinear(logit, out_shape=output_shape, name=seg_name+'_interp')
return logit_interp
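# Note (illustrative): the atrous rates above follow the DeepLabv3 convention
# that halving the output stride doubles the rates, so (6, 12, 18) at stride
# 16 keeps the same effective receptive field as (12, 24, 36) at stride 8.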
def mobilenetv2(input):
    # Backbone: MobileNetV2 configuration
    # DEPTH_MULTIPLIER: scale of MobileNetV2, default 1.0
    # OUTPUT_STRIDE: downsampling factor
    # end_points: number of MobileNetV2 blocks
    # decode_point: block whose output is branched off as the decoder input
scale = cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER
output_stride = cfg.MODEL.DEEPLABv3.OUTPUT_STRIDE
model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 18
decode_point = 4
data, decode_shortcuts = model.net(
input, end_points=end_points, decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def xception(input):
    # Backbone: Xception configuration; xception_65, xception_41 and xception_71 are available
    # decode_point: block whose output is branched off as the decoder input
    # end_points: number of Xception blocks
cfg.MODEL.DEFAULT_EPSILON = 1e-3
model = xception_backbone(cfg.MODEL.BACKBONE)
backbone = cfg.MODEL.BACKBONE
output_stride = cfg.MODEL.DEEPLABv3.OUTPUT_STRIDE
if '65' in backbone:
decode_point = 2
end_points = 21
if '41' in backbone:
decode_point = 2
end_points = 13
if '71' in backbone:
decode_point = 3
end_points = 23
data, decode_shortcuts = model.net(
input,
output_stride=output_stride,
end_points=end_points,
decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def resnet(input):
# dilation_dict:
# key: stage num
# value: dilation factor
scale = cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
decode_points = [91,100 ] # [10, 22, 91, 100], for obtaining feature maps of res2,res3, res4, and res5
dilation_dict = {2:2, 3:4}
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points=decode_points)
return res5, feat_dict
def hrnet(input):
model = hrnet_backbone(stride=4, seg_flag=True)
feats = model.net(input)
return feats
def deeplabv3(input, num_classes):
"""
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation", in arXiv:1706:05587
"""
    if 'xception' in cfg.MODEL.BACKBONE:
        res5, decode_shortcut = xception(input)
    elif 'mobilenet' in cfg.MODEL.BACKBONE:
        res5, decode_shortcut = mobilenetv2(input)
    elif 'resnet' in cfg.MODEL.BACKBONE:
        res5, feat_dict = resnet(input)
        res4 = feat_dict[91]
    elif 'hrnet' in cfg.MODEL.BACKBONE:
        res5 = hrnet(input)
    else:
        raise Exception("deeplabv3 only supports xception, mobilenet, resnet, and hrnet backbones")
    logit = ASPPHead(res5, mid_channel=256, num_classes=num_classes, output_shape=input.shape[2:])
    if cfg.MODEL.DEEPLABv3.AuxHead:
        # the auxiliary FCN head taps res4, which only the resnet backbone exposes
        aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
        return logit, aux_logit
    return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, bn_zero, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
# 1x1_Conv
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1', bias_attr=True)
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2', bias_attr= False)
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
# generate B
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) # Projection Matrix: num_batch, node, L=H*W
# reduce dimension
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
#L = fluid.layers.fill_constant(shape=[1], value=H*W, dtype='float32')
#V = fluid.layers.elementwise_div(V, L)
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
B = fluid.layers.reshape(B, shape= [num_batch, num_node, H*W])
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=False, name='extend_dim'+'_conv')
#Y = bn_zero(Y)
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
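# A minimal NumPy sketch (illustrative only, not used by the model) of the
# tensor algebra in gru_module above, making the B / V / Y shapes concrete;
# the function name and default sizes are our own for the demonstration.
def _gru_module_shape_demo(num_batch=2, C=512, H=4, W=4, num_state=128, num_node=64):
    import numpy as np
    L = H * W
    B = np.random.rand(num_batch, num_node, L)            # projection matrix, B x N x L
    x_reduce = np.random.rand(num_batch, L, num_state)    # reduced features, B x L x C1
    V = np.matmul(B, x_reduce).transpose(0, 2, 1)         # vertex features, B x C1 x N
    new_V = V                                             # stand-in for gcn_module
    D = B.transpose(0, 2, 1)                              # reverse projection, B x L x N
    Y = np.matmul(D, new_V.transpose(0, 2, 1))            # B x L x C1
    Y = Y.transpose(0, 2, 1).reshape(num_batch, num_state, H, W)
    assert Y.shape == (num_batch, num_state, H, W)
    return Y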
def resnet(input):
# end_points: end_layer of resnet backbone
# dilation_dict: dilation factor for stages_key
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# 3x3 Conv. 2048 -> 512
reduce_kernel=3
if cfg.DATASET.DATASET_NAME=='cityscapes':
reduce_kernel=1
with scope('feature'):
feature = conv(res5, 512, filter_size=reduce_kernel, bias_attr=False, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, num_state= 128, num_node = 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
    # the number of classes determines the last conv output; interpolate back to the original size
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1')
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2')
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) #num_batch, node, L=H*W
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
    with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=True, name='extend_dim'+'_conv')
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
def resnet(input):
    # GloRe backbone: resnet, default resnet50
    # end_points: last resnet layer to build
    # dilation_dict: resnet stage indices and their dilation factors
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# Conv_1x1 for reduce dimension
with scope('feature'):
feature = conv(res5, 512, filter_size=1, bias_attr=True, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, 128, 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
    # the number of classes determines the last conv output; interpolate back to the original size
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, bn_zero, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
# 1x1_Conv
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1')
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2')
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) #num_batch, node, L=H*W
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
L = fluid.layers.fill_constant(shape=[1], value=H*W, dtype='float32')
V = fluid.layers.elementwise_div(V, L)
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=True, name='extend_dim'+'_conv')
#Y = bn_zero(Y)
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
def resnet(input):
# end_points: end_layer of resnet backbone
# dilation_dict: dilation factor for stages_key
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# Conv_1x1 for reduce dimension
with scope('feature'):
feature = conv(res5, 512, filter_size=3, bias_attr=True, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, 128, 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.models.backbone.hrnet import HRNet as hrnet_backbone
from src.utils.config import cfg
def PSPHead(input, out_features, num_classes, output_shape):
    # Arch of the Pyramid Scene Parsing Module:
    #
    #           |----> Pool_1x1 + Conv_1x1 + BN + ReLU + bilinear_interp -->|————————|
    #           |                                                           |        |
    #           |----> Pool_2x2 + Conv_1x1 + BN + ReLU + bilinear_interp -->|        |
    #  x ------>|                                                           | concat |----> Conv_3x3 + BN + ReLU --> Dropout --> Conv_1x1
    #  |        |----> Pool_3x3 + Conv_1x1 + BN + ReLU + bilinear_interp -->|        |
    #  |        |                                                           |        |
    #  |        |----> Pool_6x6 + Conv_1x1 + BN + ReLU + bilinear_interp -->|________|
    #  |                                                                        ^
    #  |————————————————————————————————————————————————————————————————————————|
    #
cat_layers = []
sizes = (1,2,3,6)
# 4 parallel pooling branches
for size in sizes:
psp_name = "psp" + str(size)
with scope(psp_name):
pool_feat = fluid.layers.adaptive_pool2d(input, pool_size=[size, size], pool_type='avg',
name=psp_name+'_adapool')
conv_feat = conv(pool_feat, out_features, filter_size=1, bias_attr=True,
name= psp_name + '_conv')
bn_feat = bn(conv_feat, act='relu')
interp = fluid.layers.resize_bilinear(bn_feat, out_shape=input.shape[2:], name=psp_name+'_interp')
cat_layers.append(interp)
cat_layers = [input] + cat_layers[::-1]
cat = fluid.layers.concat(cat_layers, axis=1, name='psp_cat')
# Conv_3x3 + BN + ReLU
psp_end_name = "psp_end"
with scope(psp_end_name):
data = conv(cat, out_features, filter_size=3, padding=1, bias_attr=True, name=psp_end_name)
out = bn(data, act='relu')
# Dropout
dropout_out = fluid.layers.dropout(out, dropout_prob=0.1, name="dropout")
# Conv_1x1 + bilinear_upsample
seg_name = "logit"
with scope(seg_name):
param_attr = fluid.ParamAttr( name= seg_name+'_weights',
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
logit = conv(dropout_out, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=seg_name+'_conv')
logit_interp = fluid.layers.resize_bilinear(logit, out_shape=output_shape, name=seg_name+'_interp')
return logit_interp
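# Example (illustrative): for a 769x769 crop at output stride 8, res5 is
# roughly 97x97, so the four branches pool it to 1x1, 2x2, 3x3 and 6x6 grids,
# project each to out_features channels and upsample back before the concat
# with the input feature map itself.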
def resnet(input):
# dilation_dict:
# key: stage num
# value: dilation factor
scale = cfg.MODEL.PSPNET.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
decode_points = [91, 100] # [10, 22, 91, 100], for obtaining feature maps of res2,res3, res4, and res5
dilation_dict = {2:2, 3:4}
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points=decode_points)
return res5, feat_dict
def hrnet(input):
model = hrnet_backbone(stride=4, seg_flag=True)
feats = model.net(input)
return feats
def pspnet(input, num_classes):
"""
Reference:
Zhao, Hengshuang, et al. "Pyramid scene parsing network.", In CVPR 2017
"""
if 'resnet' in cfg.MODEL.BACKBONE:
res5, feat_dict = resnet(input)
res4 = feat_dict[91]
elif 'hrnet' in cfg.MODEL.BACKBONE:
res5 = hrnet(input)
else:
raise Exception("pspnet only support resnet and hrnet backbone")
logit = PSPHead(res5, 512, num_classes, input.shape[2:])
if cfg.MODEL.PSPNET.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import codecs
from ast import literal_eval
import yaml
import six
class SegConfig(dict):
def __init__(self, *args, **kwargs):
super(SegConfig, self).__init__(*args, **kwargs)
self.immutable = False
def __setattr__(self, key, value, create_if_not_exist=True):
if key in ["immutable"]:
self.__dict__[key] = value
return
t = self
keylist = key.split(".")
for k in keylist[:-1]:
t = t.__getattr__(k, create_if_not_exist)
t.__getattr__(keylist[-1], create_if_not_exist)
t[keylist[-1]] = value
def __getattr__(self, key, create_if_not_exist=True):
if key in ["immutable"]:
return self.__dict__[key]
if not key in self:
if not create_if_not_exist:
raise KeyError
self[key] = SegConfig()
return self[key]
def __setitem__(self, key, value):
#
if self.immutable:
raise AttributeError(
'Attempted to set "{}" to "{}", but SegConfig is immutable'.
format(key, value))
#
if isinstance(value, six.string_types):
try:
value = literal_eval(value)
except ValueError:
pass
except SyntaxError:
pass
super(SegConfig, self).__setitem__(key, value)
def update_from_segconfig(self, other):
if isinstance(other, dict):
other = SegConfig(other)
assert isinstance(other, SegConfig)
diclist = [("", other)]
while len(diclist):
prefix, tdic = diclist[0]
diclist = diclist[1:]
for key, value in tdic.items():
key = "{}.{}".format(prefix, key) if prefix else key
if isinstance(value, dict):
diclist.append((key, value))
continue
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def check_and_infer(self):
if self.DATASET.IMAGE_TYPE in ['rgb', 'gray']:
self.DATASET.DATA_DIM = 3
elif self.DATASET.IMAGE_TYPE in ['rgba']:
self.DATASET.DATA_DIM = 4
else:
raise KeyError(
'DATASET.IMAGE_TYPE config error, only support `rgb`, `gray` and `rgba`'
)
if self.MEAN is not None:
self.DATASET.PADDING_VALUE = [x*255.0 for x in self.MEAN]
"""
if not self.TRAIN_CROP_SIZE:
raise ValueError(
'TRAIN_CROP_SIZE is empty! Please set a pair of values in format (width, height)'
)
if not self.EVAL_CROP_SIZE:
raise ValueError(
'EVAL_CROP_SIZE is empty! Please set a pair of values in format (width, height)'
)
"""
        # Ensure the file lists use UTF-8 encoding
train_sets = codecs.open(self.DATASET.TRAIN_FILE_LIST, 'r', 'utf-8').readlines()
val_sets = codecs.open(self.DATASET.VAL_FILE_LIST, 'r', 'utf-8').readlines()
test_sets = codecs.open(self.DATASET.TEST_FILE_LIST, 'r', 'utf-8').readlines()
self.DATASET.TRAIN_TOTAL_IMAGES = len(train_sets)
self.DATASET.VAL_TOTAL_IMAGES = len(val_sets)
self.DATASET.TEST_TOTAL_IMAGES = len(test_sets)
if self.MODEL.MODEL_NAME == 'icnet' and \
len(self.MODEL.MULTI_LOSS_WEIGHT) != 3:
self.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4, 0.16]
def update_from_list(self, config_list):
if len(config_list) % 2 != 0:
raise ValueError(
"Command line options config format error! Please check it: {}".
format(config_list))
for key, value in zip(config_list[0::2], config_list[1::2]):
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def update_from_file(self, config_file):
with codecs.open(config_file, 'r', 'utf-8') as file:
dic = yaml.load(file, Loader=yaml.FullLoader)
self.update_from_segconfig(dic)
def set_immutable(self, immutable):
self.immutable = immutable
for value in self.values():
if isinstance(value, SegConfig):
value.set_immutable(immutable)
def is_immutable(self):
return self.immutable
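# A minimal sketch (illustrative only; `_segconfig_dotted_key_demo` is not
# part of the shipped code) of how SegConfig resolves dotted keys, which is
# what update_from_list relies on for command-line overrides:
def _segconfig_dotted_key_demo():
    demo_cfg = SegConfig()
    demo_cfg.__setattr__('MODEL.BACKBONE_LAYERS', 101)  # creates nested SegConfig
    assert demo_cfg.MODEL.BACKBONE_LAYERS == 101
    return demo_cfg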
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import unicode_literals
from .collect import SegConfig
import numpy as np
cfg = SegConfig()
########################## Basic configuration ###############################
# Mean subtracted from images during preprocessing
#cfg.MEAN = [0.5, 0.5, 0.5]
cfg.MEAN = [0.485, 0.456, 0.406]
# Standard deviation that images are divided by during preprocessing
cfg.STD = [0.229, 0.224, 0.225]
# Batch sizes
cfg.TRAIN_BATCH_SIZE_PER_GPU = 2
cfg.TRAIN_BATCH_SIZE = 8
cfg.EVAL_BATCH_SIZE = 8
# Total number of processes (trainers) in multi-process training
cfg.NUM_TRAINERS = 1
# Process (trainer) ID in multi-process training
cfg.TRAINER_ID = 0
########################## Data loading configuration #########################
# Number of concurrent data-loading workers; 8 is recommended
cfg.DATALOADER.NUM_WORKERS = 8
# Buffer queue size for data loading; 256 is recommended
cfg.DATALOADER.BUF_SIZE = 256
########################## Dataset configuration ##############################
cfg.DATASET.DATASET_NAME = 'cityscapes'
# Root directory of the dataset
cfg.DATASET.DATA_DIR = './data_local/cityscapes/'
# Training set file list
cfg.DATASET.TRAIN_FILE_LIST = './data_local/cityscapes/train.list'
# Number of training images
cfg.DATASET.TRAIN_TOTAL_IMAGES = 5
# Validation set file list
cfg.DATASET.VAL_FILE_LIST = './data_local/cityscapes/val.list'
# Number of validation images
cfg.DATASET.VAL_TOTAL_IMAGES = 50
# Test set file list
cfg.DATASET.TEST_FILE_LIST = './data_local/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 1525
# File list used for Tensorboard visualization
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (background included)
cfg.DATASET.NUM_CLASSES = 19
# Input image type: three-channel 'rgb', four-channel 'rgba', or single-channel grayscale 'gray'
cfg.DATASET.IMAGE_TYPE = 'rgb'
# Number of input image channels
cfg.DATASET.DATA_DIM = 3
# Separator used in the data file lists; space by default
cfg.DATASET.SEPARATOR = '\t'
# Pixel label value to ignore; 255 by default, rarely needs changing
cfg.DATASET.IGNORE_INDEX = 255
# Padding value applied to images during data augmentation
cfg.DATASET.PADDING_VALUE = [127.5, 127.5, 127.5]
########################### Data augmentation configuration ###################
cfg.DATAAUG.EXTRA = True
cfg.DATAAUG.BASE_SIZE = 1024
cfg.DATAAUG.CROP_SIZE = 769
cfg.DATAAUG.RAND_SCALE_MIN = 0.75
cfg.DATAAUG.RAND_SCALE_MAX = 2.0
########################### Training configuration ############################
# Directory where models are saved
cfg.TRAIN.MODEL_SAVE_DIR = ''
# Path to the pretrained model
cfg.TRAIN.PRETRAINED_MODEL_DIR = ''
# Checkpoint directory for resuming interrupted training
cfg.TRAIN.RESUME_MODEL_DIR = ''
# Whether to synchronize BatchNorm mean and variance across GPUs
cfg.TRAIN.SYNC_BATCH_NORM = True
# Epoch interval between parameter snapshots; snapshots can be used to resume interrupted training
cfg.TRAIN.SNAPSHOT_EPOCH = 10
########################### Optimization configuration ########################
# Initial learning rate
cfg.SOLVER.LR = 0.001
# Learning rate decay policy: poly, piecewise, or cosine
cfg.SOLVER.LR_POLICY = "poly"
# Optimization algorithm: sgd or adam
cfg.SOLVER.OPTIMIZER = "sgd"
# Momentum
cfg.SOLVER.MOMENTUM = 0.9
# Exponential decay rate of the second-moment estimate (Adam beta2)
cfg.SOLVER.MOMENTUM2 = 0.999
# Exponent of the poly learning rate decay
cfg.SOLVER.POWER = 0.9
# Decay factor for step (piecewise) decay
cfg.SOLVER.GAMMA = 0.1
# Epochs at which step decay is applied
cfg.SOLVER.DECAY_EPOCH = [10, 20]
# Weight decay, in [0, 1]
#cfg.SOLVER.WEIGHT_DECAY = 0.0001
cfg.SOLVER.WEIGHT_DECAY = 0.00004
# Epoch to start training from; 1 by default
cfg.SOLVER.BEGIN_EPOCH = 1
# Number of training epochs (a positive integer)
cfg.SOLVER.NUM_EPOCHS = 30
# Loss selection; softmax_loss, bce_loss and dice_loss are supported
cfg.SOLVER.LOSS = ["softmax_loss"]
# Whether to enable learning rate warmup
cfg.SOLVER.LR_WARMUP = False
# Number of warmup iterations
cfg.SOLVER.LR_WARMUP_STEPS = 2000
########################## Test configuration #################################
# Path to the model used for testing
cfg.TEST.TEST_MODEL = ''
cfg.TEST.BASE_SIZE = 2048
cfg.TEST.CROP_SIZE = 769
cfg.TEST.SLIDE_WINDOW = True
########################## Common model configuration #########################
# Model name: pspnet, deeplabv3, glore, or ginet
cfg.MODEL.MODEL_NAME = ''
# Normalization type: bn or gn (group_norm)
cfg.MODEL.DEFAULT_NORM_TYPE = 'bn'
# Weights for the multi-branch losses
cfg.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4]
# Number of groups when DEFAULT_NORM_TYPE is gn
cfg.MODEL.DEFAULT_GROUP_NUMBER = 32
# Small epsilon guarding against division by zero; rarely needs changing
cfg.MODEL.DEFAULT_EPSILON = 1e-5
# BatchNorm momentum; rarely needs changing
cfg.MODEL.BN_MOMENTUM = 0.99
# Whether to train with FP16
cfg.MODEL.FP16 = False
# Mixed-precision training scales the loss; dynamic scaling by default,
# or a static scale such as 512.0
cfg.MODEL.SCALE_LOSS = "DYNAMIC"
# Backbone network: resnet, hrnet, xception_65, or mobilenetv2
cfg.MODEL.BACKBONE = "resnet"
# Backbone depth: 101 or 50 for resnet
cfg.MODEL.BACKBONE_LAYERS = 101
# Output stride = input size / feature map size
cfg.MODEL.BACKBONE_OUTPUT_STRIDE = 8
cfg.MODEL.BACKBONE_MULTI_GRID = False
########################## PSPNet configuration ###############################
# ResNet backbone scale setting
cfg.MODEL.PSPNET.DEPTH_MULTIPLIER = 1
# Auxiliary loss head
cfg.MODEL.PSPNET.AuxHead = True
########################## GloRe configuration ################################
# ResNet backbone scale setting
cfg.MODEL.GLORE.DEPTH_MULTIPLIER = 1
# Auxiliary loss head
cfg.MODEL.GLORE.AuxHead = True
########################## DeepLabv3 configuration ############################
# MobileNet v2 backbone scale setting
cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER = 1.0
# Whether ASPP uses separable convolutions
cfg.MODEL.DEEPLABv3.ASPP_WITH_SEP_CONV = True
cfg.MODEL.DEEPLABv3.AuxHead = True
########################## HRNet configuration ################################
# HRNet stage 2 settings
cfg.MODEL.HRNET.STAGE2.NUM_MODULES = 1
cfg.MODEL.HRNET.STAGE2.NUM_CHANNELS = [40, 80]
# HRNet stage 3 settings
cfg.MODEL.HRNET.STAGE3.NUM_MODULES = 4
cfg.MODEL.HRNET.STAGE3.NUM_CHANNELS = [40, 80, 160]
# HRNet stage 4 settings
cfg.MODEL.HRNET.STAGE4.NUM_MODULES = 3
cfg.MODEL.HRNET.STAGE4.NUM_CHANNELS = [40, 80, 160, 320]
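# Usage sketch (illustrative, not part of the original file): overriding the
# defaults above, e.g. from command-line "KEY VALUE" pairs.
if __name__ == '__main__':
    cfg.update_from_list(['SOLVER.LR', '0.005', 'TRAIN_BATCH_SIZE', '16'])
    print(cfg.SOLVER.LR, cfg.TRAIN_BATCH_SIZE)  # 0.005 16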
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import paddle.fluid as fluid
def nccl2_prepare(args, startup_prog, main_prog):
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
envs = args.dist_env
t.transpile(
envs["trainer_id"],
trainers=','.join(envs["trainer_endpoints"]),
current_endpoint=envs["current_endpoint"],
startup_program=startup_prog,
program=main_prog)
def pserver_prepare(args, train_prog, startup_prog):
config = fluid.DistributeTranspilerConfig()
config.slice_var_up = args.split_var
t = fluid.DistributeTranspiler(config=config)
envs = args.dist_env
training_role = envs["training_role"]
t.transpile(
envs["trainer_id"],
program=train_prog,
pservers=envs["pserver_endpoints"],
trainers=envs["num_trainers"],
sync_mode=not args.async_mode,
startup_program=startup_prog)
if training_role == "PSERVER":
pserver_program = t.get_pserver_program(envs["current_endpoint"])
pserver_startup_program = t.get_startup_program(
envs["current_endpoint"],
pserver_program,
startup_program=startup_prog)
return pserver_program, pserver_startup_program
elif training_role == "TRAINER":
train_program = t.get_trainer_program()
return train_program, startup_prog
else:
raise ValueError(
'PADDLE_TRAINING_ROLE environment variable must be either TRAINER or PSERVER'
)
def nccl2_prepare_paddle(trainer_id, startup_prog, main_prog):
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(
trainer_id,
trainers=os.environ.get('PADDLE_TRAINER_ENDPOINTS'),
current_endpoint=os.environ.get('PADDLE_CURRENT_ENDPOINT'),
startup_program=startup_prog,
program=main_prog)
def prepare_for_multi_process(exe, build_strategy, train_prog):
# prepare for multi-process
trainer_id = int(os.environ.get('PADDLE_TRAINER_ID', 0))
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
if num_trainers < 2: return
build_strategy.num_trainers = num_trainers
build_strategy.trainer_id = trainer_id
# NOTE(zcd): use multi processes to train the model,
# and each process use one GPU card.
startup_prog = fluid.Program()
nccl2_prepare_paddle(trainer_id, startup_prog, train_prog)
    # the startup program may run twice here, but that is harmless
exe.run(startup_prog)
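# Usage sketch (illustrative, not part of the original file): wiring the
# helper above into an executor before compiling the train program. The
# FLAGS_selected_gpus variable is set by paddle.distributed.launch.
def _example_multi_process_setup(train_prog):
    place = fluid.CUDAPlace(int(os.environ.get('FLAGS_selected_gpus', 0)))
    exe = fluid.Executor(place)
    build_strategy = fluid.BuildStrategy()
    # no-op on a single process; transpiles the program for NCCL2 otherwise
    prepare_for_multi_process(exe, build_strategy, train_prog)
    return exe, build_strategy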
import os
from paddle import fluid
def load_fp16_vars(executor, dirname, program):
load_dirname = os.path.normpath(dirname)
def _if_exist(var):
name = var.name[:-7] if var.name.endswith('.master') else var.name
b = os.path.exists(os.path.join(load_dirname, name))
if not b and isinstance(var, fluid.framework.Parameter):
print("===== {} not found ====".format(var.name))
return b
load_prog = fluid.Program()
load_block = load_prog.global_block()
vars = list(filter(_if_exist, program.list_vars()))
for var in vars:
new_var = fluid.io._clone_var_in_block_(load_block, var)
name = var.name[:-7] if var.name.endswith('.master') else var.name
file_path = os.path.join(load_dirname, name)
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [new_var]},
attrs={
'file_path': file_path,
'load_as_fp16': var.dtype == fluid.core.VarDesc.VarType.FP16
})
executor.run(load_prog)
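# Usage sketch (illustrative, not part of the original file); the checkpoint
# directory is a placeholder path.
def _example_restore_fp16(program, checkpoint_dir='./snapshots/epoch_80'):
    exe = fluid.Executor(fluid.CUDAPlace(0))
    load_fp16_vars(exe, checkpoint_dir, program)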
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from src.utils.config import cfg
def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
    # clamp labels into the valid range [0, num_classes - 1]
    label = fluid.layers.elementwise_min(
        label,
        fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit,
label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
return_softmax=True)
    loss = loss * ignore_mask
    # normalize by the fraction of valid pixels so ignored pixels do not
    # dilute the mean
    avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
# TODO: decide how the ignore index and ignore mask should be applied here
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("dice loss is only applicable to one channel classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
label = fluid.layers.transpose(label, [0, 2, 3, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit = logit * ignore_mask
label = label * ignore_mask
reduce_dim = list(range(1, len(logit.shape)))
inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
dice_denominator = fluid.layers.reduce_sum(
logit, dim=reduce_dim) + fluid.layers.reduce_sum(
label, dim=reduce_dim)
dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
def bce_loss(logit, label, ignore_mask=None):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classfication")
label = fluid.layers.cast(label, 'float32')
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=logit,
label=label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
        normalize=True)  # normalize by the number of valid (non-ignored) pixels
loss = fluid.layers.reduce_sum(loss)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return loss
def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2):
    if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = softmax_with_loss(logit, logit_label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes)
return avg_loss
def multi_dice_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = dice_loss(logit, logit_label, logit_mask)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = dice_loss(logits, label, ignore_mask)
return avg_loss
def multi_bce_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = bce_loss(logit, logit_label, logit_mask)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = bce_loss(logits, label, ignore_mask)
return avg_loss
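# Usage sketch (illustrative, not part of the original file): dispatching the
# loss named in cfg.SOLVER.LOSS; logits, label and ignore_mask are assumed to
# be fluid Variables produced by the model and reader.
def _example_pick_loss(logits, label, ignore_mask, num_classes):
    name = cfg.SOLVER.LOSS[0]
    if name == 'softmax_loss':
        return multi_softmax_with_loss(logits, label, ignore_mask, num_classes)
    if name == 'dice_loss':
        return multi_dice_loss(logits, label, ignore_mask)
    if name == 'bce_loss':
        return multi_bce_loss(logits, label, ignore_mask)
    raise ValueError('unknown loss: {}'.format(name))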
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix on every call to `calculate`
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into a sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use numpy's sum-over-axis API to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
# avoid spillovers
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
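# Usage sketch (illustrative, not part of the original file): streaming
# evaluation over small batches, with `ignore` marking valid pixels as 1.
def _example_confusion_matrix():
    cm = ConfusionMatrix(num_classes=3, streaming=True)
    pred = np.array([[[[0, 1], [2, 1]]]])    # predicted class ids, NCHW with C=1
    label = np.array([[[[0, 1], [2, 2]]]])   # ground truth, NCHW
    valid = np.ones_like(label)              # 1 marks pixels to evaluate
    # note: calculate() transposes label/ignore itself but expects pred in NHWC
    cm.calculate(np.transpose(pred, (0, 2, 3, 1)), label, valid)
    iou_list, mean_iou = cm.mean_iou()
    print(iou_list, mean_iou)  # [1.0, 0.5, 0.5], ~0.667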
def get_cityscapes_palette(num_cls=19):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
palette = [0] * (num_cls * 3)
    palette[0:3] = (128, 64, 128)    # 0: road
    palette[3:6] = (244, 35, 232)    # 1: sidewalk
    palette[6:9] = (70, 70, 70)      # 2: building
    palette[9:12] = (102, 102, 156)  # 3: wall
    palette[12:15] = (190, 153, 153) # 4: fence
    palette[15:18] = (153, 153, 153) # 5: pole
    palette[18:21] = (250, 170, 30)  # 6: traffic light
    palette[21:24] = (220, 220, 0)   # 7: traffic sign
    palette[24:27] = (107, 142, 35)  # 8: vegetation
    palette[27:30] = (152, 251, 152) # 9: terrain
    palette[30:33] = (70, 130, 180)  # 10: sky
    palette[33:36] = (220, 20, 60)   # 11: person
    palette[36:39] = (255, 0, 0)     # 12: rider
    palette[39:42] = (0, 0, 142)     # 13: car
    palette[42:45] = (0, 0, 70)      # 14: truck
    palette[45:48] = (0, 60, 100)    # 15: bus
    palette[48:51] = (0, 80, 100)    # 16: train
    palette[51:54] = (0, 0, 230)     # 17: motorcycle
    palette[54:57] = (119, 11, 32)   # 18: bicycle
    palette[57:60] = (105, 105, 105) # extra color appended beyond num_cls
return palette
def get_gene_palette(num_cls=182): #Ref: CCNet
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
def get_palette(dataset):
if dataset == 'cityscapes':
palette = get_cityscapes_palette(19)
elif dataset == 'pascalContext':
palette = get_gene_palette(num_cls=59)
else:
        raise RuntimeError("unknown dataset: {}".format(dataset))
return palette
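# Usage sketch (illustrative, not part of the original file): colorizing a
# 2-D label map of class ids with the palette; Pillow is an extra dependency
# here, not required by the original module.
def _example_colorize(label_map, dataset='cityscapes'):
    from PIL import Image
    img = Image.fromarray(label_map.astype('uint8'), mode='P')
    img.putpalette(get_palette(dataset))
    return img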
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from src.utils.config import cfg
from paddle.fluid.contrib.mixed_precision.decorator import decorate, AutoMixedPrecisionLists
class Solver(object):
def __init__(self, main_prog, start_prog):
total_images = cfg.DATASET.TRAIN_TOTAL_IMAGES
self.weight_decay = cfg.SOLVER.WEIGHT_DECAY
self.momentum = cfg.SOLVER.MOMENTUM
self.momentum2 = cfg.SOLVER.MOMENTUM2
self.step_per_epoch = total_images // cfg.TRAIN_BATCH_SIZE
if total_images % cfg.TRAIN_BATCH_SIZE != 0:
self.step_per_epoch += 1
self.total_step = cfg.SOLVER.NUM_EPOCHS * self.step_per_epoch
self.main_prog = main_prog
self.start_prog = start_prog
self.warmup_step = cfg.SOLVER.LR_WARMUP_STEPS if cfg.SOLVER.LR_WARMUP else -1
self.decay_step = self.total_step - self.warmup_step
self.decay_epochs = cfg.SOLVER.NUM_EPOCHS - self.warmup_step / self.step_per_epoch
def lr_warmup(self, learning_rate, start_lr, end_lr):
linear_step = end_lr - start_lr
lr = fluid.layers.tensor.create_global_var(
shape=[1],
value=0.0,
dtype='float32',
persistable=True,
name="learning_rate_warmup")
global_step = fluid.layers.learning_rate_scheduler._decay_step_counter()
warmup_counter = fluid.layers.autoincreased_step_counter(
counter_name='@LR_DECAY_COUNTER_WARMUP_IN_SEG@', begin=1, step=1)
global_counter = fluid.default_main_program().global_block(
).vars['@LR_DECAY_COUNTER@']
warmup_counter = fluid.layers.cast(warmup_counter, 'float32')
with fluid.layers.control_flow.Switch() as switch:
with switch.case(warmup_counter <= self.warmup_step):
decayed_lr = start_lr + linear_step * (
warmup_counter / self.warmup_step)
fluid.layers.tensor.assign(decayed_lr, lr)
# hold the global_step to 0 during the warm-up phase
fluid.layers.increment(global_counter, value=-1)
with switch.default():
fluid.layers.tensor.assign(learning_rate, lr)
return lr
def piecewise_decay(self):
gamma = cfg.SOLVER.GAMMA
bd = [self.step_per_epoch * e for e in cfg.SOLVER.DECAY_EPOCH]
lr = [cfg.SOLVER.LR * (gamma**i) for i in range(len(bd) + 1)]
decayed_lr = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
return decayed_lr
def poly_decay(self):
power = cfg.SOLVER.POWER
decayed_lr = fluid.layers.polynomial_decay(
cfg.SOLVER.LR, self.decay_step, end_learning_rate=0, power=power)
return decayed_lr
def cosine_decay(self):
decayed_lr = fluid.layers.cosine_decay(
cfg.SOLVER.LR, self.step_per_epoch, self.decay_epochs)
return decayed_lr
def get_lr(self, lr_policy):
if lr_policy.lower() == 'poly':
decayed_lr = self.poly_decay()
elif lr_policy.lower() == 'piecewise':
decayed_lr = self.piecewise_decay()
elif lr_policy.lower() == 'cosine':
decayed_lr = self.cosine_decay()
        else:
            raise Exception(
                "Unsupported learning rate decay policy! Only poly, piecewise and cosine are supported."
            )
decayed_lr = self.lr_warmup(decayed_lr, 0, cfg.SOLVER.LR)
return decayed_lr
def sgd_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Momentum(
learning_rate=decayed_lr,
momentum=self.momentum,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
if cfg.MODEL.FP16:
if cfg.MODEL.MODEL_NAME in ["pspnet"]:
custom_black_list = {"pool2d"}
            else:
                custom_black_list = set()
amp_lists = AutoMixedPrecisionLists(
custom_black_list=custom_black_list)
assert isinstance(cfg.MODEL.SCALE_LOSS, float) or isinstance(cfg.MODEL.SCALE_LOSS, str), \
"data type of MODEL.SCALE_LOSS must be float or str"
if isinstance(cfg.MODEL.SCALE_LOSS, float):
optimizer = decorate(
optimizer,
amp_lists=amp_lists,
init_loss_scaling=cfg.MODEL.SCALE_LOSS,
use_dynamic_loss_scaling=False)
else:
assert cfg.MODEL.SCALE_LOSS.lower() in [
'dynamic'
], "if MODEL.SCALE_LOSS is a string,\
must be set as 'DYNAMIC'!"
optimizer = decorate(
optimizer,
amp_lists=amp_lists,
use_dynamic_loss_scaling=True)
optimizer.minimize(loss)
return decayed_lr
def adam_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Adam(
learning_rate=decayed_lr,
beta1=self.momentum,
beta2=self.momentum2,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
optimizer.minimize(loss)
return decayed_lr
def optimise(self, loss):
lr_policy = cfg.SOLVER.LR_POLICY
opt = cfg.SOLVER.OPTIMIZER
if opt.lower() == 'adam':
return self.adam_optimizer(lr_policy, loss)
elif opt.lower() == 'sgd':
return self.sgd_optimizer(lr_policy, loss)
        else:
            raise Exception(
                "Unsupported optimizer! Only adam and sgd are supported.")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
def calculate_eta(remaining_step, speed):
if remaining_step < 0:
remaining_step = 0
remaining_time = int(remaining_step / speed)
result = "{:0>2}:{:0>2}:{:0>2}"
arr = []
for i in range(2, -1, -1):
arr.append(int(remaining_time / 60**i))
remaining_time %= 60**i
return result.format(*arr)
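# Example: 3600 remaining steps at 2.0 steps/s -> 1800 s, i.e.
# calculate_eta(3600, speed=2.0) == "00:30:00".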
class Timer(object):
""" Simple timer class for measuring time consuming """
def __init__(self):
self._start_time = 0.0
self._end_time = 0.0
self._elapsed_time = 0.0
self._is_running = False
def start(self):
self._is_running = True
self._start_time = time.time()
def restart(self):
self.start()
def stop(self):
self._is_running = False
self._end_time = time.time()
    def elapsed_time(self):
        self._end_time = time.time()
        self._elapsed_time = self._end_time - self._start_time
        if not self.is_running:
            # by convention, a stopped timer reports 0.0
            return 0.0
        return self._elapsed_time
@property
def is_running(self):
return self._is_running
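# Usage sketch (illustrative, not part of the original file).
if __name__ == '__main__':
    timer = Timer()
    timer.start()
    time.sleep(0.1)
    print(timer.elapsed_time())  # ~0.1 while running
    timer.stop()
    print(timer.elapsed_time())  # 0.0 once stopped, by the convention above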