Merge branch 'develop' of https://github.com/wuyefeilin/PaddleSeg into develop

fdfa761a · chenguowei01 · cef62b71 · 11737433 · fdfa761a · fdfa761a
132 changed files
--- a/.travis.yml
+++ b/.travis.yml
 language: python
 python:
-  - '2.7'
  - '3.5'
  - '3.6'

--- a/README.md
+++ b/README.md
-# PaddleSeg Image Segmentation Library
+# PaddleSeg
 [![Build Status](https://travis-ci.org/PaddlePaddle/PaddleSeg.svg?branch=master)](https://travis-ci.org/PaddlePaddle/PaddleSeg)
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
 [![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleSeg.svg)](https://github.com/PaddlePaddle/PaddleSeg/releases)
+![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
+![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
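 ## Introduction
-PaddleSeg is a semantic segmentation library built on [PaddlePaddle](https://www.paddlepaddle.org.cn). It covers mainstream segmentation models such as DeepLabv3+, U-Net, ICNet, PSPNet, HRNet, and Fast-SCNN. A unified configuration helps users complete the full image segmentation workflow, from training to deployment, more conveniently.
+PaddleSeg is an end-to-end image segmentation development kit built on [PaddlePaddle](https://www.paddlepaddle.org.cn). It covers mainstream segmentation networks such as DeepLabv3+, U-Net, ICNet, PSPNet, HRNet, and Fast-SCNN. Its modular design lets configuration drive how models are composed, helping developers complete the full image segmentation workflow, from training to deployment, more conveniently.
-</br>
 - [Features](#特点)
 - [Installation](#安装)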
@@ -23,8 +23,6 @@ PaddleSeg是基于[PaddlePaddle](https://www.paddlepaddle.org.cn)开发的语义
 - [Changelog](#更新日志)
 - [Contributing](#贡献代码)
-</br>
 ## Features
 - **Rich data augmentation**
@@ -43,13 +41,17 @@ PaddleSeg支持多进程I/O、多卡并行、跨卡Batch Norm同步等训练加
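 Provides industrial-grade deployment for both **server** and **mobile** targets. Backed by the PaddlePaddle high-performance inference engine and high-performance image processing, developers can easily deploy and integrate high-performance segmentation models. With [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite), lightweight, high-performance portrait segmentation models can be deployed on mobile or embedded devices.
+- **Industry practice cases**
+PaddleSeg provides a rich set of industry practice cases, such as [portrait segmentation](./contrib/HumanSeg), [industrial meter detection](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib#%E5%B7%A5%E4%B8%9A%E8%A1%A8%E7%9B%98%E5%88%86%E5%89%B2), [remote sensing segmentation](./contrib/RemoteSensing), [human parsing](contrib/ACE2P), and [industrial quality inspection](https://aistudio.baidu.com/aistudio/projectdetail/184392), helping developers put image segmentation into practice more conveniently.
 ## Installation
 ### 1. Install PaddlePaddle
 Version requirements
-* PaddlePaddle >= 1.6.1
+* PaddlePaddle >= 1.7.0
-* Python 2.7 or 3.5+
+* Python >= 3.5
 Because image segmentation models are computationally expensive, we recommend running PaddleSeg on the GPU version of PaddlePaddle.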
 ```
@@ -70,8 +72,6 @@ cd PaddleSeg
 pip install -r requirements.txt
 ```
-</br>
 ## Tutorials
 We provide a series of tutorials that show how to use PaddleSeg to train, evaluate, and deploy semantic segmentation models.
@@ -124,8 +124,6 @@ pip install -r requirements.txt
 |Portrait segmentation|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/188833)|
 |PaddleSeg featured vertical-domain models|[Try it](https://aistudio.baidu.com/aistudio/projectdetail/226710)|
-</br>
 ## FAQ
 #### Q: When installing the dependencies listed in requirements.txt, some packages cannot be found?
@@ -148,26 +146,28 @@ python pdseg/train.py --cfg xxx.yaml TRAIN.RESUME_MODEL_DIR /PATH/TO/MODEL_CKPT/
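 A: Reduce the batch size and use the Group Norm strategy. Note that when `DEFAULT_NORM_TYPE` is set to `bn` during training, the batch size must be >= 2 to keep the Batch Norm computation stable.
-#### Q: I get the error ModuleNotFoundError: No module named 'paddle.fluid.contrib.mixed_precision'
-A: Please upgrade PaddlePaddle to version 1.5.2 or later.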
-</br>
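 ## Communication and Feedback
 * You are welcome to submit questions, reports, and suggestions through [Github Issues](https://github.com/PaddlePaddle/PaddleSeg/issues)
 * WeChat official account: 飞桨PaddlePaddle
-* QQ group: 796771754
+* QQ group: 703252161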
 <p align="center"><img width="200" height="200"  src="https://user-images.githubusercontent.com/45189361/64117959-1969de80-cdc9-11e9-84f7-e1c2849a004c.jpeg"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200" margin="500" src="./docs/imgs/qq_group2.png"/></p>
 <p align="center">  &#8194;&#8194;&#8194;WeChat official account&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;Official technical discussion QQ group</p>
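 ## Changelog
+* 2020.05.12
+  **`v0.5.0`**
+  * Fully upgraded the [HumanSeg portrait segmentation models](./contrib/HumanSeg): added the ultra-lightweight HumanSeg-lite model for real-time portrait segmentation on mobile devices, plus optical-flow-based video segmentation post-processing that makes the results smoother.
+  * Added a [meteorological remote sensing segmentation solution](./contrib/RemoteSensing) that supports scenarios such as snow cover recognition and cloud detection.
+  * Added [Lovasz Loss](docs/lovasz_loss.md) to address class imbalance in the data.
+  * Switched to VisualDL 2.0 as the training visualization tool
 * 2020.02.25
  **`v0.4.0`**
-  * Added Fast-SCNN, a segmentation network for real-time scenarios that needs no pretrained model, with 1 Cityscapes-based [pretrained model](./docs/model_zoo.md).
+  * Added Fast-SCNN, a segmentation network for real-time scenarios that needs no pretrained model, with 1 Cityscapes-based [pretrained model](./docs/model_zoo.md)
-  * Added the LaneNet lane detection network, with one [pretrained model](https://github.com/PaddlePaddle/PaddleSeg/tree/release/v0.4.0/contrib/LaneNet#%E4%B8%83-%E5%8F%AF%E8%A7%86%E5%8C%96).
+  * Added the LaneNet lane detection network, with one [pretrained model](https://github.com/PaddlePaddle/PaddleSeg/tree/release/v0.4.0/contrib/LaneNet#%E4%B8%83-%E5%8F%AF%E8%A7%86%E5%8C%96)
  * Added PaddleSlim-based compression strategies for the segmentation library ([quantization](./slim/quantization/README.md), [distillation](./slim/distillation/README.md), [pruning](./slim/prune/README.md), [search](./slim/nas/README.md))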
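@@ -203,4 +203,4 @@ A: Please upgrade PaddlePaddle to version 1.5.2 or later.
 ## Contributing
-We warmly welcome code contributions and usage suggestions for PaddleSeg. If you can fix an issue or add a new feature, feel free to submit pull requests.
+We warmly welcome code contributions and usage suggestions for PaddleSeg. If you can fix an issue or add a new feature, feel free to submit Pull Requests.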
--- a/contrib/ACE2P/README.md
+++ b/contrib/ACE2P/README.md
 # Augmented Context Embedding with Edge Perceiving(ACE2P)
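 ## Model Overview
-Human parsing is a fine-grained semantic segmentation task that aims to identify the components of a human image (for example, body parts and clothing) at the pixel level. ACE2P fuses low-level features, global context information, and edge details,
+Human parsing is a fine-grained semantic segmentation task that aims to identify the components of a human image (for example, body parts and clothing) at the pixel level. Augmented Context Embedding with Edge Perceiving (ACE2P) fuses low-level features, global context information, and edge details, and learns the human parsing task through end-to-end training. Solutions built on the ACE2P single-person human parsing network won first place in all three human parsing tasks of the 3rd Look into Person (LIP) Challenge at CVPR 2019.
-and learns the human parsing task through end-to-end training. Solutions built on the ACE2P single-person human parsing network won first place in all three human parsing tasks of the 3rd LIP Challenge at CVPR 2019
 ## Model Architecture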
 ![](imgs/net.jpg)
@@ -38,6 +37,59 @@ The ACE2P model contains three branches:
 ![](imgs/result.jpg)
+![](ACE2P/imgs/result.jpg)
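+Human parsing is a fine-grained semantic segmentation task that aims to identify the components of a human image (for example, body parts and clothing) at the pixel level. This section uses the champion model Augmented Context Embedding with Edge Perceiving (ACE2P) for segmentation prediction.
+## Code Usage
+### 1. Download the model
+Run the following command to download and extract the ACE2P inference model:
+```
+python download_ACE2P.py
+```
+Alternatively, click the [link](https://paddleseg.bj.bcebos.com/models/ACE2P.tgz) to download it manually and extract it under contrib/ACE2P.
+### 2. Download the data
+The test set contains 10,000 images.
+Click [Baidu_Drive](https://pan.baidu.com/s/1nvqmZBN#list/path=%2Fsharelink2787269280-523292635003760%2FLIP%2FLIP&parentPath=%2Fsharelink2787269280-523292635003760)
+to download Testing_images.zip, or download it from the official LIP dataset website.
+After downloading, extract it into the ./data folder.
+### 3. Quick prediction
+Predict on GPU:
+```
+python -u infer.py --example ACE2P --use_gpu
+```
+Predict on CPU:
+```
+python -u infer.py --example ACE2P
+```
+**NOTE:** Running this model requires about 2 GB of GPU memory. Because the dataset contains many images, prediction takes a while.
+### 4. Example prediction result
+  Original image:
+  ![](ACE2P/imgs/117676_2149260.jpg)
+  Prediction result:
+  ![](ACE2P/imgs/117676_2149260.png)
+### Notes
+1. For detailed settings such as data and model paths, see the config.py files under ACE2P/HumanSeg/RoadLine
+2. The ACE2P model needs 2 GB of GPU memory reserved; if GPU memory runs out, lower FLAGS_fraction_of_gpu_memory_to_use
 ## Citation
 **Paper**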

--- a/contrib/HumanSeg/README.md
+++ b/contrib/HumanSeg/README.md
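+# HumanSeg Portrait Segmentation Models
+This tutorial builds on the core PaddleSeg segmentation networks and provides an end-to-end guide for portrait segmentation, covering pretrained models, fine-tuning, and video segmentation inference and deployment. The newly released HumanSeg-lite is an ultra-lightweight portrait segmentation model that supports real-time segmentation on mobile devices.
+## Requirements
+* Python == 3.5/3.6/3.7
+* PaddlePaddle >= 1.7.2
+For installing PaddlePaddle, see the [PaddlePaddle quick install guide](https://www.paddlepaddle.org.cn/install/quick)
+Install the Python package dependencies with the following command. Make sure it has been run at least once on this branch.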
+```shell
+$ pip install -r requirements.txt
+```
+## Pretrained Models
+HumanSeg releases three models pretrained on large-scale portrait data to cover a range of use cases.
+| Model type | Checkpoint | Inference Model | Quant Inference Model | Notes |
+| --- | --- | --- | ---| --- |
+| HumanSeg-server  | [humanseg_server_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_ckpt.zip) | [humanseg_server_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip) | -- | High-accuracy model for server-side GPUs and portrait scenes with complex backgrounds. Architecture: Deeplabv3+/Xception65, input size (512, 512) |
+| HumanSeg-mobile | [humanseg_mobile_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_ckpt.zip) | [humanseg_mobile_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip) | [humanseg_mobile_quant](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip) | Lightweight model for front-camera scenes on mobile devices or server-side CPUs. Architecture: HRNet_w18_small_v1, input size (192, 192) |
+| HumanSeg-lite | [humanseg_lite_ckpt](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_ckpt.zip) | [humanseg_lite_inference](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_inference.zip) |  [humanseg_lite_quant](https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_quant.zip) | Ultra-lightweight model for phone selfie portraits and real-time segmentation on mobile devices. Architecture: an optimized ShuffleNetV2, input size (192, 192) |
+Model performance
+| Model | Model size | Inference time |
+| --- | --- | --- |
+|humanseg_server_inference| 158M | - |
+|humanseg_mobile_inference | 5.8M | 42.35ms |
+|humanseg_mobile_quant | 1.6M | 24.93ms |
+|humanseg_lite_inference | 541K | 17.26ms |
+|humanseg_lite_quant | 187K | 11.89ms |
+Environment for the inference-time measurements: Xiaomi phone, CPU: Snapdragon 855, RAM: 6 GB, image size: 192*192.
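As a rough reading of the performance table, the effect of quantization can be worked out from the listed sizes and timings. The snippet below is only an arithmetic sketch: the dictionary layout and variable names are illustrative, and the numbers are copied from the table (mobile sizes in M, lite sizes in K, latencies in ms).

```python
# Float vs. quantized figures copied from the performance table.
# Each size pair uses one unit (M for mobile, K for lite), so the ratio is unit-free.
models = {
    "humanseg_mobile": {"size": (5.8, 1.6), "latency_ms": (42.35, 24.93)},
    "humanseg_lite": {"size": (541, 187), "latency_ms": (17.26, 11.89)},
}

ratios = {}
for name, stats in models.items():
    float_size, quant_size = stats["size"]
    float_ms, quant_ms = stats["latency_ms"]
    ratios[name] = {
        # How many times smaller the quantized model is.
        "size_reduction": round(float_size / quant_size, 1),
        # How many times faster the quantized model runs.
        "speedup": round(float_ms / quant_ms, 1),
    }

for name, result in ratios.items():
    print(name, result)
```

On these figures the quantized models are about three times smaller and roughly 1.5 to 1.7 times faster on the stated test device; other hardware may behave differently.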
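+**NOTE:**
+* Checkpoint is the model weights, used for fine-tuning.
+* Inference Model and Quant Inference Model are deployment models for inference. They contain the `__model__` computation graph structure, the `__params__` model parameters, and the basic model configuration in `model.yaml`.
+* Inference Model is for CPU and GPU inference deployment on servers; Quant Inference Model is the quantized version, for deployment on mobile and other edge devices through Paddle Lite. For more on Paddle Lite deployment, see the [Paddle Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/)
+Run the following script to download the HumanSeg pretrained models: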
+```bash
+python pretrained_weights/download_pretrained_weights.py
+```
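+## Download Test Data
+We provide the **Supervisely Persons** portrait segmentation dataset released by [supervise.ly](https://supervise.ly/): a small subset was randomly sampled and converted into a format PaddleSeg can load directly. Run the following code for a quick download; it also includes `video_test.mp4`, a portrait test video shot with a phone's front camera.
+```bash
+python data/download_data.py
+```
+## Quick Start: Video Stream Portrait Segmentation
+Combining the prediction of the DIS (Dense Inverse Search-based method) optical flow algorithm with the segmentation result improves portrait segmentation on video streams.
+```bash
+# Real-time segmentation using the computer's camera
+python video_infer.py --model_dir pretrained_weights/humanseg_lite_inference
+# Segment a portrait video
+python video_infer.py --model_dir pretrained_weights/humanseg_lite_inference --video_path data/video_test.mp4
+```
+The video segmentation result looks like this:
+<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/result.gif" width="20%" height="20%">
+**NOTE**:
+Video segmentation takes a few minutes; please be patient.
+## Training
+Use the following command to fine-tune from a pretrained model. Make sure the chosen model architecture `model_type` matches the model parameters `pretrained_weights`.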
+```bash
+python train.py --model_type HumanSegMobile \
+--save_dir output/ \
+--data_dir data/mini_supervisely \
+--train_list data/mini_supervisely/train.txt \
+--val_list data/mini_supervisely/val.txt \
+--pretrained_weights pretrained_weights/humanseg_mobile_ckpt \
+--batch_size 8 \
+--learning_rate 0.001 \
+--num_epochs 10 \
+--image_shape 192 192
+```
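+The parameters are:
+* `--model_type`: model type; one of HumanSegServer, HumanSegMobile, or HumanSegLite
+* `--save_dir`: directory where the model is saved
+* `--data_dir`: dataset directory
+* `--train_list`: path to the training set list
+* `--val_list`: path to the validation set list
+* `--pretrained_weights`: path to the pretrained model
+* `--batch_size`: batch size
+* `--learning_rate`: initial learning rate
+* `--num_epochs`: number of training epochs
+* `--image_shape`: network input image size (w, h)
+For more command-line help, run: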
+```bash
+python train.py --help
+```
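For readers who want to see how flags of this shape are typically parsed, here is a minimal `argparse` sketch. It is not the repository's `train.py`: the `build_parser` helper and the default values are assumptions made for illustration, and only a subset of the flags is covered.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a few of the training flags."""
    parser = argparse.ArgumentParser(description="Sketch of a HumanSeg-style training CLI")
    parser.add_argument(
        "--model_type",
        default="HumanSegMobile",
        choices=["HumanSegServer", "HumanSegMobile", "HumanSegLite"],
        help="model type")
    parser.add_argument("--batch_size", type=int, default=8, help="batch size")
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="initial learning rate")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="number of training epochs")
    # Two integers after the flag, e.g. "--image_shape 192 192" for (w, h).
    parser.add_argument("--image_shape", type=int, nargs=2, default=[192, 192],
                        metavar=("W", "H"), help="network input size (w, h)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model_type", "HumanSegLite", "--image_shape", "192", "192", "--batch_size", "4"])
print(args.model_type, args.image_shape, args.batch_size)
```

Using `nargs=2` is what lets `--image_shape` take width and height as two separate values.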
+**NOTE**
+You can quickly try different models by changing `--model_type` together with the matching `--pretrained_weights`.
+## Evaluation
+Use the following command to evaluate:
+```bash
+python val.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--val_list data/mini_supervisely/val.txt \
+--image_shape 192 192
+```
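+The parameters are:
+* `--model_dir`: model directory
+* `--data_dir`: dataset directory
+* `--val_list`: path to the validation set list
+* `--image_shape`: network input image size (w, h)
+## Prediction
+Use the following command to run prediction: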
+```bash
+python infer.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--test_list data/mini_supervisely/test.txt \
+--image_shape 192 192
+```
+The parameters are:
+* `--model_dir`: model directory
+* `--data_dir`: dataset directory
+* `--test_list`: path to the test set list
+* `--image_shape`: network input image size (w, h)
+## Model Export
+```bash
+python export.py --model_dir output/best_model \
+--save_dir output/export
+```
+The parameters are:
+* `--model_dir`: model directory
+* `--save_dir`: directory where the exported model is saved
+## Offline Quantization
+```bash
+python quant_offline.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--quant_list data/mini_supervisely/val.txt \
+--save_dir output/quant_offline \
+--image_shape 192 192
+```
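+The parameters are:
+* `--model_dir`: directory of the model to quantize
+* `--data_dir`: dataset directory
+* `--quant_list`: path to the quantization dataset list; usually the training or validation set is used directly
+* `--save_dir`: directory where the quantized model is saved
+* `--image_shape`: network input image size (w, h)
+## Online Quantization
+Perform online quantization starting from a float-trained model.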
+```bash
+python quant_online.py --model_type HumanSegMobile \
+--save_dir output/quant_online \
+--data_dir data/mini_supervisely \
+--train_list data/mini_supervisely/train.txt \
+--val_list data/mini_supervisely/val.txt \
+--pretrained_weights output/best_model \
+--batch_size 2 \
+--learning_rate 0.001 \
+--num_epochs 2 \
+--image_shape 192 192
+```
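+The parameters are:
+* `--model_type`: model type; one of HumanSegServer, HumanSegMobile, or HumanSegLite
+* `--save_dir`: directory where the model is saved
+* `--data_dir`: dataset directory
+* `--train_list`: path to the training set list
+* `--val_list`: path to the validation set list
+* `--pretrained_weights`: path to the pretrained model
+* `--batch_size`: batch size
+* `--learning_rate`: initial learning rate
+* `--num_epochs`: number of training epochs
+* `--image_shape`: network input image size (w, h)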
--- a/contrib/HumanSeg/__init__.py
+++ b/contrib/HumanSeg/__init__.py
--- a/contrib/HumanSeg/config.py
+++ b/contrib/HumanSeg/config.py
-# -*- coding: utf-8 -*-
-from utils.util import AttrDict, get_arguments, merge_cfg_from_args
-import os
-args = get_arguments()
-cfg = AttrDict()
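-# Directory containing the images to predict
-cfg.data_dir = os.path.join(args.example , "data", "test_images")
-# File listing the names of the images to predict
-cfg.data_list_file = os.path.join(args.example , "data", "test.txt")
-# Path to load the model from
-cfg.model_path = os.path.join(args.example , "model")
-# Directory where prediction results are saved
-cfg.vis_dir = os.path.join(args.example , "result")
-# Number of classes to predict
-cfg.class_num = 2
-# Mean subtracted during image preprocessing
-cfg.MEAN = 104.008, 116.669, 122.675
-# Standard deviation the image is divided by during preprocessing
-cfg.STD =  1.0, 1.0, 1.0
-# Input size of the images to predict
-cfg.input_size = 513, 513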
-merge_cfg_from_args(args, cfg)
--- a/contrib/HumanSeg/data/download_data.py
+++ b/contrib/HumanSeg/data/download_data.py
+# Copyright (c) 2019  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+import os
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+TEST_PATH = os.path.join(LOCAL_PATH, "../../../", "test")
+sys.path.append(TEST_PATH)
+from test_utils import download_file_and_uncompress
+def download_data(savepath, extrapath):
+    url = "https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip"
+    download_file_and_uncompress(
+        url=url, savepath=savepath, extrapath=extrapath)
+    url = "https://paddleseg.bj.bcebos.com/humanseg/data/video_test.zip"
+    download_file_and_uncompress(
+        url=url,
+        savepath=savepath,
+        extrapath=extrapath,
+        extraname='video_test.mp4')
+if __name__ == "__main__":
+    download_data(LOCAL_PATH, LOCAL_PATH)
+    print("Data download finish!")
--- a/contrib/HumanSeg/datasets/__init__.py
+++ b/contrib/HumanSeg/datasets/__init__.py
+#   Copyright (c) 2020  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .dataset import Dataset
--- a/contrib/HumanSeg/datasets/dataset.py
+++ b/contrib/HumanSeg/datasets/dataset.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os.path as osp
+from threading import Thread
+import multiprocessing
+import collections.abc
+import numpy as np
+import six
+import sys
+import copy
+import random
+import platform
+import chardet
+import utils.logging as logging
+class EndSignal():
+    pass
+def is_pic(img_name):
+    valid_suffix = ['JPEG', 'jpeg', 'JPG', 'jpg', 'BMP', 'bmp', 'PNG', 'png']
+    suffix = img_name.split('.')[-1]
+    if suffix not in valid_suffix:
+        return False
+    return True
+def is_valid(sample):
+    if sample is None:
+        return False
+    if isinstance(sample, tuple):
+        for s in sample:
+            if s is None:
+                return False
+            elif isinstance(s, np.ndarray) and s.size == 0:
+                return False
+            # collections.Sequence was removed in Python 3.10; use collections.abc
+            elif isinstance(s, collections.abc.Sequence) and len(s) == 0:
+                return False
+    return True
+def get_encoding(path):
+    # Read the raw bytes inside a context manager so the file handle is closed
+    with open(path, 'rb') as f:
+        data = f.read()
+    file_encoding = chardet.detect(data).get('encoding')
+    return file_encoding
+def multithread_reader(mapper,
+                       reader,
+                       num_workers=4,
+                       buffer_size=1024,
+                       batch_size=8,
+                       drop_last=True):
+    from queue import Queue
+    end = EndSignal()
+    # define a worker to read samples from reader to in_queue
+    def read_worker(reader, in_queue):
+        for i in reader():
+            in_queue.put(i)
+        in_queue.put(end)
+    # define a worker to handle samples from in_queue by mapper
+    # and put mapped samples into out_queue
+    def handle_worker(in_queue, out_queue, mapper):
+        sample = in_queue.get()
+        while not isinstance(sample, EndSignal):
+            if len(sample) == 2:
+                r = mapper(sample[0], sample[1])
+            elif len(sample) == 3:
+                r = mapper(sample[0], sample[1], sample[2])
+            else:
+                raise Exception('The sample\'s length must be 2 or 3.')
+            if is_valid(r):
+                out_queue.put(r)
+            sample = in_queue.get()
+        in_queue.put(end)
+        out_queue.put(end)
+    def xreader():
+        in_queue = Queue(buffer_size)
+        out_queue = Queue(buffer_size)
+        # start a read worker in a thread
+        target = read_worker
+        t = Thread(target=target, args=(reader, in_queue))
+        t.daemon = True
+        t.start()
+        # start several handle_workers
+        target = handle_worker
+        args = (in_queue, out_queue, mapper)
+        workers = []
+        for i in range(num_workers):
+            worker = Thread(target=target, args=args)
+            worker.daemon = True
+            workers.append(worker)
+        for w in workers:
+            w.start()
+        batch_data = []
+        sample = out_queue.get()
+        while not isinstance(sample, EndSignal):
+            batch_data.append(sample)
+            if len(batch_data) == batch_size:
+                yield batch_data
+                batch_data = []
+            sample = out_queue.get()
+        finish = 1
+        while finish < num_workers:
+            sample = out_queue.get()
+            if isinstance(sample, EndSignal):
+                finish += 1
+            else:
+                batch_data.append(sample)
+                if len(batch_data) == batch_size:
+                    yield batch_data
+                    batch_data = []
+        if not drop_last and len(batch_data) != 0:
+            yield batch_data
+            batch_data = []
+    return xreader
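The core idea in `multithread_reader` is a producer thread that pushes mapped samples into a queue and finishes with an `EndSignal` sentinel, while the consumer groups samples into batches. The following is a simplified, single-worker sketch of that pattern, not the code from this commit: `batched_reader` and its parameters are illustrative names, and using one worker keeps the output order deterministic.

```python
from queue import Queue
from threading import Thread


class EndSignal:
    """Sentinel object marking the end of the sample stream."""


def batched_reader(samples, mapper, batch_size=2, drop_last=True, buffer_size=8):
    """Yield lists of mapped samples produced by one background thread."""
    out_queue = Queue(buffer_size)

    def worker():
        for sample in samples:
            out_queue.put(mapper(sample))
        out_queue.put(EndSignal())  # tell the consumer that no more data follows

    thread = Thread(target=worker)
    thread.daemon = True
    thread.start()

    batch = []
    item = out_queue.get()
    while not isinstance(item, EndSignal):
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
        item = out_queue.get()
    # A trailing partial batch is emitted only when drop_last is False.
    if batch and not drop_last:
        yield batch


kept = list(batched_reader(range(5), lambda x: x * 2, batch_size=2, drop_last=False))
dropped = list(batched_reader(range(5), lambda x: x * 2, batch_size=2, drop_last=True))
print(kept, dropped)
```

With several workers, as in the function above, each worker forwards the sentinel and the consumer has to count sentinels before stopping, which is why the original loop tracks `finish`.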
+def multiprocess_reader(mapper,
+                        reader,
+                        num_workers=4,
+                        buffer_size=1024,
+                        batch_size=8,
+                        drop_last=True):
+    from .shared_queue import SharedQueue as Queue
+    def _read_into_queue(samples, mapper, queue):
+        end = EndSignal()
+        try:
+            for sample in samples:
+                if sample is None:
+                    raise ValueError("sample has None")
+                if len(sample) == 2:
+                    result = mapper(sample[0], sample[1])
+                elif len(sample) == 3:
+                    result = mapper(sample[0], sample[1], sample[2])
+                else:
+                    raise Exception('The sample\'s length must be 2 or 3.')
+                if is_valid(result):
+                    queue.put(result)
+            queue.put(end)
+        except:
+            queue.put("")
+            six.reraise(*sys.exc_info())
+    def queue_reader():
+        queue = Queue(buffer_size, memsize=3 * 1024**3)
+        total_samples = [[] for i in range(num_workers)]
+        for i, sample in enumerate(reader()):
+            index = i % num_workers
+            total_samples[index].append(sample)
+        for i in range(num_workers):
+            p = multiprocessing.Process(
+                target=_read_into_queue, args=(total_samples[i], mapper, queue))
+            p.start()
+        finish_num = 0
+        batch_data = list()
+        while finish_num < num_workers:
+            sample = queue.get()
+            if isinstance(sample, EndSignal):
+                finish_num += 1
+            elif sample == "":
+                raise ValueError("multiprocess reader raises an exception")
+            else:
+                batch_data.append(sample)
+                if len(batch_data) == batch_size:
+                    yield batch_data
+                    batch_data = []
+        if len(batch_data) != 0 and not drop_last:
+            yield batch_data
+            batch_data = []
+    return queue_reader
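`multiprocess_reader` hands samples to worker processes in round-robin order (`index = i % num_workers`). A small standalone sketch of just that sharding step, with `shard_round_robin` as a hypothetical helper name:

```python
def shard_round_robin(samples, num_workers):
    """Split samples into num_workers lists, assigning sample i to shard i % num_workers."""
    shards = [[] for _ in range(num_workers)]
    for i, sample in enumerate(samples):
        shards[i % num_workers].append(sample)
    return shards


shards = shard_round_robin(range(7), 3)
print(shards)
```

Round-robin assignment keeps shard sizes within one sample of each other, at the cost of reading the whole sample list before any worker starts.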
+class Dataset:
+    def __init__(self,
+                 data_dir,
+                 file_list,
+                 label_list=None,
+                 transforms=None,
+                 num_workers='auto',
+                 buffer_size=100,
+                 parallel_method='thread',
+                 shuffle=False):
+        if num_workers == 'auto':
+            import multiprocessing as mp
+            num_workers = mp.cpu_count() // 2 if mp.cpu_count() // 2 < 8 else 8
+        if transforms is None:
+            raise Exception("transform should be defined.")
+        self.transforms = transforms
+        self.num_workers = num_workers
+        self.buffer_size = buffer_size
+        self.parallel_method = parallel_method
+        self.shuffle = shuffle
+        self.file_list = list()
+        self.labels = list()
+        self._epoch = 0
+        if label_list is not None:
+            with open(label_list, encoding=get_encoding(label_list)) as f:
+                for line in f:
+                    item = line.strip()
+                    self.labels.append(item)
+        with open(file_list, encoding=get_encoding(file_list)) as f:
+            for line in f:
+                items = line.strip().split()
+                if not is_pic(items[0]):
+                    continue
+                full_path_im = osp.join(data_dir, items[0])
+                full_path_label = osp.join(data_dir, items[1])
+                if not osp.exists(full_path_im):
+                    raise IOError(
+                        'The image file {} does not exist!'.format(full_path_im))
+                if not osp.exists(full_path_label):
+                    raise IOError('The label file {} does not exist!'.format(
+                        full_path_label))
+                self.file_list.append([full_path_im, full_path_label])
+        self.num_samples = len(self.file_list)
+        logging.info("{} samples in file {}".format(
+            len(self.file_list), file_list))
+    def iterator(self):
+        self._epoch += 1
+        self._pos = 0
+        files = copy.deepcopy(self.file_list)
+        if self.shuffle:
+            random.shuffle(files)
+        files = files[:self.num_samples]
+        self.num_samples = len(files)
+        for f in files:
+            label_path = f[1]
+            sample = [f[0], None, label_path]
+            yield sample
+    def generator(self, batch_size=1, drop_last=True):
+        self.batch_size = batch_size
+        parallel_reader = multithread_reader
+        if self.parallel_method == "process":
+            if platform.platform().startswith("Windows"):
+                logging.debug(
+                    "multiprocess_reader is not supported in Windows platform, force to use multithread_reader."
+                )
+            else:
+                parallel_reader = multiprocess_reader
+        return parallel_reader(
+            self.transforms,
+            self.iterator,
+            num_workers=self.num_workers,
+            buffer_size=self.buffer_size,
+            batch_size=batch_size,
+            drop_last=drop_last)
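When `num_workers='auto'`, `Dataset.__init__` uses half of the CPU cores, capped at 8. The sketch below restates that rule as a standalone function. The lower bound of 1 is an assumption added here, not part of the commit: without it, a single-core machine would get 0 workers. `auto_num_workers` is an illustrative name.

```python
def auto_num_workers(cpu_count, cap=8):
    """Half the CPU cores, at most `cap`, and (as an added assumption) at least 1."""
    return max(1, min(cpu_count // 2, cap))


for cores in (1, 4, 32):
    print(cores, auto_num_workers(cores))
```

In real use the core count would come from `multiprocessing.cpu_count()`.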
--- a/contrib/HumanSeg/datasets/shared_queue/__init__.py
+++ b/contrib/HumanSeg/datasets/shared_queue/__init__.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+__all__ = ['SharedBuffer', 'SharedMemoryMgr', 'SharedQueue']
+from .sharedmemory import SharedBuffer
+from .sharedmemory import SharedMemoryMgr
+from .sharedmemory import SharedMemoryError
+from .queue import SharedQueue
--- a/contrib/HumanSeg/datasets/shared_queue/queue.py
+++ b/contrib/HumanSeg/datasets/shared_queue/queue.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+import sys
+import six
+if six.PY3:
+    import pickle
+    from io import BytesIO as StringIO
+else:
+    import cPickle as pickle
+    from cStringIO import StringIO
+import logging
+import traceback
+import multiprocessing as mp
+from multiprocessing.queues import Queue
+from .sharedmemory import SharedMemoryMgr
+logger = logging.getLogger(__name__)
+class SharedQueueError(ValueError):
+    """ SharedQueueError
+    """
+    pass
+class SharedQueue(Queue):
+    """ a Queue based on shared memory to communicate data between processes;
+        its interface is compatible with 'multiprocessing.queues.Queue'
+    """
+    def __init__(self, maxsize=0, mem_mgr=None, memsize=None, pagesize=None):
+        """ init
+        """
+        if six.PY3:
+            super(SharedQueue, self).__init__(maxsize, ctx=mp.get_context())
+        else:
+            super(SharedQueue, self).__init__(maxsize)
+        if mem_mgr is not None:
+            self._shared_mem = mem_mgr
+        else:
+            self._shared_mem = SharedMemoryMgr(
+                capacity=memsize, pagesize=pagesize)
+    def put(self, obj, **kwargs):
+        """ put an object to this queue
+        """
+        obj = pickle.dumps(obj, -1)
+        buff = None
+        try:
+            buff = self._shared_mem.malloc(len(obj))
+            buff.put(obj)
+            super(SharedQueue, self).put(buff, **kwargs)
+        except Exception as e:
+            stack_info = traceback.format_exc()
+            err_msg = 'failed to put an element to SharedQueue '\
+                'with stack info[%s]' % (stack_info)
+            logger.warning(err_msg)
+            if buff is not None:
+                buff.free()
+            raise e
+    def get(self, **kwargs):
+        """ get an object from this queue
+        """
+        buff = None
+        try:
+            buff = super(SharedQueue, self).get(**kwargs)
+            data = buff.get()
+            return pickle.load(StringIO(data))
+        except Exception as e:
+            stack_info = traceback.format_exc()
+            err_msg = 'failed to get element from SharedQueue '\
+                        'with stack info[%s]' % (stack_info)
+            logger.warning(err_msg)
+            raise e
+        finally:
+            if buff is not None:
+                buff.free()
+    def release(self):
+        self._shared_mem.release()
+        self._shared_mem = None
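`SharedQueue.put` and `SharedQueue.get` move objects as pickled bytes: `put` serializes with the highest pickle protocol, and `get` deserializes from an in-memory byte stream. The sketch below shows only that round trip, without shared memory. `serialize` and `deserialize` are illustrative helper names, and the sample payload is made up.

```python
import pickle
from io import BytesIO


def serialize(obj):
    """Pickle an object with the highest available protocol (-1)."""
    return pickle.dumps(obj, -1)


def deserialize(data):
    """Rebuild the object from its pickled bytes via an in-memory stream."""
    return pickle.load(BytesIO(data))


sample = {"image": [1, 2, 3], "label": "person"}
payload = serialize(sample)
restored = deserialize(payload)
print(len(payload), restored == sample)
```

In the queue itself, the byte string is copied into a `SharedBuffer` between these two steps, so only the small buffer handle travels through the underlying `multiprocessing` queue.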
--- a/contrib/HumanSeg/datasets/shared_queue/sharedmemory.py
+++ b/contrib/HumanSeg/datasets/shared_queue/sharedmemory.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# utils for memory management which is allocated on sharedmemory,
+#    note that these structures may not be thread-safe
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+import os
+import time
+import math
+import struct
+import sys
+import six
+if six.PY3:
+    import pickle
+else:
+    import cPickle as pickle
+import json
+import uuid
+import random
+import numpy as np
+import weakref
+import logging
+from multiprocessing import Lock
+from multiprocessing import RawArray
+logger = logging.getLogger(__name__)
+class SharedMemoryError(ValueError):
+    """ SharedMemoryError
+    """
+    pass
+class SharedBufferError(SharedMemoryError):
+    """ SharedBufferError
+    """
+    pass
+class MemoryFullError(SharedMemoryError):
+    """ MemoryFullError
+    """
+    def __init__(self, errmsg=''):
+        super(MemoryFullError, self).__init__()
+        self.errmsg = errmsg
+def memcopy(dst, src, offset=0, length=None):
+    """ copy data from 'src' to 'dst' in bytes
+    """
+    length = length if length is not None else len(src)
+    assert type(dst) == np.ndarray, 'invalid type for "dst" in memcopy'
+    if type(src) is not np.ndarray:
+        if type(src) is str and six.PY3:
+            src = src.encode()
+        src = np.frombuffer(src, dtype='uint8', count=len(src))
+    dst[:] = src[offset:offset + length]
+class SharedBuffer(object):
+    """ Buffer allocated from SharedMemoryMgr, and it stores data on shared memory
+        note that:
+            every instance of this should be freed explicitly by calling 'self.free'
+    """
+    def __init__(self, owner, capacity, pos, size=0, alloc_status=''):
+        """ Init
+            Args:
+                owner (str): manager to own this buffer
+                capacity (int): capacity in bytes for this buffer
+                pos (int): page position in shared memory
+                size (int): bytes already used
+                alloc_status (str): debug info about allocator when allocate this
+        """
+        self._owner = owner
+        self._cap = capacity
+        self._pos = pos
+        self._size = size
+        self._alloc_status = alloc_status
+        assert self._pos >= 0 and self._cap > 0, \
+            "invalid params[%d:%d] to construct SharedBuffer" \
+            % (self._pos, self._cap)
+    def owner(self):
+        """ get owner
+        """
+        return SharedMemoryMgr.get_mgr(self._owner)
+    def put(self, data, override=False):
+        """ put data to this buffer
+        Args:
+            data (str or bytes): data to be stored in this buffer
+        Returns:
+            None
+        Raises:
+            SharedMemoryError when not enough space in this buffer
+        """
+        assert type(data) in [str, bytes], \
+            'invalid type[%s] for SharedBuffer::put' % (str(type(data)))
+        if self._size > 0 and not override:
+            raise SharedBufferError('data has already been set')
+        if self.capacity() < len(data):
+            raise SharedBufferError('data[%d] is larger than size of buffer[%s]'\
+                % (len(data), str(self)))
+        self.owner().put_data(self, data)
+        self._size = len(data)
+    def get(self, offset=0, size=None, no_copy=True):
+        """ get the data stored in this buffer
+        Args:
+            offset (int): position for the start point to 'get'
+            size (int): size to get
+        Returns:
+            data (np.ndarray('uint8')): user's data in numpy
+                which is passed in by 'put'
+            None: if no data stored in
+        """
+        offset = offset if offset >= 0 else self._size + offset
+        if self._size <= 0:
+            return None
+        size = self._size if size is None else size
+        assert offset + size <= self._cap, 'invalid offset[%d] '\
+            'or size[%d] for capacity[%d]' % (offset, size, self._cap)
+        return self.owner().get_data(self, offset, size, no_copy=no_copy)
+    def size(self):
+        """ bytes of used memory
+        """
+        return self._size
+    def resize(self, size):
+        """ resize the used memory to 'size', should not be greater than capacity
+        """
+        assert size >= 0 and size <= self._cap, \
+            "invalid size[%d] for resize" % (size)
+        self._size = size
+    def capacity(self):
+        """ size of allocated memory
+        """
+        return self._cap
+    def __str__(self):
+        """ human readable format
+        """
+        return "SharedBuffer(owner:%s, pos:%d, size:%d, "\
+            "capacity:%d, alloc_status:[%s], pid:%d)" \
+            % (str(self._owner), self._pos, self._size, \
+            self._cap, self._alloc_status, os.getpid())
+    def free(self):
+        """ free this buffer to its owner
+        """
+        if self._owner is not None:
+            self.owner().free(self)
+            self._owner = None
+            self._cap = 0
+            self._pos = -1
+            self._size = 0
+            return True
+        else:
+            return False
+class PageAllocator(object):
+    """ allocator used to malloc and free shared memory which
+        is split into pages
+    """
+    s_allocator_header = 12
+    def __init__(self, base, total_pages, page_size):
+        """ init
+        """
+        self._magic_num = 1234321000 + random.randint(100, 999)
+        self._base = base
+        self._total_pages = total_pages
+        self._page_size = page_size
+        header_pages = int(
+            math.ceil((total_pages + self.s_allocator_header) / page_size))
+        self._header_pages = header_pages
+        self._free_pages = total_pages - header_pages
+        self._header_size = self._header_pages * page_size
+        self._reset()
+    def _dump_alloc_info(self, fname):
+        hpages, tpages, pos, used = self.header()
+        start = self.s_allocator_header
+        end = start + self._page_size * hpages
+        alloc_flags = self._base[start:end].tobytes()
+        info = {
+            'magic_num': self._magic_num,
+            'header_pages': hpages,
+            'total_pages': tpages,
+            'pos': pos,
+            'used': used
+        }
+        info['alloc_flags'] = alloc_flags
+        fname = fname + '.' + str(uuid.uuid4())[:6]
+        with open(fname, 'wb') as f:
+            f.write(pickle.dumps(info, -1))
+        logger.warning('dump alloc info to file[%s]' % (fname))
+    def _reset(self):
+        alloc_page_pos = self._header_pages
+        used_pages = self._header_pages
+        header_info = struct.pack(
+            str('III'), self._magic_num, alloc_page_pos, used_pages)
+        assert len(header_info) == self.s_allocator_header, \
+            'invalid size of header_info'
+        memcopy(self._base[0:self.s_allocator_header], header_info)
+        self.set_page_status(0, self._header_pages, '1')
+        self.set_page_status(self._header_pages, self._free_pages, '0')
+    def header(self):
+        """ get header info of this allocator
+        """
+        header_str = self._base[0:self.s_allocator_header].tobytes()
+        magic, pos, used = struct.unpack(str('III'), header_str)
+        assert magic == self._magic_num, \
+            'invalid header magic[%d] in shared memory' % (magic)
+        return self._header_pages, self._total_pages, pos, used
+    def empty(self):
+        """ are all allocatable pages available
+        """
+        header_pages, pages, pos, used = self.header()
+        return header_pages == used
+    def full(self):
+        """ are all allocatable pages used
+        """
+        header_pages, pages, pos, used = self.header()
+        return used == pages
+    def __str__(self):
+        header_pages, pages, pos, used = self.header()
+        desc = '{page_info[magic:%d,total:%d,used:%d,header:%d,alloc_pos:%d,pagesize:%d]}' \
+            % (self._magic_num, pages, used, header_pages, pos, self._page_size)
+        return 'PageAllocator:%s' % (desc)
+    def set_alloc_info(self, alloc_pos, used_pages):
+        """ set allocating position to new value
+        """
+        memcopy(self._base[4:12], struct.pack(str('II'), alloc_pos, used_pages))
+    def set_page_status(self, start, page_num, status):
+        """ set 'page_num' pages starting at 'start' to the same status 'status'
+        """
+        assert status in ['0', '1'], 'invalid status[%s] for page status '\
+            'in allocator[%s]' % (status, str(self))
+        start += self.s_allocator_header
+        end = start + page_num
+        assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
+            'in allocator[%s]' % (end, str(self))
+        memcopy(self._base[start:end], str(status * page_num))
+    def get_page_status(self, start, page_num, ret_flag=False):
+        start += self.s_allocator_header
+        end = start + page_num
+        assert start >= 0 and end <= self._header_size, 'invalid end[%d] of pages '\
+            'in allocator[%s]' % (end, str(self))
+        status = self._base[start:end].tobytes().decode()
+        if ret_flag:
+            return status
+        zero_num = status.count('0')
+        if zero_num == 0:
+            return (page_num, 1)
+        else:
+            return (zero_num, 0)
+    def malloc_page(self, page_num):
+        header_pages, pages, pos, used = self.header()
+        end = pos + page_num
+        if end > pages:
+            pos = self._header_pages
+            end = pos + page_num
+        start_pos = pos
+        flags = ''
+        while True:
+            # read the status flags of the 'page_num' pages starting at 'pos';
+            # the run is usable only if every one of them is '0'
+            flags = self.get_page_status(pos, page_num, ret_flag=True)
+            if flags.count('0') == page_num:
+                break
+            # not found enough pages, so shift to next few pages
+            free_pos = flags.rfind('1') + 1
+            pos += free_pos
+            end = pos + page_num
+            if end > pages:
+                pos = self._header_pages
+                end = pos + page_num
+                flags = ''
+            # not found available pages after scan all pages
+            if pos <= start_pos and end >= start_pos:
+                logger.debug('not found available pages after scan all pages')
+                break
+        page_status = (flags.count('0'), 0)
+        if page_status != (page_num, 0):
+            free_pages = self._total_pages - used
+            if free_pages == 0:
+                err_msg = 'all pages have been used:%s' % (str(self))
+            else:
+                err_msg = 'not found available pages with page_status[%s] '\
+                    'and %d free pages' % (str(page_status), free_pages)
+            err_msg = 'failed to malloc %d pages at pos[%d] for reason[%s] and allocator status[%s]' \
+                % (page_num, pos, err_msg, str(self))
+            raise MemoryFullError(err_msg)
+        self.set_page_status(pos, page_num, '1')
+        used += page_num
+        self.set_alloc_info(end, used)
+        return pos
+    def free_page(self, start, page_num):
+        """ free 'page_num' pages start from 'start'
+        """
+        page_status = self.get_page_status(start, page_num)
+        assert page_status == (page_num, 1), \
+            'invalid status[%s] when free [%d, %d]' \
+                % (str(page_status), start, page_num)
+        self.set_page_status(start, page_num, '0')
+        _, _, pos, used = self.header()
+        used -= page_num
+        self.set_alloc_info(pos, used)
+DEFAULT_SHARED_MEMORY_SIZE = 1024 * 1024 * 1024
+class SharedMemoryMgr(object):
+    """ manage a contiguous block of memory, provide
+        'malloc' to allocate new buffer, and 'free' to free buffer
+    """
+    s_memory_mgrs = weakref.WeakValueDictionary()
+    s_mgr_num = 0
+    s_log_statis = False
+    @classmethod
+    def get_mgr(cls, id):
+        """ get the SharedMemoryMgr registered under 'id'
+        """
+        assert id in cls.s_memory_mgrs, 'invalid id[%s] for memory managers' % (
+            id)
+        return cls.s_memory_mgrs[id]
+    def __init__(self, capacity=None, pagesize=None):
+        """ init
+        """
+        logger.debug('create SharedMemoryMgr')
+        pagesize = 64 * 1024 if pagesize is None else pagesize
+        assert type(pagesize) is int, "invalid type of pagesize[%s]" \
+            % (str(pagesize))
+        capacity = DEFAULT_SHARED_MEMORY_SIZE if capacity is None else capacity
+        assert type(capacity) is int, "invalid type of capacity[%s]" \
+            % (str(capacity))
+        assert capacity > 0, 'size of shared memory should be greater than 0'
+        self._released = False
+        self._cap = capacity
+        self._page_size = pagesize
+        assert self._cap % self._page_size == 0, \
+            "capacity[%d] and pagesize[%d] are not consistent" \
+            % (self._cap, self._page_size)
+        self._total_pages = self._cap // self._page_size
+        self._pid = os.getpid()
+        SharedMemoryMgr.s_mgr_num += 1
+        self._id = self._pid * 100 + SharedMemoryMgr.s_mgr_num
+        SharedMemoryMgr.s_memory_mgrs[self._id] = self
+        self._locker = Lock()
+        self._setup()
+    def _setup(self):
+        self._shared_mem = RawArray('c', self._cap)
+        self._base = np.frombuffer(
+            self._shared_mem, dtype='uint8', count=self._cap)
+        self._locker.acquire()
+        try:
+            self._allocator = PageAllocator(self._base, self._total_pages,
+                                            self._page_size)
+        finally:
+            self._locker.release()
+    def malloc(self, size, wait=True):
+        """ malloc a new SharedBuffer
+        Args:
+            size (int): buffer size to be malloc
+            wait (bool): whether to wait when there is not enough memory
+        Returns:
+            SharedBuffer
+        Raises:
+            SharedMemoryError when not found available memory
+        """
+        page_num = int(math.ceil(size / self._page_size))
+        size = page_num * self._page_size
+        start = None
+        ct = 0
+        errmsg = ''
+        while True:
+            self._locker.acquire()
+            try:
+                start = self._allocator.malloc_page(page_num)
+                alloc_status = str(self._allocator)
+            except MemoryFullError as e:
+                start = None
+                errmsg = e.errmsg
+                if not wait:
+                    raise
+            finally:
+                self._locker.release()
+            if start is None:
+                time.sleep(0.1)
+                if ct % 100 == 0:
+                    logger.warning('not enough space for reason[%s]' % (errmsg))
+                ct += 1
+            else:
+                break
+        return SharedBuffer(self._id, size, start, alloc_status=alloc_status)
+    def free(self, shared_buf):
+        """ free a SharedBuffer
+        Args:
+            shared_buf (SharedBuffer): buffer to be freed
+        Returns:
+            None
+        Raises:
+            SharedMemoryError when failed to release this buffer
+        """
+        assert shared_buf._owner == self._id, "invalid shared_buf[%s] "\
+            "for it's not allocated from me[%s]" % (str(shared_buf), str(self))
+        cap = shared_buf.capacity()
+        start_page = shared_buf._pos
+        page_num = cap // self._page_size
+        #maybe we don't need this lock here
+        self._locker.acquire()
+        try:
+            self._allocator.free_page(start_page, page_num)
+        finally:
+            self._locker.release()
+    def put_data(self, shared_buf, data):
+        """  fill 'data' into 'shared_buf'
+        """
+        assert len(data) <= shared_buf.capacity(), 'too large data[%d] '\
+            'for this buffer[%s]' % (len(data), str(shared_buf))
+        start = shared_buf._pos * self._page_size
+        end = start + len(data)
+        assert start >= 0 and end <= self._cap, "invalid start "\
+            "position[%d] when put data to buff:%s" % (start, str(shared_buf))
+        self._base[start:end] = np.frombuffer(data, 'uint8', len(data))
+    def get_data(self, shared_buf, offset, size, no_copy=True):
+        """ extract 'data' from 'shared_buf' in range [offset, offset + size)
+        """
+        start = shared_buf._pos * self._page_size
+        start += offset
+        if no_copy:
+            return self._base[start:start + size]
+        else:
+            return self._base[start:start + size].tobytes()
+    def __str__(self):
+        return 'SharedMemoryMgr:{id:%d, %s}' % (self._id, str(self._allocator))
+    def __del__(self):
+        if SharedMemoryMgr.s_log_statis:
+            logger.info('destroy [%s]' % (self))
+        if not self._released and not self._allocator.empty():
+            logger.debug(
+                'not empty when delete this SharedMemoryMgr[%s]' % (self))
+        else:
+            self._released = True
+        if self._id in SharedMemoryMgr.s_memory_mgrs:
+            del SharedMemoryMgr.s_memory_mgrs[self._id]
+            SharedMemoryMgr.s_mgr_num -= 1
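Review note: the header and page-flag layout that `PageAllocator` writes into the shared block can be illustrated with a minimal standalone sketch. It assumes a plain `bytearray` in place of the `RawArray`-backed NumPy view, uses `'=III'` to pin the header to 12 bytes, and replaces the wrap-around scan of `malloc_page` with a simple first-fit search from the first allocatable page. The names `header_pages_for`, `init_region`, `read_header` and `find_free_run` are hypothetical and do not exist in the patch.

```python
import math
import struct

HEADER_BYTES = 12  # magic, alloc_pos, used_pages as three 32-bit unsigned ints


def header_pages_for(total_pages, page_size):
    """Pages needed for the 12-byte header plus one flag byte per page."""
    return int(math.ceil((total_pages + HEADER_BYTES) / float(page_size)))


def init_region(total_pages, page_size, magic=1234321500):
    """Build a zeroed region, then write the header and the page flags."""
    base = bytearray(total_pages * page_size)
    hpages = header_pages_for(total_pages, page_size)
    # Header pages count as used, so alloc_pos and used_pages both start there.
    base[0:HEADER_BYTES] = struct.pack('=III', magic, hpages, hpages)
    # One ASCII flag per page: '1' for used (header) pages, '0' for free pages.
    flags = b'1' * hpages + b'0' * (total_pages - hpages)
    base[HEADER_BYTES:HEADER_BYTES + total_pages] = flags
    return base, hpages


def read_header(base):
    """Decode (magic, alloc_pos, used_pages) from the first 12 bytes."""
    return struct.unpack('=III', bytes(base[0:HEADER_BYTES]))


def find_free_run(base, total_pages, hpages, page_num):
    """Simplified first-fit: index of the first run of free pages, or -1."""
    flags = bytes(base[HEADER_BYTES:HEADER_BYTES + total_pages])
    return flags.find(b'0' * page_num, hpages)


base, hpages = init_region(total_pages=64, page_size=128)
first = find_free_run(base, 64, hpages, 4)
```

With the defaults in the patch (1 GiB capacity, 64 KiB pages) the same arithmetic gives 16384 pages, of which a single page holds the header and all flags.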
--- a/contrib/HumanSeg/export.py
+++ b/contrib/HumanSeg/export.py
+import models
+import argparse
+def parse_args():
+    parser = argparse.ArgumentParser(description='Export model')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for exporting',
+        type=str)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the export model',
+        type=str,
+        default='./output/export')
+    return parser.parse_args()
+def export(args):
+    model = models.load_model(args.model_dir)
+    model.export_inference_model(args.save_dir)
+if __name__ == '__main__':
+    args = parse_args()
+    export(args)
--- a/contrib/HumanSeg/imgs/Human.jpg
+++ b/contrib/HumanSeg/imgs/Human.jpg
--- a/contrib/HumanSeg/imgs/HumanSeg.jpg
+++ b/contrib/HumanSeg/imgs/HumanSeg.jpg
--- a/contrib/HumanSeg/infer.py
+++ b/contrib/HumanSeg/infer.py
-# -*- coding: utf-8 -*-
+import argparse
 import os
+import os.path as osp
 import cv2
 import numpy as np
-from utils.util import get_arguments
+import tqdm
-from utils.palette import get_palette
-from PIL import Image as PILImage
+import utils
-import importlib
+import models
+import transforms
-args = get_arguments()
-config = importlib.import_module('config')
-cfg = getattr(config, 'cfg')
+def parse_args():
+    parser = argparse.ArgumentParser(
-# Paddle garbage-collection strategy FLAG; the ACE2P model is large, so enabling this is recommended when GPU memory is insufficient
+        description='HumanSeg inference and visualization')
-os.environ['FLAGS_eager_delete_tensor_gb']='0.0'
+    parser.add_argument(
+        '--model_dir',
-import paddle.fluid as fluid
+        dest='model_dir',
+        help='Model path for inference',
-# Inference dataset class
+        type=str)
-class TestDataSet():
+    parser.add_argument(
-    def __init__(self):
+        '--data_dir',
-        self.data_dir = cfg.data_dir 
+        dest='data_dir',
-        self.data_list_file = cfg.data_list_file
+        help='The root directory of dataset',
-        self.data_list = self.get_data_list()
+        type=str)
-        self.data_num = len(self.data_list)
+    parser.add_argument(
+        '--test_list',
-    def get_data_list(self):
+        dest='test_list',
-        # Get the list of image paths for inference
+        help='Test list file of dataset',
-        data_list = []
+        type=str)
-        data_file_handler = open(self.data_list_file, 'r')
+    parser.add_argument(
-        for line in data_file_handler:
+        '--save_dir',
-            img_name = line.strip()
+        dest='save_dir',
-            name_prefix = img_name.split('.')[0]
+        help='The directory for saving the inference results',
-            if len(img_name.split('.')) == 1:
+        type=str,
-                img_name = img_name + '.jpg'
+        default='./output/result')
-            img_path = os.path.join(self.data_dir, img_name)
+    parser.add_argument(
-            data_list.append(img_path)
+        "--image_shape",
-        return data_list
+        dest="image_shape",
+        help="The image shape for net inputs.",
-    def preprocess(self, img):
+        nargs=2,
-        # Image preprocessing
+        default=[192, 192],
-        if cfg.example == 'ACE2P':
+        type=int)
-            reader = importlib.import_module(args.example+'.reader')
+    return parser.parse_args()
-            ACE2P_preprocess = getattr(reader, 'preprocess')
-            img = ACE2P_preprocess(img)
-        else:
+def mkdir(path):
-            img = cv2.resize(img, cfg.input_size).astype(np.float32)
+    sub_dir = osp.dirname(path)
-            img -= np.array(cfg.MEAN)
+    if sub_dir and not osp.exists(sub_dir):
-            img /= np.array(cfg.STD)
+        os.makedirs(sub_dir)
-            img = img.transpose((2, 0, 1))
-            img = np.expand_dims(img, axis=0)
-        return img
+def infer(args):
+    test_transforms = transforms.Compose(
-    def get_data(self, index):
+        [transforms.Resize(args.image_shape),
-        # Get image information
+         transforms.Normalize()])
-        img_path = self.data_list[index]
+    model = models.load_model(args.model_dir)
-        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
+    added_saved_path = osp.join(args.save_dir, 'added')
-        if img is None:
+    mat_saved_path = osp.join(args.save_dir, 'mat')
-            return img, img,img_path, None
+    scoremap_saved_path = osp.join(args.save_dir, 'scoremap')
-        img_name = img_path.split(os.sep)[-1]
+    with open(args.test_list, 'r') as f:
-        name_prefix = img_name.replace('.'+img_name.split('.')[-1],'')
+        files = f.readlines()
-        img_shape = img.shape[:2]
-        img_process = self.preprocess(img)
+    for file in tqdm.tqdm(files):
+        file = file.strip()
-        return img, img_process, name_prefix, img_shape
+        im_file = osp.join(args.data_dir, file)
+        im = cv2.imread(im_file)
+        result = model.predict(im, transforms=test_transforms)
-def infer():
-    if not os.path.exists(cfg.vis_dir):
+        # save added image
-        os.makedirs(cfg.vis_dir)
+        added_image = utils.visualize(im_file, result, weight=0.6)
-    palette = get_palette(cfg.class_num)
+        added_image_file = osp.join(added_saved_path, file)
-    # Display threshold for the portrait segmentation result
+        mkdir(added_image_file)
-    thresh = 120
+        cv2.imwrite(added_image_file, added_image)
-    place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
+        # save score map
-    exe = fluid.Executor(place)
+        score_map = result['score_map'][:, :, 1]
+        score_map = (score_map * 255).astype(np.uint8)
-    # Load the inference model
+        score_map_file = osp.join(scoremap_saved_path, file)
-    test_prog, feed_name, fetch_list = fluid.io.load_inference_model(
+        mkdir(score_map_file)
-        dirname=cfg.model_path, executor=exe, params_filename='__params__')
+        cv2.imwrite(score_map_file, score_map)
-    # Load the inference dataset
+        # save mat image
-    test_dataset = TestDataSet()
+        score_map = np.expand_dims(score_map, axis=-1)
-    data_num = test_dataset.data_num
+        mat_image = np.concatenate([im, score_map], axis=2)
+        mat_file = osp.join(mat_saved_path, file)
-    for idx in range(data_num):
+        ext = osp.splitext(mat_file)[-1]
-        # Fetch the data
+        mat_file = osp.splitext(mat_file)[0] + '.png'
-        ori_img, image, im_name, im_shape = test_dataset.get_data(idx)
+        mkdir(mat_file)
-        if image is None:
+        cv2.imwrite(mat_file, mat_image)
-            print(im_name, 'is None')
-            continue
+if __name__ == '__main__':
-        # Inference
+    args = parse_args()
-        if cfg.example == 'ACE2P':
+    infer(args)
-            # The ACE2P model uses multi-scale inference
-            reader = importlib.import_module(args.example+'.reader')
-            multi_scale_test = getattr(reader, 'multi_scale_test')
-            parsing, logits = multi_scale_test(exe, test_prog, feed_name, fetch_list, image, im_shape)
-        else:
-            # The HumanSeg and RoadLine models use single-scale inference
-            result = exe.run(program=test_prog, feed={feed_name[0]: image}, fetch_list=fetch_list)
-            parsing = np.argmax(result[0][0], axis=0)
-            parsing = cv2.resize(parsing.astype(np.uint8), im_shape[::-1])
-        # Save the inference result
-        result_path = os.path.join(cfg.vis_dir, im_name + '.png')
-        if cfg.example == 'HumanSeg':
-            logits = result[0][0][1]*255
-            logits = cv2.resize(logits, im_shape[::-1])
-            ret, logits = cv2.threshold(logits, thresh, 0, cv2.THRESH_TOZERO)
-            logits = 255 *(logits - thresh)/(255 - thresh)
-            # Add the segmentation result to the alpha channel
-            rgba = np.concatenate((ori_img, np.expand_dims(logits, axis=2)), axis=2)
-            cv2.imwrite(result_path, rgba)
-        else: 
-            output_im = PILImage.fromarray(np.asarray(parsing, dtype=np.uint8))
-            output_im.putpalette(palette)
-            output_im.save(result_path)
-        if (idx + 1) % 100 == 0:
-            print('%d  processd' % (idx + 1))
-    print('%d  processd done' % (idx + 1))   
-    return 0
-if __name__ == "__main__":
-    infer()
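Review note: the new `infer.py` derives three output paths per image and creates their parent directories through `mkdir`. The sketch below shows one way to make that path handling robust, assuming only the standard library. The helper names `ensure_parent_dir` and `to_png_path` are hypothetical and are not part of the patch. Two edge cases are covered: a bare file name whose `dirname` is empty, and an extension string that also occurs earlier in the path.

```python
import os
import os.path as osp
import tempfile


def ensure_parent_dir(path):
    """Create the parent directory of `path` if it is missing."""
    parent = osp.dirname(path)
    # An empty parent means `path` is relative to the current directory,
    # and os.makedirs('') would raise, so skip it.
    if parent and not osp.isdir(parent):
        os.makedirs(parent)
    return parent


def to_png_path(path):
    """Swap only the trailing extension for '.png'."""
    root, _ = osp.splitext(path)
    return root + '.png'


# Usage: prepare an output location the way the 'mat' branch does.
out_root = tempfile.mkdtemp()
mat_file = to_png_path(osp.join(out_root, 'mat', 'sub', 'a.jpg'))
ensure_parent_dir(mat_file)
```

Using `osp.splitext` for the swap avoids the behaviour of `str.replace`, which rewrites every occurrence of the extension in the path and, for a file with no extension, replaces the empty string.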
--- a/contrib/HumanSeg/models/__init__.py
+++ b/contrib/HumanSeg/models/__init__.py
+from .humanseg import HumanSegMobile
+from .humanseg import HumanSegServer
+from .humanseg import HumanSegLite
+from .load_model import load_model
--- a/contrib/HumanSeg/models/humanseg.py
+++ b/contrib/HumanSeg/models/humanseg.py
--- a/contrib/HumanSeg/models/load_model.py
+++ b/contrib/HumanSeg/models/load_model.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import yaml
+import os.path as osp
+import six
+import copy
+from collections import OrderedDict
+import paddle.fluid as fluid
+import utils.logging as logging
+import models
+def load_model(model_dir):
+    if not osp.exists(osp.join(model_dir, "model.yml")):
+        raise Exception("There's no model.yml in {}".format(model_dir))
+    with open(osp.join(model_dir, "model.yml")) as f:
+        info = yaml.load(f.read(), Loader=yaml.Loader)
+    status = info['status']
+    if not hasattr(models, info['Model']):
+        raise Exception("There's no attribute {} in models".format(
+            info['Model']))
+    model = getattr(models, info['Model'])(**info['_init_params'])
+    if status == "Normal":
+        startup_prog = fluid.Program()
+        model.test_prog = fluid.Program()
+        with fluid.program_guard(model.test_prog, startup_prog):
+            with fluid.unique_name.guard():
+                model.test_inputs, model.test_outputs = model.build_net(
+                    mode='test')
+        model.test_prog = model.test_prog.clone(for_test=True)
+        model.exe.run(startup_prog)
+        import pickle
+        with open(osp.join(model_dir, 'model.pdparams'), 'rb') as f:
+            load_dict = pickle.load(f)
+        fluid.io.set_program_state(model.test_prog, load_dict)
+    elif status in ['Infer', 'Quant']:
+        [prog, input_names, outputs] = fluid.io.load_inference_model(
+            model_dir, model.exe, params_filename='__params__')
+        model.test_prog = prog
+        test_outputs_info = info['_ModelInputsOutputs']['test_outputs']
+        model.test_inputs = OrderedDict()
+        model.test_outputs = OrderedDict()
+        for name in input_names:
+            model.test_inputs[name] = model.test_prog.global_block().var(name)
+        for i, out in enumerate(outputs):
+            var_desc = test_outputs_info[i]
+            model.test_outputs[var_desc[0]] = out
+    if 'test_transforms' in info:
+        model.test_transforms = build_transforms(info['test_transforms'])
+        model.eval_transforms = copy.deepcopy(model.test_transforms)
+    if '_Attributes' in info:
+        for k, v in info['_Attributes'].items():
+            if k in model.__dict__:
+                model.__dict__[k] = v
+    logging.info("Model[{}] loaded.".format(info['Model']))
+    return model
+def build_transforms(transforms_info):
+    import transforms as T
+    transforms = list()
+    for op_info in transforms_info:
+        op_name = list(op_info.keys())[0]
+        op_attr = op_info[op_name]
+        if not hasattr(T, op_name):
+            raise Exception(
+                "There's no operator named '{}' in transforms".format(op_name))
+        transforms.append(getattr(T, op_name)(**op_attr))
+    eval_transforms = T.Compose(transforms)
+    return eval_transforms
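Review note: `build_transforms` turns a list of single-key mappings from `model.yml` into operator instances by looking each name up on the `transforms` module. A minimal sketch of that pattern follows, assuming a plain dict as the registry and two stand-in classes. `Resize`, `Normalize`, their constructor arguments, `REGISTRY` and `build_ops` are hypothetical here and only mimic the shape of the real operators.

```python
class Resize(object):
    """Stand-in operator; the real transforms.Resize may take other arguments."""

    def __init__(self, target_size):
        self.target_size = target_size


class Normalize(object):
    """Stand-in operator with illustrative default statistics."""

    def __init__(self, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
        self.mean = mean
        self.std = std


REGISTRY = {'Resize': Resize, 'Normalize': Normalize}


def build_ops(ops_info, registry):
    """Instantiate one operator per {name: kwargs} entry, in order."""
    ops = []
    for op_info in ops_info:
        if len(op_info) != 1:
            raise ValueError('expected exactly one operator per entry')
        op_name, op_attr = next(iter(op_info.items()))
        if op_name not in registry:
            raise KeyError("no operator named '{}'".format(op_name))
        # Treat a missing attribute mapping (None) as "no keyword arguments".
        ops.append(registry[op_name](**(op_attr or {})))
    return ops


ops = build_ops(
    [{'Resize': {'target_size': [192, 192]}}, {'Normalize': None}],
    REGISTRY)
```

An explicit registry keeps the set of constructible names closed, which may be preferable to `getattr` on a module when the YAML file comes from an untrusted source.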
--- a/contrib/HumanSeg/nets/__init__.py
+++ b/contrib/HumanSeg/nets/__init__.py
+from .backbone import mobilenet_v2
+from .backbone import xception
+from .deeplabv3p import DeepLabv3p
+from .shufflenet_slim import ShuffleSeg
+from .hrnet import HRNet
--- a/contrib/HumanSeg/nets/backbone/__init__.py
+++ b/contrib/HumanSeg/nets/backbone/__init__.py
+from .mobilenet_v2 import MobileNetV2
+from .xception import Xception
--- a/contrib/HumanSeg/nets/backbone/mobilenet_v2.py
+++ b/contrib/HumanSeg/nets/backbone/mobilenet_v2.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+class MobileNetV2:
+    def __init__(self,
+                 num_classes=None,
+                 scale=1.0,
+                 output_stride=None,
+                 end_points=None,
+                 decode_points=None):
+        self.scale = scale
+        self.num_classes = num_classes
+        self.output_stride = output_stride
+        self.end_points = end_points
+        self.decode_points = decode_points
+        self.bottleneck_params_list = [(1, 16, 1, 1), (6, 24, 2, 2),
+                                       (6, 32, 3, 2), (6, 64, 4, 2),
+                                       (6, 96, 3, 1), (6, 160, 3, 2),
+                                       (6, 320, 1, 1)]
+        self.modify_bottle_params(output_stride)
+    def __call__(self, input):
+        scale = self.scale
+        decode_ends = dict()
+        def check_points(count, points):
+            if points is None:
+                return False
+            if isinstance(points, list):
+                return count in points
+            return count == points
+        # conv1
+        input = self.conv_bn_layer(
+            input,
+            num_filters=int(32 * scale),
+            filter_size=3,
+            stride=2,
+            padding=1,
+            if_act=True,
+            name='conv1_1')
+        layer_count = 1
+        if check_points(layer_count, self.decode_points):
+            decode_ends[layer_count] = input
+        if check_points(layer_count, self.end_points):
+            return input, decode_ends
+        # bottleneck sequences
+        i = 1
+        in_c = int(32 * scale)
+        for layer_setting in self.bottleneck_params_list:
+            t, c, n, s = layer_setting
+            i += 1
+            input, depthwise_output = self.invresi_blocks(
+                input=input,
+                in_c=in_c,
+                t=t,
+                c=int(c * scale),
+                n=n,
+                s=s,
+                name='conv' + str(i))
+            in_c = int(c * scale)
+            layer_count += n
+            if check_points(layer_count, self.decode_points):
+                decode_ends[layer_count] = depthwise_output
+            if check_points(layer_count, self.end_points):
+                return input, decode_ends
+        # last_conv
+        output = self.conv_bn_layer(
+            input=input,
+            num_filters=int(1280 * scale) if scale > 1.0 else 1280,
+            filter_size=1,
+            stride=1,
+            padding=0,
+            if_act=True,
+            name='conv9')
+        if self.num_classes is not None:
+            output = fluid.layers.pool2d(
+                input=output, pool_type='avg', global_pooling=True)
+            output = fluid.layers.fc(
+                input=output,
+                size=self.num_classes,
+                param_attr=ParamAttr(name='fc10_weights'),
+                bias_attr=ParamAttr(name='fc10_offset'))
+        return output
+    def modify_bottle_params(self, output_stride=None):
+        if output_stride is not None and output_stride % 2 != 0:
+            raise Exception("output stride must be an even number")
+        if output_stride is None:
+            return
+        else:
+            stride = 2
+            for i, layer_setting in enumerate(self.bottleneck_params_list):
+                t, c, n, s = layer_setting
+                stride = stride * s
+                if stride > output_stride:
+                    s = 1
+                self.bottleneck_params_list[i] = (t, c, n, s)
+    def conv_bn_layer(self,
+                      input,
+                      filter_size,
+                      num_filters,
+                      stride,
+                      padding,
+                      channels=None,
+                      num_groups=1,
+                      if_act=True,
+                      name=None,
+                      use_cudnn=True):
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=num_filters,
+            filter_size=filter_size,
+            stride=stride,
+            padding=padding,
+            groups=num_groups,
+            act=None,
+            use_cudnn=use_cudnn,
+            param_attr=ParamAttr(name=name + '_weights'),
+            bias_attr=False)
+        bn_name = name + '_bn'
+        bn = fluid.layers.batch_norm(
+            input=conv,
+            param_attr=ParamAttr(name=bn_name + "_scale"),
+            bias_attr=ParamAttr(name=bn_name + "_offset"),
+            moving_mean_name=bn_name + '_mean',
+            moving_variance_name=bn_name + '_variance')
+        if if_act:
+            return fluid.layers.relu6(bn)
+        else:
+            return bn
+    def shortcut(self, input, data_residual):
+        return fluid.layers.elementwise_add(input, data_residual)
+    def inverted_residual_unit(self,
+                               input,
+                               num_in_filter,
+                               num_filters,
+                               ifshortcut,
+                               stride,
+                               filter_size,
+                               padding,
+                               expansion_factor,
+                               name=None):
+        num_expfilter = int(round(num_in_filter * expansion_factor))
+        channel_expand = self.conv_bn_layer(
+            input=input,
+            num_filters=num_expfilter,
+            filter_size=1,
+            stride=1,
+            padding=0,
+            num_groups=1,
+            if_act=True,
+            name=name + '_expand')
+        bottleneck_conv = self.conv_bn_layer(
+            input=channel_expand,
+            num_filters=num_expfilter,
+            filter_size=filter_size,
+            stride=stride,
+            padding=padding,
+            num_groups=num_expfilter,
+            if_act=True,
+            name=name + '_dwise',
+            use_cudnn=False)
+        depthwise_output = bottleneck_conv
+        linear_out = self.conv_bn_layer(
+            input=bottleneck_conv,
+            num_filters=num_filters,
+            filter_size=1,
+            stride=1,
+            padding=0,
+            num_groups=1,
+            if_act=False,
+            name=name + '_linear')
+        if ifshortcut:
+            out = self.shortcut(input=input, data_residual=linear_out)
+            return out, depthwise_output
+        else:
+            return linear_out, depthwise_output
+    def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
+        first_block, depthwise_output = self.inverted_residual_unit(
+            input=input,
+            num_in_filter=in_c,
+            num_filters=c,
+            ifshortcut=False,
+            stride=s,
+            filter_size=3,
+            padding=1,
+            expansion_factor=t,
+            name=name + '_1')
+        last_residual_block = first_block
+        last_c = c
+        for i in range(1, n):
+            last_residual_block, depthwise_output = self.inverted_residual_unit(
+                input=last_residual_block,
+                num_in_filter=last_c,
+                num_filters=c,
+                ifshortcut=True,
+                stride=1,
+                filter_size=3,
+                padding=1,
+                expansion_factor=t,
+                name=name + '_' + str(i + 1))
+        return last_residual_block, depthwise_output
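The stride clamping in `modify_bottle_params` can be traced without Paddle. The sketch below re-implements the same loop as a pure function that returns a new list instead of mutating `self.bottleneck_params_list`. The `BOTTLENECK_PARAMS` table and the helper name `clamp_strides` are assumptions for illustration: the table is the usual MobileNetV2 `(t, c, n, s)` configuration, which is defined outside the lines shown in this hunk.

```python
# Assumed MobileNetV2 (t, c, n, s) table; the real list is not shown in this hunk.
BOTTLENECK_PARAMS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def clamp_strides(params, output_stride=None):
    """Return a copy of params with strides capped as in modify_bottle_params."""
    if output_stride is None:
        return list(params)
    if output_stride % 2 != 0:
        raise ValueError("output stride must be an even number")
    stride = 2  # running stride starts at 2, as in the source
    clamped = []
    for t, c, n, s in params:
        # Accumulate with the requested stride first, then clamp if the
        # running product would exceed the target output stride.
        stride *= s
        if stride > output_stride:
            s = 1
        clamped.append((t, c, n, s))
    return clamped


if __name__ == "__main__":
    for target in (8, 16, 32):
        strides = [s for _, _, _, s in clamp_strides(BOTTLENECK_PARAMS, target)]
        print(target, strides)
```

Once the running product passes `output_stride`, every later sequence is forced to stride 1, because the product is accumulated with the requested stride rather than the clamped one.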
--- a/contrib/HumanSeg/nets/backbone/xception.py
+++ b/contrib/HumanSeg/nets/backbone/xception.py
+# coding: utf8
+# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import math
+import paddle.fluid as fluid
+from nets.libs import scope, name_scope
+from nets.libs import bn, bn_relu, relu
+from nets.libs import conv
+from nets.libs import separate_conv
+__all__ = ['xception_65', 'xception_41', 'xception_71']
+def check_data(data, number):
+    if type(data) == int:
+        return [data] * number
+    assert len(data) == number
+    return data
+def check_stride(s, os):
+    return s <= os
+def check_points(count, points):
+    if points is None:
+        return False
+    if isinstance(points, list):
+        return count in points
+    return count == points
+class Xception():
+    def __init__(self,
+                 num_classes=None,
+                 layers=65,
+                 output_stride=32,
+                 end_points=None,
+                 decode_points=None):
+        self.backbone = 'xception_' + str(layers)
+        self.num_classes = num_classes
+        self.output_stride = output_stride
+        self.end_points = end_points
+        self.decode_points = decode_points
+        self.bottleneck_params = self.gen_bottleneck_params(self.backbone)
+    def __call__(
+            self,
+            input,
+    ):
+        self.stride = 2
+        self.block_point = 0
+        self.short_cuts = dict()
+        with scope(self.backbone):
+            # Entry flow
+            data = self.entry_flow(input)
+            if check_points(self.block_point, self.end_points):
+                return data, self.short_cuts
+            # Middle flow
+            data = self.middle_flow(data)
+            if check_points(self.block_point, self.end_points):
+                return data, self.short_cuts
+            # Exit flow
+            data = self.exit_flow(data)
+            if check_points(self.block_point, self.end_points):
+                return data, self.short_cuts
+        if self.num_classes is not None:
+            data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
+            data = fluid.layers.dropout(data, 0.5)
+            stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
+            with scope("logit"):
+                out = fluid.layers.fc(
+                    input=data,
+                    size=self.num_classes,
+                    act='softmax',
+                    param_attr=fluid.param_attr.ParamAttr(
+                        name='weights',
+                        initializer=fluid.initializer.Uniform(-stdv, stdv)),
+                    bias_attr=fluid.param_attr.ParamAttr(name='bias'))
+            return out
+        else:
+            return data
+    def gen_bottleneck_params(self, backbone='xception_65'):
+        if backbone == 'xception_65':
+            bottleneck_params = {
+                "entry_flow": (3, [2, 2, 2], [128, 256, 728]),
+                "middle_flow": (16, 1, 728),
+                "exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
+                                                              2048]])
+            }
+        elif backbone == 'xception_41':
+            bottleneck_params = {
+                "entry_flow": (3, [2, 2, 2], [128, 256, 728]),
+                "middle_flow": (8, 1, 728),
+                "exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
+                                                              2048]])
+            }
+        elif backbone == 'xception_71':
+            bottleneck_params = {
+                "entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
+                "middle_flow": (16, 1, 728),
+                "exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
+                                                              2048]])
+            }
+        else:
+            raise Exception(
+                "xception backbone only supports xception_41/xception_65/xception_71"
+            )
+        return bottleneck_params
+    def entry_flow(self, data):
+        param_attr = fluid.ParamAttr(
+            name=name_scope + 'weights',
+            regularizer=None,
+            initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
+        with scope("entry_flow"):
+            with scope("conv1"):
+                data = bn_relu(
+                    conv(
+                        data, 32, 3, stride=2, padding=1,
+                        param_attr=param_attr),
+                    eps=1e-3)
+            with scope("conv2"):
+                data = bn_relu(
+                    conv(
+                        data, 64, 3, stride=1, padding=1,
+                        param_attr=param_attr),
+                    eps=1e-3)
+        # get entry flow params
+        block_num = self.bottleneck_params["entry_flow"][0]
+        strides = self.bottleneck_params["entry_flow"][1]
+        chns = self.bottleneck_params["entry_flow"][2]
+        strides = check_data(strides, block_num)
+        chns = check_data(chns, block_num)
+        # params to control your flow
+        s = self.stride
+        block_point = self.block_point
+        output_stride = self.output_stride
+        with scope("entry_flow"):
+            for i in range(block_num):
+                block_point = block_point + 1
+                with scope("block" + str(i + 1)):
+                    stride = strides[i] if check_stride(s * strides[i],
+                                                        output_stride) else 1
+                    data, short_cuts = self.xception_block(
+                        data, chns[i], [1, 1, stride])
+                    s = s * stride
+                    if check_points(block_point, self.decode_points):
+                        self.short_cuts[block_point] = short_cuts[1]
+        self.stride = s
+        self.block_point = block_point
+        return data
+    def middle_flow(self, data):
+        block_num = self.bottleneck_params["middle_flow"][0]
+        strides = self.bottleneck_params["middle_flow"][1]
+        chns = self.bottleneck_params["middle_flow"][2]
+        strides = check_data(strides, block_num)
+        chns = check_data(chns, block_num)
+        # params to control your flow
+        s = self.stride
+        block_point = self.block_point
+        output_stride = self.output_stride
+        with scope("middle_flow"):
+            for i in range(block_num):
+                block_point = block_point + 1
+                with scope("block" + str(i + 1)):
+                    stride = strides[i] if check_stride(s * strides[i],
+                                                        output_stride) else 1
+                    data, short_cuts = self.xception_block(
+                        data, chns[i], [1, 1, stride], skip_conv=False)
+                    s = s * stride
+                    if check_points(block_point, self.decode_points):
+                        self.short_cuts[block_point] = short_cuts[1]
+        self.stride = s
+        self.block_point = block_point
+        return data
+    def exit_flow(self, data):
+        block_num = self.bottleneck_params["exit_flow"][0]
+        strides = self.bottleneck_params["exit_flow"][1]
+        chns = self.bottleneck_params["exit_flow"][2]
+        strides = check_data(strides, block_num)
+        chns = check_data(chns, block_num)
+        assert (block_num == 2)
+        # params to control your flow
+        s = self.stride
+        block_point = self.block_point
+        output_stride = self.output_stride
+        with scope("exit_flow"):
+            with scope('block1'):
+                block_point += 1
+                stride = strides[0] if check_stride(s * strides[0],
+                                                    output_stride) else 1
+                data, short_cuts = self.xception_block(data, chns[0],
+                                                       [1, 1, stride])
+                s = s * stride
+                if check_points(block_point, self.decode_points):
+                    self.short_cuts[block_point] = short_cuts[1]
+            with scope('block2'):
+                block_point += 1
+                stride = strides[1] if check_stride(s * strides[1],
+                                                    output_stride) else 1
+                data, short_cuts = self.xception_block(
+                    data,
+                    chns[1], [1, 1, stride],
+                    dilation=2,
+                    has_skip=False,
+                    activation_fn_in_separable_conv=True)
+                s = s * stride
+                if check_points(block_point, self.decode_points):
+                    self.short_cuts[block_point] = short_cuts[1]
+        self.stride = s
+        self.block_point = block_point
+        return data
+    def xception_block(self,
+                       input,
+                       channels,
+                       strides=1,
+                       filters=3,
+                       dilation=1,
+                       skip_conv=True,
+                       has_skip=True,
+                       activation_fn_in_separable_conv=False):
+        repeat_number = 3
+        channels = check_data(channels, repeat_number)
+        filters = check_data(filters, repeat_number)
+        strides = check_data(strides, repeat_number)
+        data = input
+        results = []
+        for i in range(repeat_number):
+            with scope('separable_conv' + str(i + 1)):
+                if not activation_fn_in_separable_conv:
+                    data = relu(data)
+                    data = separate_conv(
+                        data,
+                        channels[i],
+                        strides[i],
+                        filters[i],
+                        dilation=dilation,
+                        eps=1e-3)
+                else:
+                    data = separate_conv(
+                        data,
+                        channels[i],
+                        strides[i],
+                        filters[i],
+                        dilation=dilation,
+                        act=relu,
+                        eps=1e-3)
+                results.append(data)
+        if not has_skip:
+            return data, results
+        if skip_conv:
+            param_attr = fluid.ParamAttr(
+                name=name_scope + 'weights',
+                regularizer=None,
+                initializer=fluid.initializer.TruncatedNormal(
+                    loc=0.0, scale=0.09))
+            with scope('shortcut'):
+                skip = bn(
+                    conv(
+                        input,
+                        channels[-1],
+                        1,
+                        strides[-1],
+                        groups=1,
+                        padding=0,
+                        param_attr=param_attr),
+                    eps=1e-3)
+        else:
+            skip = input
+        return data + skip, results
+def xception_65(num_classes=None):
+    model = Xception(num_classes, 65)
+    return model
+def xception_41(num_classes=None):
+    model = Xception(num_classes, 41)
+    return model
+def xception_71(num_classes=None):
+    model = Xception(num_classes, 71)
+    return model
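How `output_stride` interacts with the per-flow strides is easier to see in isolation. The sketch below replays the `check_stride` rule used by `entry_flow`, `middle_flow` and `exit_flow` over the `xception_65` configuration from `gen_bottleneck_params`. The helper `plan_strides` and the flattened `XCEPTION_65` table are hypothetical names introduced for illustration; no Paddle ops are involved.

```python
def check_stride(s, os):
    return s <= os


def plan_strides(flows, output_stride):
    """Walk the flows in order and return the stride actually used per block."""
    s = 2  # running stride; the source initialises self.stride to 2
    plan = []
    for name, block_strides in flows:
        for requested in block_strides:
            # Keep the requested stride only while the running product
            # stays within output_stride; otherwise fall back to 1.
            stride = requested if check_stride(s * requested, output_stride) else 1
            s *= stride
            plan.append((name, stride))
    return plan, s


# Per-block strides for xception_65, expanded from gen_bottleneck_params.
XCEPTION_65 = [
    ("entry_flow", [2, 2, 2]),
    ("middle_flow", [1] * 16),
    ("exit_flow", [2, 1]),
]

if __name__ == "__main__":
    for target in (8, 16, 32):
        plan, total = plan_strides(XCEPTION_65, target)
        entry = [st for name, st in plan if name == "entry_flow"]
        exit_ = [st for name, st in plan if name == "exit_flow"]
        print(target, len(plan), entry, exit_, total)
```

The plan always contains 3 + 16 + 2 = 21 blocks, which matches the `end_points = 21` that `deeplabv3p.py` passes for the Xception65 backbone.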
--- a/contrib/HumanSeg/nets/deeplabv3p.py
+++ b/contrib/HumanSeg/nets/deeplabv3p.py
+# coding: utf8
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from collections import OrderedDict
+import paddle.fluid as fluid
+from .libs import scope, name_scope
+from .libs import bn_relu, relu
+from .libs import conv
+from .libs import separate_conv
+from .libs import sigmoid_to_softmax
+from .seg_modules import softmax_with_loss
+from .seg_modules import dice_loss
+from .seg_modules import bce_loss
+from .backbone import MobileNetV2
+from .backbone import Xception
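+class DeepLabv3p(object):
+    """Implementation of the DeepLabv3+ model.
+    `"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation"
+    <https://arxiv.org/abs/1802.02611>`
+    Args:
+        num_classes (int): Number of classes.
+        backbone (str): Backbone network of DeepLabv3+, which computes the feature maps. One of
+            ['Xception65', 'Xception41', 'MobileNetV2_x0.25', 'MobileNetV2_x0.5', 'MobileNetV2_x1.0',
+            'MobileNetV2_x1.5', 'MobileNetV2_x2.0']. Defaults to 'MobileNetV2_x1.0'.
+        mode (str): Running mode; the network inputs and return values are built according to mode.
+            When mode is 'train', the inputs are image (-1, 3, -1, -1) and label (-1, 1, -1, -1),
+            and loss is returned.
+            When mode is 'eval', the inputs are image (-1, 3, -1, -1) and label (-1, 1, -1, -1),
+            and the returns are loss, pred (prediction of the same size as the input label, whose
+            values are the predicted classes), label, and mask (mask of non-ignored values, same
+            size as label, bool type).
+            When mode is 'test', the input is image (-1, 3, -1, -1), and the returns are
+            pred (-1, 1, -1, -1) and logit (-1, num_classes, -1, -1), whose channel dimension
+            holds the probability of each class.
+        output_stride (int): Downsampling factor of the backbone output feature map relative to
+            the input, usually 8 or 16.
+        aspp_with_sep_conv (bool): Whether the ASPP module uses separable convolutions.
+        decoder_use_sep_conv (bool): Whether the decoder module uses separable convolutions.
+        encoder_with_aspp (bool): Whether to use the ASPP module in the encoder stage.
+        enable_decoder (bool): Whether to use the decoder module.
+        use_bce_loss (bool): Whether to use bce loss as the loss function. Only for two-class
+            segmentation. Can be used together with dice loss.
+        use_dice_loss (bool): Whether to use dice loss as the loss function. Only for two-class
+            segmentation. Can be used together with bce loss.
+            When both use_bce_loss and use_dice_loss are False, cross-entropy loss is used.
+        class_weight (list/str): Per-class weights of the cross-entropy loss. When class_weight is
+            a list, its length should equal num_classes. When class_weight is a str,
+            class_weight.lower() should be 'dynamic'; the weights are then computed automatically
+            from the proportion of pixels of each class in each round, and the weight of each class
+            is: class proportion * num_classes. When class_weight is None (the default), every
+            class has weight 1, i.e. the ordinary cross-entropy loss.
+        ignore_index (int): Label value to ignore. Pixels whose label equals ignore_index do not
+            take part in the loss computation.
+    Raises:
+        ValueError: use_bce_loss or use_dice_loss is True and num_classes > 2.
+        ValueError: class_weight is a list whose length is not equal to num_classes, or
+            class_weight is a str but class_weight.lower() is not 'dynamic'.
+        TypeError: class_weight is not None and its type is neither list nor str.
+    """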
+    def __init__(self,
+                 num_classes,
+                 backbone='MobileNetV2_x1.0',
+                 mode='train',
+                 output_stride=16,
+                 aspp_with_sep_conv=True,
+                 decoder_use_sep_conv=True,
+                 encoder_with_aspp=True,
+                 enable_decoder=True,
+                 use_bce_loss=False,
+                 use_dice_loss=False,
+                 class_weight=None,
+                 ignore_index=255):
+        # dice_loss and bce_loss apply only to two-class segmentation
+        if num_classes > 2 and (use_bce_loss or use_dice_loss):
+            raise ValueError(
+                "dice loss and bce loss are only applicable to binary classification"
+            )
+        if class_weight is not None:
+            if isinstance(class_weight, list):
+                if len(class_weight) != num_classes:
+                    raise ValueError(
+                        "Length of class_weight should be equal to number of classes"
+                    )
+            elif isinstance(class_weight, str):
+                if class_weight.lower() != 'dynamic':
+                    raise ValueError(
+                        "if class_weight is a string, it must be 'dynamic'")
+            else:
+                raise TypeError(
+                    'Expected class_weight to be a list or string but received {}'.
+                    format(type(class_weight)))
+        self.num_classes = num_classes
+        self.backbone = backbone
+        self.mode = mode
+        self.use_bce_loss = use_bce_loss
+        self.use_dice_loss = use_dice_loss
+        self.class_weight = class_weight
+        self.ignore_index = ignore_index
+        self.output_stride = output_stride
+        self.aspp_with_sep_conv = aspp_with_sep_conv
+        self.decoder_use_sep_conv = decoder_use_sep_conv
+        self.encoder_with_aspp = encoder_with_aspp
+        self.enable_decoder = enable_decoder
+    def _get_backbone(self, backbone):
+        def mobilenetv2(backbone):
+            # backbone: MobileNetV2 configuration
+            # output_stride: downsampling factor
+            # end_points: number of blocks in MobileNetV2
+            # decode_points: index of the MobileNetV2 block whose output branches off as the decoder input
+            if '0.25' in backbone:
+                scale = 0.25
+            elif '0.5' in backbone:
+                scale = 0.5
+            elif '1.0' in backbone:
+                scale = 1.0
+            elif '1.5' in backbone:
+                scale = 1.5
+            elif '2.0' in backbone:
+                scale = 2.0
+            else:
+                raise ValueError("unsupported MobileNetV2 backbone: " + backbone)
+            end_points = 18
+            decode_points = 4
+            return MobileNetV2(
+                scale=scale,
+                output_stride=self.output_stride,
+                end_points=end_points,
+                decode_points=decode_points)
+        def xception(backbone):
+            # decode_points: index of the Xception block whose output branches off as the decoder input
+            # end_points: number of blocks in Xception
+            if '65' in backbone:
+                decode_points = 2
+                end_points = 21
+                layers = 65
+            elif '41' in backbone:
+                decode_points = 2
+                end_points = 13
+                layers = 41
+            elif '71' in backbone:
+                decode_points = 3
+                end_points = 23
+                layers = 71
+            else:
+                raise ValueError("unsupported Xception backbone: " + backbone)
+            return Xception(
+                layers=layers,
+                output_stride=self.output_stride,
+                end_points=end_points,
+                decode_points=decode_points)
+        if 'Xception' in backbone:
+            return xception(backbone)
+        elif 'MobileNetV2' in backbone:
+            return mobilenetv2(backbone)
+        else:
+            raise ValueError("unsupported backbone: " + backbone)
+    def _encoder(self, input):
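+        # Encoder configuration using the ASPP architecture: pooling + 1x1 conv + three atrous
+        # convolutions at different rates in parallel, then a 1x1 conv after concat
+        # ASPP_WITH_SEP_CONV: True by default, uses depthwise separable convolutions; otherwise regular convolutions
+        # OUTPUT_STRIDE: downsampling factor, 8 or 16; determines aspp_ratios
+        # aspp_ratios: sampling rates of the atrous convolutions in the ASPP module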
+        if self.output_stride == 16:
+            aspp_ratios = [6, 12, 18]
+        elif self.output_stride == 8:
+            aspp_ratios = [12, 24, 36]
+        else:
+            raise Exception("DeepLabv3p only supports output stride 8 or 16")
+        param_attr = fluid.ParamAttr(
+            name=name_scope + 'weights',
+            regularizer=None,
+            initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+        with scope('encoder'):
+            channel = 256
+            with scope("image_pool"):
+                image_avg = fluid.layers.reduce_mean(
+                    input, [2, 3], keep_dim=True)
+                image_avg = bn_relu(
+                    conv(
+                        image_avg,
+                        channel,
+                        1,
+                        1,
+                        groups=1,
+                        padding=0,
+                        param_attr=param_attr))
+                input_shape = fluid.layers.shape(input)
+                image_avg = fluid.layers.resize_bilinear(
+                    image_avg, input_shape[2:])
+            with scope("aspp0"):
+                aspp0 = bn_relu(
+                    conv(
+                        input,
+                        channel,
+                        1,
+                        1,
+                        groups=1,
+                        padding=0,
+                        param_attr=param_attr))
+            with scope("aspp1"):
+                if self.aspp_with_sep_conv:
+                    aspp1 = separate_conv(
+                        input, channel, 1, 3, dilation=aspp_ratios[0], act=relu)
+                else:
+                    aspp1 = bn_relu(
+                        conv(
+                            input,
+                            channel,
+                            stride=1,
+                            filter_size=3,
+                            dilation=aspp_ratios[0],
+                            padding=aspp_ratios[0],
+                            param_attr=param_attr))
+            with scope("aspp2"):
+                if self.aspp_with_sep_conv:
+                    aspp2 = separate_conv(
+                        input, channel, 1, 3, dilation=aspp_ratios[1], act=relu)
+                else:
+                    aspp2 = bn_relu(
+                        conv(
+                            input,
+                            channel,
+                            stride=1,
+                            filter_size=3,
+                            dilation=aspp_ratios[1],
+                            padding=aspp_ratios[1],
+                            param_attr=param_attr))
+            with scope("aspp3"):
+                if self.aspp_with_sep_conv:
+                    aspp3 = separate_conv(
+                        input, channel, 1, 3, dilation=aspp_ratios[2], act=relu)
+                else:
+                    aspp3 = bn_relu(
+                        conv(
+                            input,
+                            channel,
+                            stride=1,
+                            filter_size=3,
+                            dilation=aspp_ratios[2],
+                            padding=aspp_ratios[2],
+                            param_attr=param_attr))
+            with scope("concat"):
+                data = fluid.layers.concat(
+                    [image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
+                data = bn_relu(
+                    conv(
+                        data,
+                        channel,
+                        1,
+                        1,
+                        groups=1,
+                        padding=0,
+                        param_attr=param_attr))
+                data = fluid.layers.dropout(data, 0.9)
+            return data
+    def _decoder(self, encode_data, decode_shortcut):
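+        # Decoder configuration
+        # encode_data: encoder output
+        # decode_shortcut: branch taken from the backbone, concatenated with encode_data after resizing
+        # decoder_use_sep_conv: True by default, in which case two separable convolutions follow the concat; otherwise regular convolutions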
+        param_attr = fluid.ParamAttr(
+            name=name_scope + 'weights',
+            regularizer=None,
+            initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+        with scope('decoder'):
+            with scope('concat'):
+                decode_shortcut = bn_relu(
+                    conv(
+                        decode_shortcut,
+                        48,
+                        1,
+                        1,
+                        groups=1,
+                        padding=0,
+                        param_attr=param_attr))
+                decode_shortcut_shape = fluid.layers.shape(decode_shortcut)
+                encode_data = fluid.layers.resize_bilinear(
+                    encode_data, decode_shortcut_shape[2:])
+                encode_data = fluid.layers.concat(
+                    [encode_data, decode_shortcut], axis=1)
+            if self.decoder_use_sep_conv:
+                with scope("separable_conv1"):
+                    encode_data = separate_conv(
+                        encode_data, 256, 1, 3, dilation=1, act=relu)
+                with scope("separable_conv2"):
+                    encode_data = separate_conv(
+                        encode_data, 256, 1, 3, dilation=1, act=relu)
+            else:
+                with scope("decoder_conv1"):
+                    encode_data = bn_relu(
+                        conv(
+                            encode_data,
+                            256,
+                            stride=1,
+                            filter_size=3,
+                            dilation=1,
+                            padding=1,
+                            param_attr=param_attr))
+                with scope("decoder_conv2"):
+                    encode_data = bn_relu(
+                        conv(
+                            encode_data,
+                            256,
+                            stride=1,
+                            filter_size=3,
+                            dilation=1,
+                            padding=1,
+                            param_attr=param_attr))
+            return encode_data
+    def _get_loss(self, logit, label, mask):
+        avg_loss = 0
+        if not (self.use_dice_loss or self.use_bce_loss):
+            avg_loss += softmax_with_loss(
+                logit,
+                label,
+                mask,
+                num_classes=self.num_classes,
+                weight=self.class_weight,
+                ignore_index=self.ignore_index)
+        else:
+            if self.use_dice_loss:
+                avg_loss += dice_loss(logit, label, mask)
+            if self.use_bce_loss:
+                avg_loss += bce_loss(
+                    logit, label, mask, ignore_index=self.ignore_index)
+        return avg_loss
+    def generate_inputs(self):
+        inputs = OrderedDict()
+        inputs['image'] = fluid.data(
+            dtype='float32', shape=[None, 3, None, None], name='image')
+        if self.mode == 'train':
+            inputs['label'] = fluid.data(
+                dtype='int32', shape=[None, 1, None, None], name='label')
+        elif self.mode == 'eval':
+            inputs['label'] = fluid.data(
+                dtype='int32', shape=[None, 1, None, None], name='label')
+        return inputs
+    def build_net(self, inputs):
+        # For two-class segmentation, when dice_loss or bce_loss is chosen as the loss, the final logit has 1 output channel
+        if self.use_dice_loss or self.use_bce_loss:
+            self.num_classes = 1
+        image = inputs['image']
+        backbone_net = self._get_backbone(self.backbone)
+        data, decode_shortcuts = backbone_net(image)
+        decode_shortcut = decode_shortcuts[backbone_net.decode_points]
+        # Encoder and decoder setup
+        if self.encoder_with_aspp:
+            data = self._encoder(data)
+        if self.enable_decoder:
+            data = self._decoder(data, decode_shortcut)
+        # Set the output of the last conv layer according to the number of classes, then resize to the original image size
+        param_attr = fluid.ParamAttr(
+            name=name_scope + 'weights',
+            regularizer=fluid.regularizer.L2DecayRegularizer(
+                regularization_coeff=0.0),
+            initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
+        with scope('logit'):
+            with fluid.name_scope('last_conv'):
+                logit = conv(
+                    data,
+                    self.num_classes,
+                    1,
+                    stride=1,
+                    padding=0,
+                    bias_attr=True,
+                    param_attr=param_attr)
+            image_shape = fluid.layers.shape(image)
+            logit = fluid.layers.resize_bilinear(logit, image_shape[2:])
+        if self.num_classes == 1:
+            out = sigmoid_to_softmax(logit)
+            out = fluid.layers.transpose(out, [0, 2, 3, 1])
+        else:
+            out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+        pred = fluid.layers.argmax(out, axis=3)
+        pred = fluid.layers.unsqueeze(pred, axes=[3])
+        if self.mode == 'train':
+            label = inputs['label']
+            mask = label != self.ignore_index
+            return self._get_loss(logit, label, mask)
+        else:
+            if self.num_classes == 1:
+                logit = sigmoid_to_softmax(logit)
+            else:
+                logit = fluid.layers.softmax(logit, axis=1)
+            return pred, logit
--- a/contrib/HumanSeg/nets/hrnet.py
+++ b/contrib/HumanSeg/nets/hrnet.py
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from collections import OrderedDict
+import paddle.fluid as fluid
+from paddle.fluid.initializer import MSRA
+from paddle.fluid.param_attr import ParamAttr
+from .seg_modules import softmax_with_loss
+from .seg_modules import dice_loss
+from .seg_modules import bce_loss
+from .libs import sigmoid_to_softmax
+class HRNet(object):
+    def __init__(self,
+                 num_classes,
+                 mode='train',
+                 stage1_num_modules=1,
+                 stage1_num_blocks=[4],
+                 stage1_num_channels=[64],
+                 stage2_num_modules=1,
+                 stage2_num_blocks=[4, 4],
+                 stage2_num_channels=[18, 36],
+                 stage3_num_modules=4,
+                 stage3_num_blocks=[4, 4, 4],
+                 stage3_num_channels=[18, 36, 72],
+                 stage4_num_modules=3,
+                 stage4_num_blocks=[4, 4, 4, 4],
+                 stage4_num_channels=[18, 36, 72, 144],
+                 use_bce_loss=False,
+                 use_dice_loss=False,
+                 class_weight=None,
+                 ignore_index=255):
+        # dice_loss and bce_loss apply only to binary segmentation
+        if num_classes > 2 and (use_bce_loss or use_dice_loss):
+            raise ValueError(
+                "dice loss and bce loss are only applicable to binary classification"
+            )
+        if class_weight is not None:
+            if isinstance(class_weight, list):
+                if len(class_weight) != num_classes:
+                    raise ValueError(
+                        "Length of class_weight should be equal to number of classes"
+                    )
+            elif isinstance(class_weight, str):
+                if class_weight.lower() != 'dynamic':
+                    raise ValueError(
+                        "if class_weight is a string, it must be 'dynamic'")
+            else:
+                raise TypeError(
+                    'Expected class_weight to be a list or string, but received {}'
+                    .format(type(class_weight)))
+        self.num_classes = num_classes
+        self.mode = mode
+        self.use_bce_loss = use_bce_loss
+        self.use_dice_loss = use_dice_loss
+        self.class_weight = class_weight
+        self.ignore_index = ignore_index
+        self.stage1_num_modules = stage1_num_modules
+        self.stage1_num_blocks = stage1_num_blocks
+        self.stage1_num_channels = stage1_num_channels
+        self.stage2_num_modules = stage2_num_modules
+        self.stage2_num_blocks = stage2_num_blocks
+        self.stage2_num_channels = stage2_num_channels
+        self.stage3_num_modules = stage3_num_modules
+        self.stage3_num_blocks = stage3_num_blocks
+        self.stage3_num_channels = stage3_num_channels
+        self.stage4_num_modules = stage4_num_modules
+        self.stage4_num_blocks = stage4_num_blocks
+        self.stage4_num_channels = stage4_num_channels
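+    def build_net(self, inputs):
+        # For binary segmentation with dice_loss or bce_loss, the final logit
+        # has a single output channel (both losses reject multi-channel logits).
+        if self.use_dice_loss or self.use_bce_loss:
+            self.num_classes = 1
+        image = inputs['image']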
+        logit = self._high_resolution_net(image, self.num_classes)
+        if self.num_classes == 1:
+            out = sigmoid_to_softmax(logit)
+            out = fluid.layers.transpose(out, [0, 2, 3, 1])
+        else:
+            out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+        pred = fluid.layers.argmax(out, axis=3)
+        pred = fluid.layers.unsqueeze(pred, axes=[3])
+        if self.mode == 'train':
+            label = inputs['label']
+            mask = label != self.ignore_index
+            return self._get_loss(logit, label, mask)
+        else:
+            if self.num_classes == 1:
+                logit = sigmoid_to_softmax(logit)
+            else:
+                logit = fluid.layers.softmax(logit, axis=1)
+            return pred, logit
+    def generate_inputs(self):
+        inputs = OrderedDict()
+        inputs['image'] = fluid.data(
+            dtype='float32', shape=[None, 3, None, None], name='image')
+        if self.mode in ('train', 'eval'):
+            inputs['label'] = fluid.data(
+                dtype='int32', shape=[None, 1, None, None], name='label')
+        return inputs
+    def _get_loss(self, logit, label, mask):
+        avg_loss = 0
+        if not (self.use_dice_loss or self.use_bce_loss):
+            avg_loss += softmax_with_loss(
+                logit,
+                label,
+                mask,
+                num_classes=self.num_classes,
+                weight=self.class_weight,
+                ignore_index=self.ignore_index)
+        else:
+            if self.use_dice_loss:
+                avg_loss += dice_loss(logit, label, mask)
+            if self.use_bce_loss:
+                avg_loss += bce_loss(
+                    logit, label, mask, ignore_index=self.ignore_index)
+        return avg_loss
+    def _conv_bn_layer(self,
+                       input,
+                       filter_size,
+                       num_filters,
+                       stride=1,
+                       padding=1,
+                       num_groups=1,
+                       if_act=True,
+                       name=None):
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=num_filters,
+            filter_size=filter_size,
+            stride=stride,
+            padding=(filter_size - 1) // 2,
+            groups=num_groups,
+            act=None,
+            param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
+            bias_attr=False)
+        bn_name = name + '_bn'
+        bn = fluid.layers.batch_norm(
+            input=conv,
+            param_attr=ParamAttr(
+                name=bn_name + "_scale",
+                initializer=fluid.initializer.Constant(1.0)),
+            bias_attr=ParamAttr(
+                name=bn_name + "_offset",
+                initializer=fluid.initializer.Constant(0.0)),
+            moving_mean_name=bn_name + '_mean',
+            moving_variance_name=bn_name + '_variance')
+        if if_act:
+            bn = fluid.layers.relu(bn)
+        return bn
+    def _basic_block(self,
+                     input,
+                     num_filters,
+                     stride=1,
+                     downsample=False,
+                     name=None):
+        residual = input
+        conv = self._conv_bn_layer(
+            input=input,
+            filter_size=3,
+            num_filters=num_filters,
+            stride=stride,
+            name=name + '_conv1')
+        conv = self._conv_bn_layer(
+            input=conv,
+            filter_size=3,
+            num_filters=num_filters,
+            if_act=False,
+            name=name + '_conv2')
+        if downsample:
+            residual = self._conv_bn_layer(
+                input=input,
+                filter_size=1,
+                num_filters=num_filters,
+                if_act=False,
+                name=name + '_downsample')
+        return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
+    def _bottleneck_block(self,
+                          input,
+                          num_filters,
+                          stride=1,
+                          downsample=False,
+                          name=None):
+        residual = input
+        conv = self._conv_bn_layer(
+            input=input,
+            filter_size=1,
+            num_filters=num_filters,
+            name=name + '_conv1')
+        conv = self._conv_bn_layer(
+            input=conv,
+            filter_size=3,
+            num_filters=num_filters,
+            stride=stride,
+            name=name + '_conv2')
+        conv = self._conv_bn_layer(
+            input=conv,
+            filter_size=1,
+            num_filters=num_filters * 4,
+            if_act=False,
+            name=name + '_conv3')
+        if downsample:
+            residual = self._conv_bn_layer(
+                input=input,
+                filter_size=1,
+                num_filters=num_filters * 4,
+                if_act=False,
+                name=name + '_downsample')
+        return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
+    def _fuse_layers(self, x, channels, multi_scale_output=True, name=None):
+        out = []
+        for i in range(len(channels) if multi_scale_output else 1):
+            residual = x[i]
+            shape = fluid.layers.shape(residual)[-2:]
+            for j in range(len(channels)):
+                if j > i:
+                    y = self._conv_bn_layer(
+                        x[j],
+                        filter_size=1,
+                        num_filters=channels[i],
+                        if_act=False,
+                        name=name + '_layer_' + str(i + 1) + '_' + str(j + 1))
+                    y = fluid.layers.resize_bilinear(input=y, out_shape=shape)
+                    residual = fluid.layers.elementwise_add(
+                        x=residual, y=y, act=None)
+                elif j < i:
+                    y = x[j]
+                    for k in range(i - j):
+                        if k == i - j - 1:
+                            y = self._conv_bn_layer(
+                                y,
+                                filter_size=3,
+                                num_filters=channels[i],
+                                stride=2,
+                                if_act=False,
+                                name=name + '_layer_' + str(i + 1) + '_' +
+                                str(j + 1) + '_' + str(k + 1))
+                        else:
+                            y = self._conv_bn_layer(
+                                y,
+                                filter_size=3,
+                                num_filters=channels[j],
+                                stride=2,
+                                name=name + '_layer_' + str(i + 1) + '_' +
+                                str(j + 1) + '_' + str(k + 1))
+                    residual = fluid.layers.elementwise_add(
+                        x=residual, y=y, act=None)
+            residual = fluid.layers.relu(residual)
+            out.append(residual)
+        return out
+    def _branches(self, x, block_num, channels, name=None):
+        out = []
+        for i in range(len(channels)):
+            residual = x[i]
+            for j in range(block_num[i]):
+                residual = self._basic_block(
+                    residual,
+                    channels[i],
+                    name=name + '_branch_layer_' + str(i + 1) + '_' +
+                    str(j + 1))
+            out.append(residual)
+        return out
+    def _high_resolution_module(self,
+                                x,
+                                blocks,
+                                channels,
+                                multi_scale_output=True,
+                                name=None):
+        residual = self._branches(x, blocks, channels, name=name)
+        out = self._fuse_layers(
+            residual,
+            channels,
+            multi_scale_output=multi_scale_output,
+            name=name)
+        return out
+    def _transition_layer(self, x, in_channels, out_channels, name=None):
+        num_in = len(in_channels)
+        num_out = len(out_channels)
+        out = []
+        for i in range(num_out):
+            if i < num_in:
+                if in_channels[i] != out_channels[i]:
+                    residual = self._conv_bn_layer(
+                        x[i],
+                        filter_size=3,
+                        num_filters=out_channels[i],
+                        name=name + '_layer_' + str(i + 1))
+                    out.append(residual)
+                else:
+                    out.append(x[i])
+            else:
+                residual = self._conv_bn_layer(
+                    x[-1],
+                    filter_size=3,
+                    num_filters=out_channels[i],
+                    stride=2,
+                    name=name + '_layer_' + str(i + 1))
+                out.append(residual)
+        return out
+    def _stage(self,
+               x,
+               num_modules,
+               num_blocks,
+               num_channels,
+               multi_scale_output=True,
+               name=None):
+        out = x
+        for i in range(num_modules):
+            if i == num_modules - 1 and not multi_scale_output:
+                out = self._high_resolution_module(
+                    out,
+                    num_blocks,
+                    num_channels,
+                    multi_scale_output=False,
+                    name=name + '_' + str(i + 1))
+            else:
+                out = self._high_resolution_module(
+                    out, num_blocks, num_channels, name=name + '_' + str(i + 1))
+        return out
+    def _layer1(self, input, num_modules, num_blocks, num_channels, name=None):
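+        # num_modules defaults to 1. TODO: decide whether other values need handling and
+        # whether to align with the official implementation, which uses [1].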
+        conv = input
+        for i in range(num_blocks[0]):
+            conv = self._bottleneck_block(
+                conv,
+                num_filters=num_channels[0],
+                downsample=(i == 0),
+                name=name + '_' + str(i + 1))
+        return conv
+    def _high_resolution_net(self, input, num_classes):
+        x = self._conv_bn_layer(
+            input=input,
+            filter_size=3,
+            num_filters=self.stage1_num_channels[0],
+            stride=2,
+            if_act=True,
+            name='layer1_1')
+        x = self._conv_bn_layer(
+            input=x,
+            filter_size=3,
+            num_filters=self.stage1_num_channels[0],
+            stride=2,
+            if_act=True,
+            name='layer1_2')
+        la1 = self._layer1(
+            x,
+            self.stage1_num_modules,
+            self.stage1_num_blocks,
+            self.stage1_num_channels,
+            name='layer2')
+        tr1 = self._transition_layer([la1],
+                                     self.stage1_num_channels,
+                                     self.stage2_num_channels,
+                                     name='tr1')
+        st2 = self._stage(
+            tr1,
+            self.stage2_num_modules,
+            self.stage2_num_blocks,
+            self.stage2_num_channels,
+            name='st2')
+        tr2 = self._transition_layer(
+            st2, self.stage2_num_channels, self.stage3_num_channels, name='tr2')
+        st3 = self._stage(
+            tr2,
+            self.stage3_num_modules,
+            self.stage3_num_blocks,
+            self.stage3_num_channels,
+            name='st3')
+        tr3 = self._transition_layer(
+            st3, self.stage3_num_channels, self.stage4_num_channels, name='tr3')
+        st4 = self._stage(
+            tr3,
+            self.stage4_num_modules,
+            self.stage4_num_blocks,
+            self.stage4_num_channels,
+            name='st4')
+        # upsample
+        shape = fluid.layers.shape(st4[0])[-2:]
+        st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=shape)
+        st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=shape)
+        st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
+        out = fluid.layers.concat(st4, axis=1)
+        last_channels = sum(self.stage4_num_channels)
+        out = self._conv_bn_layer(
+            input=out,
+            filter_size=1,
+            num_filters=last_channels,
+            stride=1,
+            if_act=True,
+            name='conv-2')
+        out = fluid.layers.conv2d(
+            input=out,
+            num_filters=num_classes,
+            filter_size=1,
+            stride=1,
+            padding=0,
+            act=None,
+            param_attr=ParamAttr(initializer=MSRA(), name='conv-1_weights'),
+            bias_attr=False)
+        input_shape = fluid.layers.shape(input)[-2:]
+        out = fluid.layers.resize_bilinear(out, input_shape)
+        return out
--- a/contrib/HumanSeg/nets/libs.py
+++ b/contrib/HumanSeg/nets/libs.py
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle
+import paddle.fluid as fluid
+import contextlib
+bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
+name_scope = ""
+@contextlib.contextmanager
+def scope(name):
+    global name_scope
+    bk = name_scope
+    name_scope = name_scope + name + '/'
+    try:
+        yield
+    finally:
+        # Restore the previous scope even if the body raises.
+        name_scope = bk
+def max_pool(input, kernel, stride, padding):
+    data = fluid.layers.pool2d(
+        input,
+        pool_size=kernel,
+        pool_type='max',
+        pool_stride=stride,
+        pool_padding=padding)
+    return data
+def avg_pool(input, kernel, stride, padding=0):
+    data = fluid.layers.pool2d(
+        input,
+        pool_size=kernel,
+        pool_type='avg',
+        pool_stride=stride,
+        pool_padding=padding)
+    return data
+def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
+    N, C, H, W = input.shape
+    if C % G != 0:
+        for d in range(10):
+            for t in [d, -d]:
+                if G + t <= 0: continue
+                if C % (G + t) == 0:
+                    G = G + t
+                    break
+            if C % G == 0:
+                break
+    assert C % G == 0, "number of groups cannot evenly divide the channels"
+    x = fluid.layers.group_norm(
+        input,
+        groups=G,
+        epsilon=eps,
+        param_attr=param_attr,
+        bias_attr=bias_attr,
+        name=name_scope + 'group_norm')
+    return x
+# Alias for the function above: the ``group_norm`` keyword of bn() shadows it.
+_group_norm = group_norm
+def bn(*args,
+       norm_type='bn',
+       eps=1e-5,
+       bn_momentum=0.99,
+       group_norm=32,
+       **kargs):
+    if norm_type == 'bn':
+        with scope('BatchNorm'):
+            return fluid.layers.batch_norm(
+                *args,
+                epsilon=eps,
+                momentum=bn_momentum,
+                param_attr=fluid.ParamAttr(
+                    name=name_scope + 'gamma', regularizer=bn_regularizer),
+                bias_attr=fluid.ParamAttr(
+                    name=name_scope + 'beta', regularizer=bn_regularizer),
+                moving_mean_name=name_scope + 'moving_mean',
+                moving_variance_name=name_scope + 'moving_variance',
+                **kargs)
+    elif norm_type == 'gn':
+        with scope('GroupNorm'):
+            # Call the module-level function; ``group_norm`` here is the group count.
+            return _group_norm(
+                args[0],
+                group_norm,
+                eps=eps,
+                param_attr=fluid.ParamAttr(
+                    name=name_scope + 'gamma', regularizer=bn_regularizer),
+                bias_attr=fluid.ParamAttr(
+                    name=name_scope + 'beta', regularizer=bn_regularizer))
+    else:
+        raise Exception("Unsupported norm type: " + norm_type)
+def bn_relu(data, norm_type='bn', eps=1e-5):
+    return fluid.layers.relu(bn(data, norm_type=norm_type, eps=eps))
+def relu(data):
+    return fluid.layers.relu(data)
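+def conv(*args, **kargs):
+    # The weight is always named after the current scope; any param_attr
+    # passed by the caller is replaced here.
+    kargs['param_attr'] = name_scope + 'weights'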
+    if 'bias_attr' in kargs and kargs['bias_attr']:
+        kargs['bias_attr'] = fluid.ParamAttr(
+            name=name_scope + 'biases',
+            regularizer=None,
+            initializer=fluid.initializer.ConstantInitializer(value=0.0))
+    else:
+        kargs['bias_attr'] = False
+    return fluid.layers.conv2d(*args, **kargs)
+def deconv(*args, **kargs):
+    kargs['param_attr'] = name_scope + 'weights'
+    if 'bias_attr' in kargs and kargs['bias_attr']:
+        kargs['bias_attr'] = name_scope + 'biases'
+    else:
+        kargs['bias_attr'] = False
+    return fluid.layers.conv2d_transpose(*args, **kargs)
+def separate_conv(input,
+                  channel,
+                  stride,
+                  filter,
+                  dilation=1,
+                  act=None,
+                  eps=1e-5):
+    param_attr = fluid.ParamAttr(
+        name=name_scope + 'weights',
+        regularizer=fluid.regularizer.L2DecayRegularizer(
+            regularization_coeff=0.0),
+        initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
+    with scope('depthwise'):
+        input = conv(
+            input,
+            input.shape[1],
+            filter,
+            stride,
+            groups=input.shape[1],
+            padding=(filter // 2) * dilation,
+            dilation=dilation,
+            use_cudnn=False,
+            param_attr=param_attr)
+        input = bn(input, eps=eps)
+        if act: input = act(input)
+    param_attr = fluid.ParamAttr(
+        name=name_scope + 'weights',
+        regularizer=None,
+        initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
+    with scope('pointwise'):
+        input = conv(
+            input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
+        input = bn(input, eps=eps)
+        if act: input = act(input)
+    return input
+def conv_bn_layer(input,
+                  filter_size,
+                  num_filters,
+                  stride,
+                  padding,
+                  channels=None,
+                  num_groups=1,
+                  if_act=True,
+                  name=None,
+                  use_cudnn=True):
+    conv = fluid.layers.conv2d(
+        input=input,
+        num_filters=num_filters,
+        filter_size=filter_size,
+        stride=stride,
+        padding=padding,
+        groups=num_groups,
+        act=None,
+        use_cudnn=use_cudnn,
+        param_attr=fluid.ParamAttr(name=name + '_weights'),
+        bias_attr=False)
+    bn_name = name + '_bn'
+    bn = fluid.layers.batch_norm(
+        input=conv,
+        param_attr=fluid.ParamAttr(name=bn_name + "_scale"),
+        bias_attr=fluid.ParamAttr(name=bn_name + "_offset"),
+        moving_mean_name=bn_name + '_mean',
+        moving_variance_name=bn_name + '_variance')
+    if if_act:
+        return fluid.layers.relu6(bn)
+    else:
+        return bn
+def sigmoid_to_softmax(input):
+    """
+    Convert a one-channel logit into two-channel probabilities.
+    Channel 1 is sigmoid(input); channel 0 is its complement.
+    """
+    logit = fluid.layers.sigmoid(input)
+    logit_back = 1 - logit
+    logit = fluid.layers.concat([logit_back, logit], axis=1)
+    return logit
--- a/contrib/HumanSeg/nets/seg_modules.py
+++ b/contrib/HumanSeg/nets/seg_modules.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle.fluid as fluid
+import numpy as np
+def softmax_with_loss(logit,
+                      label,
+                      ignore_mask=None,
+                      num_classes=2,
+                      weight=None,
+                      ignore_index=255):
+    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
+    label = fluid.layers.elementwise_min(
+        label, fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    logit = fluid.layers.reshape(logit, [-1, num_classes])
+    label = fluid.layers.reshape(label, [-1, 1])
+    label = fluid.layers.cast(label, 'int64')
+    ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
+    if weight is None:
+        loss, probs = fluid.layers.softmax_with_cross_entropy(
+            logit, label, ignore_index=ignore_index, return_softmax=True)
+    else:
+        label_one_hot = fluid.one_hot(input=label, depth=num_classes)
+        if isinstance(weight, list):
+            assert len(
+                weight
+            ) == num_classes, "weight length must equal num of classes"
+            weight = fluid.layers.assign(np.array([weight], dtype='float32'))
+        elif isinstance(weight, str):
+            assert weight.lower() == 'dynamic', \
+                "if weight is a string, it must be 'dynamic'"
+            tmp = []
+            total_num = fluid.layers.cast(
+                fluid.layers.shape(label)[0], 'float32')
+            for i in range(num_classes):
+                cls_pixel_num = fluid.layers.reduce_sum(label_one_hot[:, i])
+                ratio = total_num / (cls_pixel_num + 1)
+                tmp.append(ratio)
+            weight = fluid.layers.concat(tmp)
+            weight = weight / fluid.layers.reduce_sum(weight) * num_classes
+        elif isinstance(weight, fluid.layers.Variable):
+            pass
+        else:
+            raise ValueError(
+                'Expected weight to be a list, string or Variable, but received {}'
+                .format(type(weight)))
+        weight = fluid.layers.reshape(weight, [1, num_classes])
+        weighted_label_one_hot = fluid.layers.elementwise_mul(
+            label_one_hot, weight)
+        probs = fluid.layers.softmax(logit)
+        loss = fluid.layers.cross_entropy(
+            probs,
+            weighted_label_one_hot,
+            soft_label=True,
+            ignore_index=ignore_index)
+        weighted_label_one_hot.stop_gradient = True
+    loss = loss * ignore_mask
+    avg_loss = fluid.layers.mean(loss) / (
+        fluid.layers.mean(ignore_mask) + 0.00001)
+    label.stop_gradient = True
+    ignore_mask.stop_gradient = True
+    return avg_loss
+# TODO: decide how to apply ignore_index and ignore_mask
+def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
+    if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
+        raise Exception(
+            "dice loss is only applicable to one channel classification")
+    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
+    logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
+    label = fluid.layers.transpose(label, [0, 2, 3, 1])
+    label = fluid.layers.cast(label, 'int64')
+    ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
+    logit = fluid.layers.sigmoid(logit)
+    logit = logit * ignore_mask
+    label = label * ignore_mask
+    reduce_dim = list(range(1, len(logit.shape)))
+    inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
+    dice_denominator = fluid.layers.reduce_sum(
+        logit, dim=reduce_dim) + fluid.layers.reduce_sum(
+            label, dim=reduce_dim)
+    dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
+    label.stop_gradient = True
+    ignore_mask.stop_gradient = True
+    return fluid.layers.reduce_mean(dice_score)
+def bce_loss(logit, label, ignore_mask=None, ignore_index=255):
+    if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
+        raise Exception("bce loss is only applicable to binary classification")
+    label = fluid.layers.cast(label, 'float32')
+    loss = fluid.layers.sigmoid_cross_entropy_with_logits(
+        x=logit, label=label, ignore_index=ignore_index,
+        normalize=True)  # or False
+    loss = fluid.layers.reduce_sum(loss)
+    label.stop_gradient = True
+    ignore_mask.stop_gradient = True
+    return loss
--- a/contrib/HumanSeg/nets/shufflenet_slim.py
+++ b/contrib/HumanSeg/nets/shufflenet_slim.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from collections import OrderedDict
+import paddle.fluid as fluid
+from paddle.fluid.initializer import MSRA
+from paddle.fluid.param_attr import ParamAttr
+from .libs import sigmoid_to_softmax
+from .seg_modules import softmax_with_loss
+from .seg_modules import dice_loss
+from .seg_modules import bce_loss
+class ShuffleSeg(object):
+    # def __init__(self):
+    # self.params = train_parameters
+    def __init__(self,
+                 num_classes,
+                 mode='train',
+                 use_bce_loss=False,
+                 use_dice_loss=False,
+                 class_weight=None,
+                 ignore_index=255):
+        # dice_loss and bce_loss apply only to binary segmentation
+        if num_classes > 2 and (use_bce_loss or use_dice_loss):
+            raise ValueError(
+                "dice loss and bce loss are only applicable to binary classification"
+            )
+        if class_weight is not None:
+            if isinstance(class_weight, list):
+                if len(class_weight) != num_classes:
+                    raise ValueError(
+                        "Length of class_weight should be equal to number of classes"
+                    )
+            elif isinstance(class_weight, str):
+                if class_weight.lower() != 'dynamic':
+                    raise ValueError(
+                        "if class_weight is a string, it must be 'dynamic'")
+            else:
+                raise TypeError(
+                    'Expected class_weight to be a list or string, but received {}'
+                    .format(type(class_weight)))
+        self.num_classes = num_classes
+        self.mode = mode
+        self.use_bce_loss = use_bce_loss
+        self.use_dice_loss = use_dice_loss
+        self.class_weight = class_weight
+        self.ignore_index = ignore_index
+    def _get_loss(self, logit, label, mask):
+        avg_loss = 0
+        if not (self.use_dice_loss or self.use_bce_loss):
+            avg_loss += softmax_with_loss(
+                logit,
+                label,
+                mask,
+                num_classes=self.num_classes,
+                weight=self.class_weight,
+                ignore_index=self.ignore_index)
+        else:
+            if self.use_dice_loss:
+                avg_loss += dice_loss(logit, label, mask)
+            if self.use_bce_loss:
+                avg_loss += bce_loss(
+                    logit, label, mask, ignore_index=self.ignore_index)
+        return avg_loss
+    def generate_inputs(self):
+        inputs = OrderedDict()
+        inputs['image'] = fluid.data(
+            dtype='float32', shape=[None, 3, None, None], name='image')
+        # Both training and evaluation need the label input.
+        if self.mode in ('train', 'eval'):
+            inputs['label'] = fluid.data(
+                dtype='int32', shape=[None, 1, None, None], name='label')
+        return inputs
+    def build_net(self, inputs, class_dim=2):
+        if self.use_dice_loss or self.use_bce_loss:
+            self.num_classes = 1
+        image = inputs['image']
+        ## Encoder
+        conv1 = self.conv_bn(image, 3, 36, 2, 1)
+        print('encoder 1', conv1.shape)
+        shortcut = self.conv_bn(
+            input=conv1, filter_size=1, num_filters=18, stride=1, padding=0)
+        print('shortcut 1', shortcut.shape)
+        pool = fluid.layers.pool2d(
+            input=conv1,
+            pool_size=3,
+            pool_type='max',
+            pool_stride=2,
+            pool_padding=1)
+        print('encoder 2', pool.shape)
+        # Block 1
+        conv = self.sfnetv2module(pool, stride=2, num_filters=72)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        print('encoder 3', conv.shape)
+        # Block 2
+        conv = self.sfnetv2module(conv, stride=2)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        conv = self.sfnetv2module(conv, stride=1)
+        print('encoder 4', conv.shape)
+        ### decoder
+        conv = self.depthwise_separable(conv, 3, 64, 1)
+        shortcut_shape = fluid.layers.shape(shortcut)[2:]
+        conv_b = fluid.layers.resize_bilinear(conv, shortcut_shape)
+        concat = fluid.layers.concat([shortcut, conv_b], axis=1)
+        decode_conv = self.depthwise_separable(concat, 3, 64, 1)
+        logit = self.output_layer(decode_conv, class_dim)
+        if self.num_classes == 1:
+            out = sigmoid_to_softmax(logit)
+            out = fluid.layers.transpose(out, [0, 2, 3, 1])
+        else:
+            out = fluid.layers.transpose(logit, [0, 2, 3, 1])
+        pred = fluid.layers.argmax(out, axis=3)
+        pred = fluid.layers.unsqueeze(pred, axes=[3])
+        if self.mode == 'train':
+            label = inputs['label']
+            mask = label != self.ignore_index
+            return self._get_loss(logit, label, mask)
+        else:
+            if self.num_classes == 1:
+                logit = sigmoid_to_softmax(logit)
+            else:
+                logit = fluid.layers.softmax(logit, axis=1)
+            return pred, logit
+        return logit
+    def conv_bn(self,
+                input,
+                filter_size,
+                num_filters,
+                stride,
+                padding,
+                channels=None,
+                num_groups=1,
+                act='relu',
+                use_cudnn=True):
+        parameter_attr = ParamAttr(learning_rate=1, initializer=MSRA())
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=num_filters,
+            filter_size=filter_size,
+            stride=stride,
+            padding=padding,
+            groups=num_groups,
+            act=None,
+            use_cudnn=use_cudnn,
+            param_attr=parameter_attr,
+            bias_attr=False)
+        return fluid.layers.batch_norm(input=conv, act=act)
+    def depthwise_separable(self, input, filter_size, num_filters, stride):
+        num_filters1 = int(input.shape[1])
+        num_groups = num_filters1
+        depthwise_conv = self.conv_bn(
+            input=input,
+            filter_size=filter_size,
+            num_filters=int(num_filters1),
+            stride=stride,
+            padding=int(filter_size / 2),
+            num_groups=num_groups,
+            use_cudnn=False,
+            act=None)
+        pointwise_conv = self.conv_bn(
+            input=depthwise_conv,
+            filter_size=1,
+            num_filters=num_filters,
+            stride=1,
+            padding=0)
+        return pointwise_conv
+    def sfnetv2module(self, input, stride, num_filters=None):
+        if stride == 1:
+            shortcut, branch = fluid.layers.split(
+                input, num_or_sections=2, dim=1)
+            if num_filters is None:
+                in_channels = int(branch.shape[1])
+            else:
+                in_channels = int(num_filters / 2)
+        else:
+            branch = input
+            if num_filters is None:
+                in_channels = int(branch.shape[1])
+            else:
+                in_channels = int(num_filters / 2)
+            shortcut = self.depthwise_separable(input, 3, in_channels, stride)
+        branch_1x1 = self.conv_bn(
+            input=branch,
+            filter_size=1,
+            num_filters=int(in_channels),
+            stride=1,
+            padding=0)
+        branch_dw1x1 = self.depthwise_separable(branch_1x1, 3, in_channels,
+                                                stride)
+        output = fluid.layers.concat(input=[shortcut, branch_dw1x1], axis=1)
+        # channel shuffle
+        # b, c, h, w = output.shape
+        shape = fluid.layers.shape(output)
+        c = output.shape[1]
+        b, h, w = shape[0], shape[2], shape[3]
+        output = fluid.layers.reshape(x=output, shape=[b, 2, in_channels, h, w])
+        output = fluid.layers.transpose(x=output, perm=[0, 2, 1, 3, 4])
+        output = fluid.layers.reshape(x=output, shape=[b, c, h, w])
+        return output
+    def output_layer(self, input, out_dim):
+        param_attr = fluid.param_attr.ParamAttr(
+            learning_rate=1.,
+            regularizer=fluid.regularizer.L2Decay(0.),
+            initializer=fluid.initializer.Xavier())
+        # deconv
+        output = fluid.layers.conv2d_transpose(
+            input=input,
+            num_filters=out_dim,
+            filter_size=2,
+            padding=0,
+            stride=2,
+            bias_attr=True,
+            param_attr=param_attr,
+            act=None)
+        return output
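Note on the channel shuffle at the end of `sfnetv2module`: it is a reshape, transpose, reshape sequence over the channel axis. The following is a minimal pure-Python sketch of the channel order it produces, assuming two groups as in the code above. `channel_shuffle_order` is a hypothetical helper for illustration and is not part of PaddleSeg.

```python
def channel_shuffle_order(num_channels, groups=2):
    """Return the source-channel index of each output channel after a shuffle.

    Mirrors reshape [groups, per_group] -> transpose -> reshape [num_channels]
    on the channel axis only; batch and spatial axes are left untouched.
    """
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Step 1: view the flat channel axis as a [groups, per_group] grid.
    grid = [[g * per_group + i for i in range(per_group)]
            for g in range(groups)]
    # Steps 2 and 3: transpose to [per_group, groups], then flatten.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


order = channel_shuffle_order(6)
print(order)  # [0, 3, 1, 4, 2, 5]
```

With two groups, channels from the shortcut half and the branch half end up interleaved, which is what lets the next unit's split see features from both paths.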
--- a/contrib/HumanSeg/pretrained_weights/download_pretrained_weights.py
+++ b/contrib/HumanSeg/pretrained_weights/download_pretrained_weights.py
+# Copyright (c) 2019  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+import os
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+TEST_PATH = os.path.join(LOCAL_PATH, "../../../", "test")
+sys.path.append(TEST_PATH)
+from test_utils import download_file_and_uncompress
+model_urls = {
+    "humanseg_server_ckpt":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_ckpt.zip",
+    "humanseg_server_inference":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip",
+    "humanseg_mobile_ckpt":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_ckpt.zip",
+    "humanseg_mobile_inference":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip",
+    "humanseg_mobile_quant":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip",
+    "humanseg_lite_ckpt":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_ckpt.zip",
+    "humanseg_lite_inference":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_inference.zip",
+    "humanseg_lite_quant":
+    "https://paddleseg.bj.bcebos.com/humanseg/models/humanseg_lite_quant.zip",
+}
+if __name__ == "__main__":
+    for model_name, url in model_urls.items():
+        download_file_and_uncompress(
+            url=url,
+            savepath=LOCAL_PATH,
+            extrapath=LOCAL_PATH,
+            extraname=model_name)
+    print("Pretrained Model download success!")
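Note on `model_urls`: every URL in the dictionary follows the same pattern, a shared base plus `<model_name>.zip`. As a possible alternative, sketched here under that assumption, the mapping could be derived from the list of names. `BASE_URL` and `MODEL_NAMES` are hypothetical names introduced for this sketch only.

```python
# Assumption: all archives live under one base URL and are named "<key>.zip",
# as is the case for the eight entries in the dictionary above.
BASE_URL = "https://paddleseg.bj.bcebos.com/humanseg/models/"

MODEL_NAMES = [
    "humanseg_server_ckpt",
    "humanseg_server_inference",
    "humanseg_mobile_ckpt",
    "humanseg_mobile_inference",
    "humanseg_mobile_quant",
    "humanseg_lite_ckpt",
    "humanseg_lite_inference",
    "humanseg_lite_quant",
]

# Build the same name -> URL mapping without repeating the base path.
model_urls = {name: BASE_URL + name + ".zip" for name in MODEL_NAMES}

for name, url in model_urls.items():
    print(name, url)
```

The explicit dictionary in the diff is easier to grep; the derived form only pays off if more models are added with the same naming scheme.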
--- a/contrib/HumanSeg/quant_offline.py
+++ b/contrib/HumanSeg/quant_offline.py
+import argparse
+from datasets.dataset import Dataset
+import transforms
+import models
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg offline quantization')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for quant',
+        type=str,
+        default='output/best_model')
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=1)
+    parser.add_argument(
+        '--batch_nums',
+        dest='batch_nums',
+        help='Batch number for quant',
+        type=int,
+        default=10)
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='the root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--quant_list',
+        dest='quant_list',
+        help=
+        'Image file list for model quantization, it can be val.txt or train.txt',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the quant model',
+        type=str,
+        default='./output/quant_offline')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+def evaluate(args):
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape),
+         transforms.Normalize()])
+    eval_dataset = Dataset(
+        data_dir=args.data_dir,
+        file_list=args.quant_list,
+        transforms=eval_transforms,
+        num_workers='auto',
+        buffer_size=100,
+        parallel_method='thread',
+        shuffle=False)
+    model = models.load_model(args.model_dir)
+    model.export_quant_model(
+        dataset=eval_dataset,
+        save_dir=args.save_dir,
+        batch_size=args.batch_size,
+        batch_nums=args.batch_nums)
+if __name__ == '__main__':
+    args = parse_args()
+    evaluate(args)
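Note on `--image_shape`: because the option is declared with `nargs=2` and `type=int`, argparse collects exactly two values and converts each to `int`, so `args.image_shape` is a two-element list. A small standalone sketch of that behavior follows; `build_parser` is a hypothetical helper that reproduces only this one option, and the argument lists are passed explicitly so the snippet does not read `sys.argv`.

```python
import argparse


def build_parser():
    # Reproduces only the --image_shape option from the script above.
    parser = argparse.ArgumentParser(description="image_shape parsing demo")
    parser.add_argument(
        "--image_shape",
        dest="image_shape",
        help="The image shape for net inputs.",
        nargs=2,
        default=[192, 192],
        type=int)
    return parser


parser = build_parser()
default_args = parser.parse_args([])
custom_args = parser.parse_args(["--image_shape", "256", "144"])

print(default_args.image_shape)  # [192, 192]
print(custom_args.image_shape)   # [256, 144]
```

On the command line this corresponds to passing two space-separated integers after `--image_shape`.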
--- a/contrib/HumanSeg/quant_online.py
+++ b/contrib/HumanSeg/quant_online.py
+import argparse
+from datasets.dataset import Dataset
+from models import HumanSegMobile, HumanSegLite, HumanSegServer
+import transforms
+MODEL_TYPE = ['HumanSegMobile', 'HumanSegLite', 'HumanSegServer']
+def parse_args():
+    parser = argparse.ArgumentParser(description='HumanSeg training')
+    parser.add_argument(
+        '--model_type',
+        dest='model_type',
+        help=
+        "Model type for training, which is one of ('HumanSegMobile', 'HumanSegLite', 'HumanSegServer')",
+        type=str,
+        default='HumanSegMobile')
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--train_list',
+        dest='train_list',
+        help='Train list file of dataset',
+        type=str)
+    parser.add_argument(
+        '--val_list',
+        dest='val_list',
+        help='Val list file of dataset',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the model snapshot',
+        type=str,
+        default='./output/quant_train')
+    parser.add_argument(
+        '--num_classes',
+        dest='num_classes',
+        help='Number of classes',
+        type=int,
+        default=2)
+    parser.add_argument(
+        '--num_epochs',
+        dest='num_epochs',
+        help='Number of epochs for training',
+        type=int,
+        default=2)
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=128)
+    parser.add_argument(
+        '--learning_rate',
+        dest='learning_rate',
+        help='Learning rate',
+        type=float,
+        default=0.001)
+    parser.add_argument(
+        '--pretrained_weights',
+        dest='pretrained_weights',
+        help='The model path for quant',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_interval_epochs',
+        dest='save_interval_epochs',
+        help='The interval, in epochs, between saving model snapshots',
+        type=int,
+        default=1)
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+def train(args):
+    train_transforms = transforms.Compose([
+        transforms.RandomHorizontalFlip(),
+        transforms.Resize(args.image_shape),
+        transforms.Normalize()
+    ])
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape),
+         transforms.Normalize()])
+    train_dataset = Dataset(
+        data_dir=args.data_dir,
+        file_list=args.train_list,
+        transforms=train_transforms,
+        num_workers='auto',
+        buffer_size=100,
+        parallel_method='thread',
+        shuffle=True)
+    eval_dataset = None
+    if args.val_list is not None:
+        eval_dataset = Dataset(
+            data_dir=args.data_dir,
+            file_list=args.val_list,
+            transforms=eval_transforms,
+            num_workers='auto',
+            buffer_size=100,
+            parallel_method='thread',
+            shuffle=False)
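+    # Use the parsed --num_classes value (default 2) instead of a hard-coded 2.
+    if args.model_type == 'HumanSegMobile':
+        model = HumanSegMobile(num_classes=args.num_classes)
+    elif args.model_type == 'HumanSegLite':
+        model = HumanSegLite(num_classes=args.num_classes)
+    elif args.model_type == 'HumanSegServer':
+        model = HumanSegServer(num_classes=args.num_classes)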
+    else:
+        raise ValueError(
+            "--model_type: {} is set wrong, it should be one of ('HumanSegMobile', "
+            "'HumanSegLite', 'HumanSegServer')".format(args.model_type))
+    model.train(
+        num_epochs=args.num_epochs,
+        train_dataset=train_dataset,
+        train_batch_size=args.batch_size,
+        eval_dataset=eval_dataset,
+        save_interval_epochs=args.save_interval_epochs,
+        save_dir=args.save_dir,
+        pretrained_weights=args.pretrained_weights,
+        learning_rate=args.learning_rate,
+        quant=True)
+if __name__ == '__main__':
+    args = parse_args()
+    train(args)
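Note on the `--model_type` dispatch: the if/elif chain in `train` maps a string to one of three model classes and raises `ValueError` otherwise. One possible alternative is a lookup table keyed by class name, sketched below. The three classes here are hypothetical stand-ins so the snippet runs on its own; in the script they would be the classes imported from `models`. `MODEL_REGISTRY` and `build_model` are likewise names invented for this sketch.

```python
# Stand-in classes for illustration only; they are NOT the real models.
class HumanSegMobile:
    def __init__(self, num_classes):
        self.num_classes = num_classes


class HumanSegLite:
    def __init__(self, num_classes):
        self.num_classes = num_classes


class HumanSegServer:
    def __init__(self, num_classes):
        self.num_classes = num_classes


# Map each accepted --model_type string to its class.
MODEL_REGISTRY = {
    cls.__name__: cls
    for cls in (HumanSegMobile, HumanSegLite, HumanSegServer)
}


def build_model(model_type, num_classes=2):
    """Instantiate the model named by model_type, or raise ValueError."""
    try:
        model_cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            "--model_type: {} is set wrong, it should be one of {}".format(
                model_type, tuple(MODEL_REGISTRY)))
    return model_cls(num_classes=num_classes)


model = build_model("HumanSegLite")
print(type(model).__name__)  # HumanSegLite
```

A table like this keeps the list of valid names and the error message in sync, and could also feed the `choices=` parameter of `add_argument`.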
--- a/contrib/HumanSeg/requirements.txt
+++ b/contrib/HumanSeg/requirements.txt
+visualdl == 2.0.0-alpha.1
+paddleslim
--- a/contrib/HumanSeg/train.py
+++ b/contrib/HumanSeg/train.py
--- a/contrib/HumanSeg/transforms/__init__.py
+++ b/contrib/HumanSeg/transforms/__init__.py
+#   Copyright (c) 2020  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .transforms import *
+from . import functional
--- a/contrib/HumanSeg/transforms/functional.py
+++ b/contrib/HumanSeg/transforms/functional.py
--- a/contrib/HumanSeg/transforms/transforms.py
+++ b/contrib/HumanSeg/transforms/transforms.py
--- a/contrib/HumanSeg/utils/__init__.py
+++ b/contrib/HumanSeg/utils/__init__.py
+#   Copyright (c) 2020  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from . import logging
+from . import humanseg_postprocess
+from .metrics import ConfusionMatrix
+from .utils import *
+from .post_quantization import HumanSegPostTrainingQuantization
--- a/contrib/HumanSeg/utils/humanseg_postprocess.py
+++ b/contrib/HumanSeg/utils/humanseg_postprocess.py
--- a/contrib/HumanSeg/utils/logging.py
+++ b/contrib/HumanSeg/utils/logging.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import time
+import os
+import sys
+levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}
+log_level = 2
+def log(level=2, message=""):
+    current_time = time.time()
+    time_array = time.localtime(current_time)
+    current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array)
+    if log_level >= level:
+        print("{} [{}]\t{}".format(current_time, levels[level],
+                                   message).encode("utf-8").decode("latin1"))
+        sys.stdout.flush()
+def debug(message=""):
+    log(level=3, message=message)
+def info(message=""):
+    log(level=2, message=message)
+def warning(message=""):
+    log(level=1, message=message)
+def error(message=""):
+    log(level=0, message=message)
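Note on the level filter in `log`: a message is printed only when `log_level >= level`, so a lower number means higher severity and the module-level `log_level` acts as a verbosity ceiling. The sketch below isolates that rule; `should_emit` is a hypothetical helper that does not exist in the module.

```python
levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}


def should_emit(level, log_level=2):
    # Same comparison as in log(): emit when the configured verbosity
    # is at least as high as the message's level number.
    return log_level >= level


# With the default log_level of 2, DEBUG (3) is suppressed.
emitted = [levels[lv] for lv in sorted(levels) if should_emit(lv)]
print(emitted)  # ['ERROR', 'WARNING', 'INFO']

# Raising the ceiling to 3 lets DEBUG through as well.
emitted_debug = [levels[lv] for lv in sorted(levels) if should_emit(lv, 3)]
print(emitted_debug)  # ['ERROR', 'WARNING', 'INFO', 'DEBUG']
```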
--- a/contrib/HumanSeg/utils/metrics.py
+++ b/contrib/HumanSeg/utils/metrics.py
--- a/contrib/HumanSeg/utils/palette.py
+++ b/contrib/HumanSeg/utils/palette.py
--- a/contrib/HumanSeg/utils/post_quantization.py
+++ b/contrib/HumanSeg/utils/post_quantization.py
--- a/contrib/HumanSeg/utils/util.py
+++ b/contrib/HumanSeg/utils/util.py
--- a/contrib/HumanSeg/utils/utils.py
+++ b/contrib/HumanSeg/utils/utils.py
--- a/contrib/HumanSeg/val.py
+++ b/contrib/HumanSeg/val.py
--- a/contrib/HumanSeg/video_infer.py
+++ b/contrib/HumanSeg/video_infer.py
--- a/contrib/LaneNet/requirements.txt
+++ b/contrib/LaneNet/requirements.txt
--- a/contrib/LaneNet/train.py
+++ b/contrib/LaneNet/train.py
--- a/contrib/LaneNet/utils/config.py
+++ b/contrib/LaneNet/utils/config.py
@@ -68,7 +68,7 @@ cfg.DATASET.VAL_TOTAL_IMAGES = 500
 cfg.DATASET.TEST_FILE_LIST = './dataset/cityscapes/test.list'
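 # Number of test images
 cfg.DATASET.TEST_TOTAL_IMAGES = 500
-# Dataset used for Tensorboard visualization
+# Dataset used for VisualDL visualization
 cfg.DATASET.VIS_FILE_LIST = None
 # Number of classes (must include the background class)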
 cfg.DATASET.NUM_CLASSES = 19

--- a/contrib/README.md
+++ b/contrib/README.md
--- a/contrib/RealTimeHumanSeg/README.md
+++ b/contrib/RealTimeHumanSeg/README.md
--- a/contrib/RealTimeHumanSeg/cpp/CMakeLists.txt
+++ b/contrib/RealTimeHumanSeg/cpp/CMakeLists.txt
--- a/contrib/RealTimeHumanSeg/cpp/CMakeSettings.json
+++ b/contrib/RealTimeHumanSeg/cpp/CMakeSettings.json
--- a/contrib/RealTimeHumanSeg/cpp/README.md
+++ b/contrib/RealTimeHumanSeg/cpp/README.md
--- a/contrib/RealTimeHumanSeg/cpp/docs/linux_build.md
+++ b/contrib/RealTimeHumanSeg/cpp/docs/linux_build.md
--- a/contrib/RealTimeHumanSeg/cpp/docs/windows_build.md
+++ b/contrib/RealTimeHumanSeg/cpp/docs/windows_build.md
--- a/contrib/RealTimeHumanSeg/cpp/humanseg.cc
+++ b/contrib/RealTimeHumanSeg/cpp/humanseg.cc
--- a/contrib/RealTimeHumanSeg/cpp/humanseg.h
+++ b/contrib/RealTimeHumanSeg/cpp/humanseg.h
--- a/contrib/RealTimeHumanSeg/cpp/humanseg_postprocess.cc
+++ b/contrib/RealTimeHumanSeg/cpp/humanseg_postprocess.cc
--- a/contrib/RealTimeHumanSeg/cpp/humanseg_postprocess.h
+++ b/contrib/RealTimeHumanSeg/cpp/humanseg_postprocess.h
--- a/contrib/RealTimeHumanSeg/cpp/linux_build.sh
+++ b/contrib/RealTimeHumanSeg/cpp/linux_build.sh
--- a/contrib/RealTimeHumanSeg/cpp/main.cc
+++ b/contrib/RealTimeHumanSeg/cpp/main.cc
--- a/contrib/RealTimeHumanSeg/python/README.md
+++ b/contrib/RealTimeHumanSeg/python/README.md
--- a/contrib/RealTimeHumanSeg/python/infer.py
+++ b/contrib/RealTimeHumanSeg/python/infer.py
--- a/contrib/RealTimeHumanSeg/python/requirements.txt
+++ b/contrib/RealTimeHumanSeg/python/requirements.txt
--- a/contrib/RemoteSensing/README.md
+++ b/contrib/RemoteSensing/README.md
--- a/contrib/RemoteSensing/__init__.py
+++ b/contrib/RemoteSensing/__init__.py
--- a/contrib/RemoteSensing/dataset/demo/annotations/0.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/0.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/10.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/10.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/100.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/100.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1000.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1000.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1001.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1001.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1002.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1002.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1003.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1003.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1004.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1004.png
--- a/contrib/RemoteSensing/dataset/demo/annotations/1005.png
+++ b/contrib/RemoteSensing/dataset/demo/annotations/1005.png
--- a/contrib/RemoteSensing/dataset/demo/images/0.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/0.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1.npy
--- a/contrib/RemoteSensing/dataset/demo/images/10.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/10.npy
--- a/contrib/RemoteSensing/dataset/demo/images/100.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/100.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1000.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1000.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1001.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1001.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1002.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1002.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1003.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1003.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1004.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1004.npy
--- a/contrib/RemoteSensing/dataset/demo/images/1005.npy
+++ b/contrib/RemoteSensing/dataset/demo/images/1005.npy
--- a/contrib/RemoteSensing/dataset/demo/labels.txt
+++ b/contrib/RemoteSensing/dataset/demo/labels.txt
--- a/contrib/RemoteSensing/dataset/demo/train.txt
+++ b/contrib/RemoteSensing/dataset/demo/train.txt
--- a/contrib/RemoteSensing/dataset/demo/val.txt
+++ b/contrib/RemoteSensing/dataset/demo/val.txt
--- a/contrib/RemoteSensing/docs/transforms.md
+++ b/contrib/RemoteSensing/docs/transforms.md
--- a/contrib/RemoteSensing/models/__init__.py
+++ b/contrib/RemoteSensing/models/__init__.py
--- a/contrib/RemoteSensing/models/base.py
+++ b/contrib/RemoteSensing/models/base.py
--- a/contrib/RemoteSensing/models/load_model.py
+++ b/contrib/RemoteSensing/models/load_model.py
--- a/contrib/RemoteSensing/models/unet.py
+++ b/contrib/RemoteSensing/models/unet.py
--- a/contrib/RemoteSensing/nets/__init__.py
+++ b/contrib/RemoteSensing/nets/__init__.py
--- a/contrib/RemoteSensing/nets/libs.py
+++ b/contrib/RemoteSensing/nets/libs.py
--- a/contrib/RemoteSensing/nets/loss.py
+++ b/contrib/RemoteSensing/nets/loss.py
--- a/contrib/RemoteSensing/nets/unet.py
+++ b/contrib/RemoteSensing/nets/unet.py
--- a/contrib/RemoteSensing/predict_demo.py
+++ b/contrib/RemoteSensing/predict_demo.py
--- a/contrib/RemoteSensing/readers/__init__.py
+++ b/contrib/RemoteSensing/readers/__init__.py
--- a/contrib/RemoteSensing/readers/base.py
+++ b/contrib/RemoteSensing/readers/base.py
--- a/contrib/RemoteSensing/readers/reader.py
+++ b/contrib/RemoteSensing/readers/reader.py
--- a/contrib/RemoteSensing/requirements.txt
+++ b/contrib/RemoteSensing/requirements.txt
--- a/contrib/RemoteSensing/tools/create_dataset_list.py
+++ b/contrib/RemoteSensing/tools/create_dataset_list.py
--- a/contrib/RemoteSensing/tools/split_dataset_list.py
+++ b/contrib/RemoteSensing/tools/split_dataset_list.py
--- a/contrib/RemoteSensing/train_demo.py
+++ b/contrib/RemoteSensing/train_demo.py
--- a/contrib/RemoteSensing/transforms/__init__.py
+++ b/contrib/RemoteSensing/transforms/__init__.py
--- a/contrib/RemoteSensing/transforms/ops.py
+++ b/contrib/RemoteSensing/transforms/ops.py
--- a/contrib/RemoteSensing/transforms/transforms.py
+++ b/contrib/RemoteSensing/transforms/transforms.py
--- a/contrib/RemoteSensing/utils/__init__.py
+++ b/contrib/RemoteSensing/utils/__init__.py
--- a/contrib/RemoteSensing/utils/logging.py
+++ b/contrib/RemoteSensing/utils/logging.py
--- a/contrib/RemoteSensing/utils/metrics.py
+++ b/contrib/RemoteSensing/utils/metrics.py
--- a/contrib/RemoteSensing/utils/pretrain_weights.py
+++ b/contrib/RemoteSensing/utils/pretrain_weights.py
--- a/contrib/RemoteSensing/utils/utils.py
+++ b/contrib/RemoteSensing/utils/utils.py
--- a/docs/config.md
+++ b/docs/config.md
--- a/docs/configs/dataset_group.md
+++ b/docs/configs/dataset_group.md
--- a/docs/imgs/tensorboard_image.JPG
+++ b/docs/imgs/tensorboard_image.JPG
--- a/docs/imgs/tensorboard_scalar.JPG
+++ b/docs/imgs/tensorboard_scalar.JPG
--- a/docs/imgs/visualdl_image.png
+++ b/docs/imgs/visualdl_image.png
--- a/docs/imgs/visualdl_scalar.png
+++ b/docs/imgs/visualdl_scalar.png
--- a/docs/usage.md
+++ b/docs/usage.md
--- a/pdseg/loss.py
+++ b/pdseg/loss.py
--- a/pdseg/tools/jingling2seg.py
+++ b/pdseg/tools/jingling2seg.py
--- a/pdseg/tools/labelme2seg.py
+++ b/pdseg/tools/labelme2seg.py
--- a/pdseg/train.py
+++ b/pdseg/train.py
--- a/pdseg/utils/config.py
+++ b/pdseg/utils/config.py
--- a/pdseg/vis.py
+++ b/pdseg/vis.py
--- a/requirements.txt
+++ b/requirements.txt
--- a/slim/distillation/train_distill.py
+++ b/slim/distillation/train_distill.py
--- a/slim/nas/train_nas.py
+++ b/slim/nas/train_nas.py
--- a/slim/prune/train_prune.py
+++ b/slim/prune/train_prune.py