diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md b/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a6cf08ca95e2d0fa8786eb0b6bbb4dcf64920ffe
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md
@@ -0,0 +1,176 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

This example shows how to fine-tune a pretrained model with PaddleHub and run prediction with it.

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the bisenetv2_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='bisenetv2_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.

### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
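Putting the four steps together, a minimal train.py might look like the sketch below. The `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and the `epochs`/`batch_size` values are illustrative, not prescribed by this module:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    # Step1: preprocessing; Step2: datasets for training and validation
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    # Step3: the model, with the class count of the target dataset
    model = hub.Module(name='bisenetv2_cityscapes', num_classes=2, pretrained=None)

    # Step4: optimization strategy and run configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)

    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```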
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m bisenetv2_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/bisenetv2_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
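The decoded `mask` is an ordinary OpenCV image, so it can be saved or post-processed directly; for example (the output filename is illustrative):

```python
# Persist the mask returned by the service
cv2.imwrite('seg_result_mask.png', mask)
```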
### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py b/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..dcaaded9f5453655c24bbb85e0115b8bb2fb7008
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py
@@ -0,0 +1,186 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)

        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvReLUPool(nn.Layer):
    """Basic conv relu pool layer."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        x = F.relu(x)
        # F.pool2d is the legacy fluid API; use the paddle 2.x functional max pooling
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        return x


class SeparableConvBNReLU(nn.Layer):
    """Basic separable conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class DepthwiseConvBN(nn.Layer):
    """Basic depthwise conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        return x
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py b/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..7745be3c6ec0ae3b598b6598503449c670a54a50 --- /dev/null +++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py @@ -0,0 +1,288 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
import os
from typing import Union, List, Tuple

import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule

# the package name must match the module directory, bisenetv2_cityscapes
import bisenetv2_cityscapes.layers as layers


@moduleinfo(
    name="bisenetv2_cityscapes",
    type="CV/semantic_segmentation",
    author="paddlepaddle",
    author_email="",
    summary="BiSeNetV2 is a segmentation model trained on Cityscapes.",
    version="1.0.0",
    meta=ImageSegmentationModule)
class BiSeNetV2(nn.Layer):
    """
    The BiSeNet V2 implementation based on PaddlePaddle.

    The original article refers to
    Yu, Changqian, et al. "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation"
    (https://arxiv.org/abs/2004.02147)

    Args:
        num_classes (int): The unique number of target classes, default is 19.
        lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
        pretrained (str, optional): The path or url of pretrained model. Default: None.
    """

    def __init__(self, num_classes: int = 19, lambd: float = 0.25, align_corners: bool = False, pretrained: str = None):
        super(BiSeNetV2, self).__init__()

        C1, C2, C3 = 64, 64, 128
        db_channels = (C1, C2, C3)
        C1, C3, C4, C5 = int(C1 * lambd), int(C3 * lambd), 64, 128
        sb_channels = (C1, C3, C4, C5)
        mid_channels = 128

        self.db = DetailBranch(db_channels)
        self.sb = SemanticBranch(sb_channels)

        self.bga = BGA(mid_channels, align_corners)
        self.aux_head1 = SegHead(C1, C1, num_classes)
        self.aux_head2 = SegHead(C3, C3, num_classes)
        self.aux_head3 = SegHead(C4, C4, num_classes)
        self.aux_head4 = SegHead(C5, C5, num_classes)
        self.head = SegHead(mid_channels, mid_channels, num_classes)

        self.align_corners = align_corners
        self.transforms = T.Compose([T.Normalize()])

        if pretrained is not None:
            model_dict = paddle.load(pretrained)
            self.set_dict(model_dict)
            print("load custom parameters success")
        else:
            checkpoint = os.path.join(self.directory, 'bisenet_model.pdparams')
            model_dict = paddle.load(checkpoint)
            self.set_dict(model_dict)
            print("load pretrained parameters success")

    def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
        return self.transforms(img)

    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
        dfm = self.db(x)
        feat1, feat2, feat3, feat4, sfm = self.sb(x)
        logit = self.head(self.bga(dfm, sfm))

        if not self.training:
            logit_list = [logit]
        else:
            logit1 = self.aux_head1(feat1)
            logit2 = self.aux_head2(feat2)
            logit3 = self.aux_head3(feat3)
            logit4 = self.aux_head4(feat4)
            logit_list = [logit, logit1, logit2, logit3, logit4]

        logit_list = [
            F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
            for logit in logit_list
        ]

        return logit_list


class StemBlock(nn.Layer):
    def __init__(self, in_dim: int, out_dim: int):
        super(StemBlock, self).__init__()

        self.conv = layers.ConvBNReLU(in_dim, out_dim, 3, stride=2)

        self.left = nn.Sequential(
            layers.ConvBNReLU(out_dim, out_dim // 2, 1), layers.ConvBNReLU(out_dim // 2, out_dim, 3, stride=2))

        self.right = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
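        # left (strided convs) and right (max pooling) downsample in parallel; their
        # concatenation doubles the channels, so fuse maps 2 * out_dim back to out_dim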
        self.fuse = layers.ConvBNReLU(out_dim * 2, out_dim, 3)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        left = self.left(x)
        right = self.right(x)
        concat = paddle.concat([left, right], axis=1)
        return self.fuse(concat)


class ContextEmbeddingBlock(nn.Layer):
    def __init__(self, in_dim: int, out_dim: int):
        super(ContextEmbeddingBlock, self).__init__()

        self.gap = nn.AdaptiveAvgPool2D(1)
        self.bn = layers.SyncBatchNorm(in_dim)

        self.conv_1x1 = layers.ConvBNReLU(in_dim, out_dim, 1)
        self.conv_3x3 = nn.Conv2D(out_dim, out_dim, 3, 1, 1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        gap = self.gap(x)
        bn = self.bn(gap)
        conv1 = self.conv_1x1(bn) + x
        return self.conv_3x3(conv1)


class GatherAndExpansionLayer1(nn.Layer):
    """Gather And Expansion Layer with stride 1"""

    def __init__(self, in_dim: int, out_dim: int, expand: int):
        super().__init__()

        expand_dim = expand * in_dim

        self.conv = nn.Sequential(
            layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3),
            layers.ConvBN(expand_dim, out_dim, 1))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return F.relu(self.conv(x) + x)


class GatherAndExpansionLayer2(nn.Layer):
    """Gather And Expansion Layer with stride 2"""

    def __init__(self, in_dim: int, out_dim: int, expand: int):
        super().__init__()

        expand_dim = expand * in_dim

        self.branch_1 = nn.Sequential(
            layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3, stride=2),
            layers.DepthwiseConvBN(expand_dim, expand_dim, 3), layers.ConvBN(expand_dim, out_dim, 1))

        self.branch_2 = nn.Sequential(
            layers.DepthwiseConvBN(in_dim, in_dim, 3, stride=2), layers.ConvBN(in_dim, out_dim, 1))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return F.relu(self.branch_1(x) + self.branch_2(x))


class DetailBranch(nn.Layer):
    """The detail branch of BiSeNet, which has wide channels but shallow layers."""

    def __init__(self, in_channels: Tuple[int, int, int]):
        super().__init__()

        C1, C2, C3 = in_channels

        self.convs = nn.Sequential(
            # stage 1
            layers.ConvBNReLU(3, C1, 3, stride=2),
            layers.ConvBNReLU(C1, C1, 3),
            # stage 2
            layers.ConvBNReLU(C1, C2, 3, stride=2),
            layers.ConvBNReLU(C2, C2, 3),
            layers.ConvBNReLU(C2, C2, 3),
            # stage 3
            layers.ConvBNReLU(C2, C3, 3, stride=2),
            layers.ConvBNReLU(C3, C3, 3),
            layers.ConvBNReLU(C3, C3, 3),
        )

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return self.convs(x)


class SemanticBranch(nn.Layer):
    """The semantic branch of BiSeNet, which has narrow channels but deep layers."""

    def __init__(self, in_channels: Tuple[int, int, int, int]):
        super().__init__()
        C1, C3, C4, C5 = in_channels

        self.stem = StemBlock(3, C1)

        self.stage3 = nn.Sequential(GatherAndExpansionLayer2(C1, C3, 6), GatherAndExpansionLayer1(C3, C3, 6))

        self.stage4 = nn.Sequential(GatherAndExpansionLayer2(C3, C4, 6), GatherAndExpansionLayer1(C4, C4, 6))

        self.stage5_4 = nn.Sequential(
            GatherAndExpansionLayer2(C4, C5, 6), GatherAndExpansionLayer1(C5, C5, 6),
            GatherAndExpansionLayer1(C5, C5, 6), GatherAndExpansionLayer1(C5, C5, 6))

        self.ce = ContextEmbeddingBlock(C5, C5)

    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
        stage2 = self.stem(x)
        stage3 = self.stage3(stage2)
        stage4 = self.stage4(stage3)
        stage5_4 = self.stage5_4(stage4)
        fm = self.ce(stage5_4)
        return stage2, stage3, stage4, stage5_4, fm
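# In BGA, each branch guides the other: detail features are gated by sigmoid-activated
# semantic attention at high resolution, semantic features by detail attention at low
# resolution, and the two fused maps are summed after the semantic path is upsampled.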
class BGA(nn.Layer):
    """The Bilateral Guided Aggregation Layer, used to fuse the semantic features and spatial features."""

    def __init__(self, out_dim: int, align_corners: bool):
        super().__init__()

        self.align_corners = align_corners

        self.db_branch_keep = nn.Sequential(layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1))

        self.db_branch_down = nn.Sequential(
            layers.ConvBN(out_dim, out_dim, 3, stride=2), nn.AvgPool2D(kernel_size=3, stride=2, padding=1))

        self.sb_branch_keep = nn.Sequential(
            layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1),
            layers.Activation(act='sigmoid'))

        self.sb_branch_up = layers.ConvBN(out_dim, out_dim, 3)

        self.conv = layers.ConvBN(out_dim, out_dim, 3)

    def forward(self, dfm: paddle.Tensor, sfm: paddle.Tensor) -> paddle.Tensor:
        db_feat_keep = self.db_branch_keep(dfm)
        db_feat_down = self.db_branch_down(dfm)
        sb_feat_keep = self.sb_branch_keep(sfm)

        sb_feat_up = self.sb_branch_up(sfm)
        sb_feat_up = F.interpolate(
            sb_feat_up, paddle.shape(db_feat_keep)[2:], mode='bilinear', align_corners=self.align_corners)

        sb_feat_up = F.sigmoid(sb_feat_up)
        db_feat = db_feat_keep * sb_feat_up

        sb_feat = db_feat_down * sb_feat_keep
        sb_feat = F.interpolate(sb_feat, paddle.shape(db_feat)[2:], mode='bilinear', align_corners=self.align_corners)

        return self.conv(db_feat + sb_feat)


class SegHead(nn.Layer):
    def __init__(self, in_dim: int, mid_dim: int, num_classes: int):
        super().__init__()

        self.conv_3x3 = nn.Sequential(layers.ConvBNReLU(in_dim, mid_dim, 3), nn.Dropout(0.1))

        self.conv_1x1 = nn.Conv2D(mid_dim, num_classes, 1, 1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        conv1 = self.conv_3x3(x)
        conv2 = self.conv_1x1(conv1)
        return conv2
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9629ccdf433eb1c1970571a01f2663a2af8457f6
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md
@@ -0,0 +1,173 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the deeplabv3p_resnet50_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='deeplabv3p_resnet50_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.
### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
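Putting the four steps together, a minimal train.py might look like the sketch below; the `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and `epochs`/`batch_size` are illustrative values:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    model = hub.Module(name='deeplabv3p_resnet50_cityscapes', num_classes=2, pretrained=None)

    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```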
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m deeplabv3p_resnet50_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/deeplabv3p_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```

### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee62265b585c80189c32846c0037b2b002244d6d
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py
@@ -0,0 +1,295 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu':
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNLayer(nn.Layer):
    """Basic conv bn layer with an optional activation."""

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: int,
                 stride: int = 1,
                 dilation: int = 1,
                 groups: int = 1,
                 is_vd_mode: bool = False,
                 act: str = None,
                 name: str = None):
        super(ConvBNLayer, self).__init__()

        self.is_vd_mode = is_vd_mode
        self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
        self._conv = Conv2D(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
            dilation=dilation,
            groups=groups,
            bias_attr=False)

        self._batch_norm = SyncBatchNorm(out_channels)
        self._act_op = Activation(act=act)

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        if self.is_vd_mode:
            inputs = self._pool2d_avg(inputs)
        y = self._conv(inputs)
        y = self._batch_norm(y)
        y = self._act_op(y)

        return y


class BottleneckBlock(nn.Layer):
    """Residual bottleneck block"""

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 stride: int,
                 shortcut: bool = True,
                 if_first: bool = False,
                 dilation: int = 1,
                 name: str = None):
        super(BottleneckBlock, self).__init__()

        self.conv0 = ConvBNLayer(
            in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")

        self.dilation = dilation

        self.conv1 = ConvBNLayer(
            in_channels=out_channels,
            out_channels=out_channels,
            kernel_size=3,
            stride=stride,
            act='relu',
            dilation=dilation,
            name=name + "_branch2b")
        self.conv2 = ConvBNLayer(
            in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")

        if not shortcut:
            self.short = ConvBNLayer(
                in_channels=in_channels,
                out_channels=out_channels * 4,
                kernel_size=1,
                stride=1,
                is_vd_mode=False if if_first or stride == 1 else True,
                name=name + "_branch1")

        self.shortcut = shortcut

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        y = self.conv0(inputs)
        if self.dilation > 1:
            padding = self.dilation
            y = F.pad(y, [padding, padding, padding, padding])

        conv1 = self.conv1(y)
        conv2 = self.conv2(conv1)

        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)

        y = paddle.add(x=short, y=conv2)
        y = F.relu(y)
        return y
class SeparableConvBNReLU(nn.Layer):
    """Depthwise Separable Convolution."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(SeparableConvBNReLU, self).__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer"""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(ConvBN, self).__init__()
        self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(ConvBNReLU, self).__init__()

        self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class Activation(nn.Layer):
    """
    The wrapper of activations.

    Args:
        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identical transformation.

    Returns:
        A callable object of Activation.

    Raises:
        KeyError: When parameter `act` is not in the optional range.

    Examples:
        from paddleseg.models.common.activation import Activation

        relu = Activation("relu")
        print(relu)
        # <class 'paddle.nn.layer.activation.ReLU'>

        sigmoid = Activation("sigmoid")
        print(sigmoid)
        # <class 'paddle.nn.layer.activation.Sigmoid'>

        not_exit_one = Activation("not_exit_one")
        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
        #     'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
        #     'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
    """

    def __init__(self, act: str = None):
        super(Activation, self).__init__()

        self._act = act
        upper_act_names = nn.layer.activation.__dict__.keys()
        lower_act_names = [act.lower() for act in upper_act_names]
        act_dict = dict(zip(lower_act_names, upper_act_names))

        if act is not None:
            if act in act_dict.keys():
                act_name = act_dict[act]
                # getattr is a safer equivalent of the original eval(...) lookup
                self.act_func = getattr(nn.layer.activation, act_name)()
            else:
                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        if self._act is not None:
            return self.act_func(x)
        else:
            return x
class ASPPModule(nn.Layer):
    """
    Atrous Spatial Pyramid Pooling.

    Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
        image_pooling (bool, optional): If augmented with image-level features. Default: False.
    """

    def __init__(self,
                 aspp_ratios: tuple,
                 in_channels: int,
                 out_channels: int,
                 align_corners: bool,
                 use_sep_conv: bool = False,
                 image_pooling: bool = False):
        super().__init__()

        self.align_corners = align_corners
        self.aspp_blocks = nn.LayerList()

        for ratio in aspp_ratios:
            if use_sep_conv and ratio > 1:
                conv_func = SeparableConvBNReLU
            else:
                conv_func = ConvBNReLU

            block = conv_func(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1 if ratio == 1 else 3,
                dilation=ratio,
                padding=0 if ratio == 1 else ratio)
            self.aspp_blocks.append(block)

        out_size = len(self.aspp_blocks)

        if image_pooling:
            self.global_avg_pool = nn.Sequential(
                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
            out_size += 1
        self.image_pooling = image_pooling

        self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)

        self.dropout = nn.Dropout(p=0.1)  # drop rate

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        outputs = []
        for block in self.aspp_blocks:
            y = block(x)
            y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
            outputs.append(y)

        if self.image_pooling:
            img_avg = self.global_avg_pool(x)
            img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
            outputs.append(img_avg)

        x = paddle.concat(outputs, axis=1)
        x = self.conv_bn_relu(x)
        x = self.dropout(x)

        return x
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..149bc4e04a7b52f8fce71bef4c3dbcdc8e4b74ec
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py
@@ -0,0 +1,169 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Union, List, Tuple

import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule

from deeplabv3p_resnet50_cityscapes.resnet import ResNet50_vd
import deeplabv3p_resnet50_cityscapes.layers as L


@moduleinfo(
    name="deeplabv3p_resnet50_cityscapes",
    type="CV/semantic_segmentation",
    author="paddlepaddle",
    author_email="",
    summary="DeepLabV3PResnet50 is a segmentation model.",
    version="1.0.0",
    meta=ImageSegmentationModule)
class DeepLabV3PResnet50(nn.Layer):
"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" + (https://arxiv.org/abs/1802.02611) + + Args: + num_classes (int): the unique number of target classes. + backbone_indices (tuple): two values in the tuple indicate the indices of output of backbone. + the first index will be taken as a low-level feature in Decoder component; + the second one will be taken as input of ASPP component. + Usually backbone consists of four downsampling stage, and return an output of + each stage, so we set default (0, 3), which means taking feature map of the first + stage in backbone as low-level feature used in Decoder, and feature map of the fourth + stage as input of ASPP. + aspp_ratios (tuple): the dilation rate using in ASSP module. + if output_stride=16, aspp_ratios should be set as (1, 6, 12, 18). + if output_stride=8, aspp_ratios is (1, 12, 24, 36). + aspp_out_channels (int): the output channels of ASPP module. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str): the path of pretrained model. Default to None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (0, 3), + aspp_ratios: Tuple[int] = (1, 12, 24, 36), + aspp_out_channels: int = 256, + align_corners=False, + pretrained: str = None): + super(DeepLabV3PResnet50, self).__init__() + self.backbone = ResNet50_vd() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = DeepLabV3PHead(num_classes, backbone_indices, backbone_channels, aspp_ratios, aspp_out_channels, + align_corners) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, x.shape[2:], mode='bilinear', align_corners=self.align_corners) for logit in logit_list + ] + + +class DeepLabV3PHead(nn.Layer): + """ + The DeepLabV3PHead implementation based on PaddlePaddle. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + the first index will be taken as a low-level feature in Decoder component; + the second one will be taken as input of ASPP component. + Usually backbone consists of four downsampling stage, and return an output of + each stage. If we set it as (0, 3), it means taking feature map of the first + stage in backbone as low-level feature used in Decoder, and feature map of the fourth + stage as input of ASPP. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + aspp_ratios (tuple): The dilation rates using in ASSP module. + aspp_out_channels (int): The output channels of ASPP module. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 
class DeepLabV3PHead(nn.Layer):
    """
    The DeepLabV3PHead implementation based on PaddlePaddle.

    Args:
        num_classes (int): The unique number of target classes.
        backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
            the first index will be taken as a low-level feature in Decoder component;
            the second one will be taken as input of ASPP component.
            Usually backbone consists of four downsampling stages, and returns an output of
            each stage. If we set it as (0, 3), it means taking the feature map of the first
            stage in backbone as the low-level feature used in Decoder, and the feature map of the fourth
            stage as the input of ASPP.
        backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        aspp_out_channels (int): The output channels of the ASPP module.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
    """

    def __init__(self, num_classes: int, backbone_indices: Tuple[int], backbone_channels: Tuple[int],
                 aspp_ratios: Tuple[int], aspp_out_channels: int, align_corners: bool):
        super().__init__()

        self.aspp = L.ASPPModule(
            aspp_ratios, backbone_channels[1], aspp_out_channels, align_corners, use_sep_conv=True, image_pooling=True)
        self.decoder = Decoder(num_classes, backbone_channels[0], align_corners)
        self.backbone_indices = backbone_indices

    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
        logit_list = []
        low_level_feat = feat_list[self.backbone_indices[0]]
        x = feat_list[self.backbone_indices[1]]
        x = self.aspp(x)
        logit = self.decoder(x, low_level_feat)
        logit_list.append(logit)
        return logit_list


class Decoder(nn.Layer):
    """
    Decoder module of DeepLabV3P model

    Args:
        num_classes (int): The number of classes.
        in_channels (int): The number of input channels in decoder module.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
    """

    def __init__(self, num_classes: int, in_channels: int, align_corners: bool):
        super(Decoder, self).__init__()

        self.conv_bn_relu1 = L.ConvBNReLU(in_channels=in_channels, out_channels=48, kernel_size=1)

        self.conv_bn_relu2 = L.SeparableConvBNReLU(in_channels=304, out_channels=256, kernel_size=3, padding=1)
        self.conv_bn_relu3 = L.SeparableConvBNReLU(in_channels=256, out_channels=256, kernel_size=3, padding=1)
        self.conv = nn.Conv2D(in_channels=256, out_channels=num_classes, kernel_size=1)

        self.align_corners = align_corners

    def forward(self, x: paddle.Tensor, low_level_feat: paddle.Tensor) -> paddle.Tensor:
        low_level_feat = self.conv_bn_relu1(low_level_feat)
        x = F.interpolate(x, low_level_feat.shape[2:], mode='bilinear', align_corners=self.align_corners)
        x = paddle.concat([x, low_level_feat], axis=1)
        x = self.conv_bn_relu2(x)
        x = self.conv_bn_relu3(x)
        x = self.conv(x)
        return x
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..6c7fdfeb66c84d1595954bac4fcd65863649f7c8
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py
@@ -0,0 +1,115 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
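# ResNet50-vd backbone for DeepLabV3+. dilation_dict below swaps striding for dilated
# convolutions (rates 2 and 4) in the last two stages, so the backbone keeps an output
# stride of 8, which matches the default aspp_ratios=(1, 12, 24, 36) in module.py.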
from typing import Union, List, Tuple

import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import deeplabv3p_resnet50_cityscapes.layers as L


class BasicBlock(nn.Layer):
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 stride: int,
                 shortcut: bool = True,
                 if_first: bool = False,
                 name: str = None):
        super(BasicBlock, self).__init__()
        self.stride = stride
        self.conv0 = L.ConvBNLayer(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=3,
            stride=stride,
            act='relu',
            name=name + "_branch2a")
        self.conv1 = L.ConvBNLayer(
            in_channels=out_channels, out_channels=out_channels, kernel_size=3, act=None, name=name + "_branch2b")

        if not shortcut:
            self.short = L.ConvBNLayer(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1,
                stride=1,
                is_vd_mode=False if if_first else True,
                name=name + "_branch1")

        self.shortcut = shortcut

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        y = self.conv0(inputs)
        conv1 = self.conv1(y)

        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)
        # paddle.elementwise_add(..., act='relu') is the legacy fluid API; add then relu
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)

        return y


class ResNet50_vd(nn.Layer):
    def __init__(self, multi_grid: Tuple[int] = (1, 2, 4)):
        super(ResNet50_vd, self).__init__()
        depth = [3, 4, 6, 3]
        num_channels = [64, 256, 512, 1024]
        num_filters = [64, 128, 256, 512]
        self.feat_channels = [c * 4 for c in num_filters]
        dilation_dict = {2: 2, 3: 4}
        self.conv1_1 = L.ConvBNLayer(
            in_channels=3, out_channels=32, kernel_size=3, stride=2, act='relu', name="conv1_1")
        self.conv1_2 = L.ConvBNLayer(
            in_channels=32, out_channels=32, kernel_size=3, stride=1, act='relu', name="conv1_2")
        self.conv1_3 = L.ConvBNLayer(
            in_channels=32, out_channels=64, kernel_size=3, stride=1, act='relu', name="conv1_3")
        self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.stage_list = []

        for block in range(len(depth)):
            shortcut = False
            block_list = []
            for i in range(depth[block]):
                conv_name = "res" + str(block + 2) + chr(97 + i)
                dilation_rate = dilation_dict[block] if dilation_dict and block in dilation_dict else 1
                if block == 3:
                    dilation_rate = dilation_rate * multi_grid[i]
                bottleneck_block = self.add_sublayer(
                    'bb_%d_%d' % (block, i),
                    L.BottleneckBlock(
                        in_channels=num_channels[block] if i == 0 else num_filters[block] * 4,
                        out_channels=num_filters[block],
                        stride=2 if i == 0 and block != 0 and dilation_rate == 1 else 1,
                        shortcut=shortcut,
                        if_first=block == i == 0,
                        name=conv_name,
                        dilation=dilation_rate))
                block_list.append(bottleneck_block)
                shortcut = True
            self.stage_list.append(block_list)

    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
        y = self.conv1_1(inputs)
        y = self.conv1_2(y)
        y = self.conv1_3(y)
        y = self.pool2d_max(y)
        feat_list = []
        for stage in self.stage_list:
            for block in stage:
                y = block(y)
            feat_list.append(y)
        return feat_list
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md b/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1c5db026d2bb5450aca1f2e4106f0e3abf0f212c
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md
@@ -0,0 +1,173 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='fastscnn_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the fastscnn_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='fastscnn_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.

### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
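Putting the four steps together, a minimal train.py might look like the sketch below; the `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and `epochs`/`batch_size` are illustrative values:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    model = hub.Module(name='fastscnn_cityscapes', num_classes=2, pretrained=None)

    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```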
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='fastscnn_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m fastscnn_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fastscnn_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```

### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py b/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e36a1501126097f5021c0b5e2e53cd98b67976a
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py
@@ -0,0 +1,256 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Tuple

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)

        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvReLUPool(nn.Layer):
    """Basic conv relu pool layer."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        x = F.relu(x)
        # F.pool2d is the legacy fluid API; use the paddle 2.x functional max pooling
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        return x
class SeparableConvBNReLU(nn.Layer):
    """Basic separable conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class DepthwiseConvBN(nn.Layer):
    """Basic depthwise conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        return x


class AuxLayer(nn.Layer):
    """
    The auxiliary layer implementation for auxiliary loss.

    Args:
        in_channels (int): The number of input channels.
        inter_channels (int): The intermediate channels.
        out_channels (int): The number of output channels, and usually it is num_classes.
        dropout_prob (float, optional): The drop rate. Default: 0.1.
    """

    def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1):
        super().__init__()

        self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1)

        self.dropout = nn.Dropout(p=dropout_prob)

        self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv_bn_relu(x)
        x = self.dropout(x)
        x = self.conv(x)
        return x


class Activation(nn.Layer):
    """
    The wrapper of activations.

    Args:
        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identical transformation.

    Returns:
        A callable object of Activation.

    Raises:
        KeyError: When parameter `act` is not in the optional range.

    Examples:
        from paddleseg.models.common.activation import Activation

        relu = Activation("relu")
        print(relu)
        # <class 'paddle.nn.layer.activation.ReLU'>

        sigmoid = Activation("sigmoid")
        print(sigmoid)
        # <class 'paddle.nn.layer.activation.Sigmoid'>

        not_exit_one = Activation("not_exit_one")
        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
        #     'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
        #     'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
    """

    def __init__(self, act: str = None):
        super(Activation, self).__init__()

        self._act = act
        upper_act_names = nn.layer.activation.__dict__.keys()
        lower_act_names = [act.lower() for act in upper_act_names]
        act_dict = dict(zip(lower_act_names, upper_act_names))

        if act is not None:
            if act in act_dict.keys():
                act_name = act_dict[act]
                # getattr is a safer equivalent of the original eval(...) lookup
                self.act_func = getattr(nn.layer.activation, act_name)()
            else:
                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        if self._act is not None:
            return self.act_func(x)
        else:
            return x
+        bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
+        dim_reduction (bool, optional): Whether to reduce the feature dimension after pooling. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, in_channels: int, out_channels: int, bin_sizes: Tuple, dim_reduction: bool, align_corners: bool):
+        super().__init__()
+
+        self.bin_sizes = bin_sizes
+
+        inter_channels = in_channels
+        if dim_reduction:
+            inter_channels = in_channels // len(bin_sizes)
+
+        # we use dimension reduction after pooling mentioned in original implementation.
+        self.stages = nn.LayerList([self._make_stage(in_channels, inter_channels, size) for size in bin_sizes])
+
+        self.conv_bn_relu2 = ConvBNReLU(
+            in_channels=in_channels + inter_channels * len(bin_sizes),
+            out_channels=out_channels,
+            kernel_size=3,
+            padding=1)
+
+        self.align_corners = align_corners
+
+    def _make_stage(self, in_channels: int, out_channels: int, size: int):
+        """
+        Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which may differ
+        slightly from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations
+        keep the number of channels unchanged.
+
+        Args:
+            in_channels (int): The number of input channels to pyramid pooling module.
+            out_channels (int): The number of output channels of pyramid pooling module.
+            size (int): The out size of the pooled layer.
+
+        Returns:
+            conv (nn.Sequential): One pooling stage, i.e. adaptive average pooling followed by a 1x1 ConvBNReLU.
+        """
+
+        prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+        conv = ConvBNReLU(in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+        return nn.Sequential(prior, conv)
+
+    def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+        cat_layers = []
+        for stage in self.stages:
+            x = stage(input)
+            x = F.interpolate(x, paddle.shape(input)[2:], mode='bilinear', align_corners=self.align_corners)
+            cat_layers.append(x)
+        cat_layers = [input] + cat_layers[::-1]
+        cat = paddle.concat(cat_layers, axis=1)
+        out = self.conv_bn_relu2(cat)
+
+        return out
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py b/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..88e805fdcf405ea080cc37ba01e456a8bcba2acd
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py
@@ -0,0 +1,275 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
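+
+# PaddleHub module wrapper for FastSCNN: the @moduleinfo decorator below registers the
+# network so it can be loaded with hub.Module(name='fastscnn_cityscapes').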
+import os +from typing import Callable, Union, Tuple + +import paddle.nn as nn +import paddle.nn.functional as F +import paddle +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import fastscnn_cityscapes.layers as layers + + +@moduleinfo( + name="fastscnn_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="fastscnn_cityscapes is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FastSCNN(nn.Layer): + """ + The FastSCNN implementation based on PaddlePaddle. + As mentioned in the original paper, FastSCNN is a real-time segmentation algorithm (123.5fps) + even for high resolution images (1024x2048). + The original article refers to + Poudel, Rudra PK, et al. "Fast-scnn: Fast semantic segmentation network" + (https://arxiv.org/pdf/1902.04502.pdf). + Args: + num_classes (int): The unique number of target classes, default is 19. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, num_classes: int = 19, align_corners: bool = False, pretrained: str = None): + + super(FastSCNN, self).__init__() + + self.learning_to_downsample = LearningToDownsample(32, 48, 64) + self.global_feature_extractor = GlobalFeatureExtractor( + in_channels=64, + block_channels=[64, 96, 128], + out_channels=128, + expansion=6, + num_blocks=[3, 3, 3], + align_corners=True) + self.feature_fusion = FeatureFusionModule(64, 128, 128, align_corners) + self.classifier = Classifier(128, num_classes) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'fastscnn_model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + logit_list = [] + input_size = paddle.shape(x)[2:] + higher_res_features = self.learning_to_downsample(x) + x = self.global_feature_extractor(higher_res_features) + x = self.feature_fusion(higher_res_features, x) + logit = self.classifier(x) + logit = F.interpolate(logit, input_size, mode='bilinear', align_corners=self.align_corners) + logit_list.append(logit) + + return logit_list + + +class LearningToDownsample(nn.Layer): + """ + Learning to downsample module. + This module consists of three downsampling blocks (one conv and two separable conv) + Args: + dw_channels1 (int, optional): The input channels of the first sep conv. Default: 32. + dw_channels2 (int, optional): The input channels of the second sep conv. Default: 48. + out_channels (int, optional): The output channels of LearningToDownsample module. Default: 64. 
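+
+    Examples:
+        # Illustrative sketch: with the default channels, a 1x3x256x512 NCHW input
+        # passes three stride-2 convolutions and becomes a 1x64x32x64 feature map.
+        ltd = LearningToDownsample(32, 48, 64)
+        feat = ltd(paddle.rand([1, 3, 256, 512]))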
+    """
+
+    def __init__(self, dw_channels1: int = 32, dw_channels2: int = 48, out_channels: int = 64):
+        super(LearningToDownsample, self).__init__()
+
+        self.conv_bn_relu = layers.ConvBNReLU(in_channels=3, out_channels=dw_channels1, kernel_size=3, stride=2)
+        self.dsconv_bn_relu1 = layers.SeparableConvBNReLU(
+            in_channels=dw_channels1, out_channels=dw_channels2, kernel_size=3, stride=2, padding=1)
+        self.dsconv_bn_relu2 = layers.SeparableConvBNReLU(
+            in_channels=dw_channels2, out_channels=out_channels, kernel_size=3, stride=2, padding=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.conv_bn_relu(x)
+        x = self.dsconv_bn_relu1(x)
+        x = self.dsconv_bn_relu2(x)
+        return x
+
+
+class GlobalFeatureExtractor(nn.Layer):
+    """
+    Global feature extractor module.
+    This module consists of three InvertedBottleneck blocks (like the inverted residual introduced by MobileNetV2) and
+    a PPModule (introduced by PSPNet).
+    Args:
+        in_channels (int): The number of input channels to the module.
+        block_channels (tuple): A tuple representing the output channels of each bottleneck block.
+        out_channels (int): The number of output channels of the module.
+        expansion (int): The expansion factor in bottleneck.
+        num_blocks (tuple): It indicates the repeat time of each bottleneck.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, in_channels: int, block_channels: Tuple[int], out_channels: int, expansion: int,
+                 num_blocks: Tuple[int], align_corners: bool):
+        super(GlobalFeatureExtractor, self).__init__()
+
+        self.bottleneck1 = self._make_layer(InvertedBottleneck, in_channels, block_channels[0], num_blocks[0],
+                                            expansion, 2)
+        self.bottleneck2 = self._make_layer(InvertedBottleneck, block_channels[0], block_channels[1], num_blocks[1],
+                                            expansion, 2)
+        self.bottleneck3 = self._make_layer(InvertedBottleneck, block_channels[1], block_channels[2], num_blocks[2],
+                                            expansion, 1)
+
+        self.ppm = layers.PPModule(
+            block_channels[2], out_channels, bin_sizes=(1, 2, 3, 6), dim_reduction=True, align_corners=align_corners)
+
+    def _make_layer(self,
+                    block: Callable,
+                    in_channels: int,
+                    out_channels: int,
+                    blocks: int,
+                    expansion: int = 6,
+                    stride: int = 1):
+        layers = []
+        layers.append(block(in_channels, out_channels, expansion, stride))
+        for _ in range(1, blocks):
+            layers.append(block(out_channels, out_channels, expansion, 1))
+        return nn.Sequential(*layers)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.bottleneck1(x)
+        x = self.bottleneck2(x)
+        x = self.bottleneck3(x)
+        x = self.ppm(x)
+        return x
+
+
+class InvertedBottleneck(nn.Layer):
+    """
+    Single Inverted bottleneck implementation.
+    Args:
+        in_channels (int): The number of input channels to bottleneck block.
+        out_channels (int): The number of output channels of bottleneck block.
+        expansion (int, optional): The expansion factor in bottleneck. Default: 6.
+        stride (int, optional): The stride used in depth-wise conv. Default: 2.
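+
+    Examples:
+        # Illustrative sketch: with stride=1 and in_channels == out_channels, the
+        # residual shortcut applies, so the output has the same shape as the input.
+        block = InvertedBottleneck(in_channels=64, out_channels=64, expansion=6, stride=1)
+        y = block(paddle.rand([1, 64, 32, 64]))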
+    """
+
+    def __init__(self, in_channels: int, out_channels: int, expansion: int = 6, stride: int = 2):
+        super().__init__()
+
+        self.use_shortcut = stride == 1 and in_channels == out_channels
+
+        expand_channels = in_channels * expansion
+        self.block = nn.Sequential(
+            # pw
+            layers.ConvBNReLU(in_channels=in_channels, out_channels=expand_channels, kernel_size=1, bias_attr=False),
+            # dw
+            layers.ConvBNReLU(
+                in_channels=expand_channels,
+                out_channels=expand_channels,
+                kernel_size=3,
+                stride=stride,
+                padding=1,
+                groups=expand_channels,
+                bias_attr=False),
+            # pw-linear
+            layers.ConvBN(in_channels=expand_channels, out_channels=out_channels, kernel_size=1, bias_attr=False))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        out = self.block(x)
+        if self.use_shortcut:
+            out = x + out
+        return out
+
+
+class FeatureFusionModule(nn.Layer):
+    """
+    Feature Fusion Module implementation.
+    This module fuses the high-resolution feature and the low-resolution feature.
+    Args:
+        high_in_channels (int): The channels of high-resolution feature (output of LearningToDownsample).
+        low_in_channels (int): The channels of low-resolution feature (output of GlobalFeatureExtractor).
+        out_channels (int): The output channels of this module.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, high_in_channels: int, low_in_channels: int, out_channels: int, align_corners: bool):
+        super().__init__()
+
+        # Only depth-wise conv
+        self.dwconv = layers.ConvBNReLU(
+            in_channels=low_in_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            padding=1,
+            groups=128,
+            bias_attr=False)
+
+        self.conv_low_res = layers.ConvBN(out_channels, out_channels, 1)
+        self.conv_high_res = layers.ConvBN(high_in_channels, out_channels, 1)
+        self.align_corners = align_corners
+
+    def forward(self, high_res_input: paddle.Tensor, low_res_input: paddle.Tensor) -> paddle.Tensor:
+        low_res_input = F.interpolate(
+            low_res_input, paddle.shape(high_res_input)[2:], mode='bilinear', align_corners=self.align_corners)
+        low_res_input = self.dwconv(low_res_input)
+        low_res_input = self.conv_low_res(low_res_input)
+        high_res_input = self.conv_high_res(high_res_input)
+        x = high_res_input + low_res_input
+
+        return F.relu(x)
+
+
+class Classifier(nn.Layer):
+    """
+    The Classifier module implementation.
+    This module consists of two separable convs and one conv.
+    Args:
+        input_channels (int): The input channels to this module.
+        num_classes (int): The unique number of target classes.
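+
+    Examples:
+        # Illustrative sketch: maps a 128-channel feature map to per-pixel logits over
+        # num_classes categories while keeping the spatial size, e.g. -> [1, 19, 32, 64].
+        head = Classifier(input_channels=128, num_classes=19)
+        logits = head(paddle.rand([1, 128, 32, 64]))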
+    """
+
+    def __init__(self, input_channels: int, num_classes: int):
+        super().__init__()
+
+        self.dsconv1 = layers.SeparableConvBNReLU(
+            in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
+
+        self.dsconv2 = layers.SeparableConvBNReLU(
+            in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
+
+        self.conv = nn.Conv2D(in_channels=input_channels, out_channels=num_classes, kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # dropout_prob
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.dsconv1(x)
+        x = self.dsconv2(x)
+        x = self.dropout(x)
+        x = self.conv(x)
+        return x
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7cd0b8cc83f8024bf90f01dcb5f46d893ca18298
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md
@@ -0,0 +1,174 @@
+# PaddleHub Image Segmentation
+
+## Model Prediction
+
+
+To run prediction with the provided pretrained model, use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_cityscapes')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, you can start fine-tuning the fcn_hrnetw18_cityscapes model on datasets such as OpticDiscSeg by running `python train.py`.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with whatever preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split, one of `train`, `test` and `val`. Default: `train`.
+
+For the dataset preparation code, see [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw18_cityscapes', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes of the segmentation model.
+* `pretrained`: whether to load your own trained weights; if None, the provided default parameters are loaded.
+
+### Step4: Choose the optimization strategy and runtime configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam` and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Runtime configuration
+`Trainer` controls the Fine-tune training process and accepts the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use GPU; default False;
+* `use_vdl`: whether to visualize the training process with VisualDL;
+* `checkpoint_dir`: the directory where model parameters are saved;
+* `compare_metrics`: the metric used to select the best model;
+
+`trainer.train` controls the concrete training loop and accepts the following parameters:
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if you train on GPU, adjust it to your hardware;
+* `num_workers`: the number of worker processes; default 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: the logging interval, measured in training steps;
+* `save_interval`: the checkpoint-saving interval, measured in training epochs.
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
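+As a rough sketch (the file name below is an assumption and may vary across PaddleHub versions), the checkpoint directory typically looks like this:
+
+```shell
+$ ls ${CHECKPOINT_DIR}/best_model
+model.pdparams    # parameters of the best model on the validation set
+```
+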
+We then use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured correctly, run the script with `python predict.py`.
+**Args**
+* `images`: paths to the original images, or images in BGR format;
+* `visualization`: whether to visualize the results; default True;
+* `save_path`: the directory where results are saved; default 'seg_result'.
+
+**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tuning.
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw18_cityscapes
+```
+
+This deploys an image segmentation service API, listening on port 8866 by default.
+
+**NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the few lines of code below send a prediction request and retrieve the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send an HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e8422ad158de9b13d4eb4771f1a1736cc3b571e
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py
@@ -0,0 +1,531 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw18_cityscapes.layers as L
+
+
+class HRNet_W18(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4, ).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64, ).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+ stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + 
num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = 
input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, 
self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + 
residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..8758f54f9a840ae49fd6e424b98bfe1dd61e13ec --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py @@ -0,0 +1,296 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = 
ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..436207fc12954e43bbccf9a626a6cf9783a88db0 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw18_cityscapes.hrnet import HRNet_W18 +import fcn_hrnetw18_cityscapes.layers as layers + + +@moduleinfo( + name="fcn_hrnetw18_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw18 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W18() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None
+    """
+
+    def __init__(self,
+                 num_classes: int,
+                 backbone_indices: Tuple[int] = (-1, ),
+                 backbone_channels: Tuple[int] = (270, ),
+                 channels: int = None):
+        super(FCNHead, self).__init__()
+
+        self.num_classes = num_classes
+        self.backbone_indices = backbone_indices
+        if channels is None:
+            channels = backbone_channels[0]
+
+        self.conv_1 = layers.ConvBNReLU(
+            in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
+        self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+        logit_list = []
+        x = feat_list[self.backbone_indices[0]]
+        x = self.conv_1(x)
+        logit = self.cls(x)
+        logit_list.append(logit)
+        return logit_list
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..251f9480dd49c5e6632e2b3814face5147c258bc
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md
@@ -0,0 +1,175 @@
+# PaddleHub Image Segmentation
+
+
+## Model Prediction
+
+
+To run prediction with the provided pretrained model, use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_voc')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, you can start fine-tuning the fcn_hrnetw18_voc model on datasets such as OpticDiscSeg by running `python train.py`.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with whatever preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split, one of `train`, `test` and `val`. Default: `train`.
+
+For the dataset preparation code, see [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw18_voc', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes of the segmentation model.
+* `pretrained`: whether to load your own trained weights; if None, the provided default parameters are loaded.
+
+### Step4: Choose the optimization strategy and runtime configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam` and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Runtime configuration
+`Trainer` controls the Fine-tune training process and accepts the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use GPU; default False;
+* `use_vdl`: whether to visualize the training process with VisualDL;
+* `checkpoint_dir`: the directory where model parameters are saved;
+* `compare_metrics`: the metric used to select the best model;
+
+`trainer.train` controls the concrete training loop and accepts the following parameters:
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if you train on GPU, adjust it to your hardware;
+* `num_workers`: the number of worker processes; default 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: the logging interval, measured in training steps;
+* `save_interval`: the checkpoint-saving interval, measured in training epochs.
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
+We then use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_voc', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured correctly, run the script with `python predict.py`.
+**Args**
+* `images`: paths to the original images, or images in BGR format;
+* `visualization`: whether to visualize the results; default True;
+* `save_path`: the directory where results are saved; default 'seg_result'.
+
+**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tuning.
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw18_voc
+```
+
+This deploys an image segmentation service API, listening on port 8866 by default.
+
+**NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the few lines of code below send a prediction request and retrieve the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send an HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_voc"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..0766871d0f6dd82cc29aae13b7e01d2e377124a9
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py
@@ -0,0 +1,531 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw18_voc.layers as L
+
+
+class HRNet_W18(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4, ).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64, ).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+ stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4). + stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + 
in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + 
def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = 
self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def 
forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..8758f54f9a840ae49fd6e424b98bfe1dd61e13ec --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py @@ -0,0 +1,296 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + 
stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..39e04c6325abd83404c1c81faea59652a4b3f6d1 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw18_voc.hrnet import HRNet_W18 +import fcn_hrnetw18_voc.layers as layers + + +@moduleinfo( + name="fcn_hrnetw18_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw18 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W18() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None
+    """
+
+    def __init__(self,
+                 num_classes: int,
+                 backbone_indices: Tuple[int] = (-1, ),
+                 backbone_channels: Tuple[int] = (270, ),
+                 channels: int = None):
+        super(FCNHead, self).__init__()
+
+        self.num_classes = num_classes
+        self.backbone_indices = backbone_indices
+        if channels is None:
+            channels = backbone_channels[0]
+
+        self.conv_1 = layers.ConvBNReLU(
+            in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
+        self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+        logit_list = []
+        x = feat_list[self.backbone_indices[0]]
+        x = self.conv_1(x)
+        logit = self.cls(x)
+        logit_list.append(logit)
+        return logit_list
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..eb7ab11f6d3ee959fcd44977e765511e7c8cbc30
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md
@@ -0,0 +1,174 @@
+# PaddleHub Image Segmentation
+
+## Model Prediction
+
+To predict with the pretrained model we provide, you can use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_cityscapes')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, run `python train.py` to start Fine-tuning the fcn_hrnetw48_cityscapes model on datasets such as OpticDiscSeg.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split; one of `train`, `test`, `val`. Default is `train`.
+
+See [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py) for the dataset preparation code. `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw48_cityscapes', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes for the segmentation model.
+* `pretrained`: path to your own trained weights; if None, the default pretrained parameters are loaded.
+
+### Step4: Choose the optimization strategy and run configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 offers a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Run configuration
+`Trainer` controls the Fine-tune training loop and takes the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use the GPU, default is False;
+* `use_vdl`: whether to use VisualDL to visualize the training process;
+* `checkpoint_dir`: where to save model parameters;
+* `compare_metrics`: the metric used to select the best model.
+
+`trainer.train` controls the training process itself and takes the following parameters (a minimal call is sketched after this list):
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if using a GPU, adjust it to your hardware;
+* `num_workers`: the number of workers, default is 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: logging interval, measured in training steps;
+* `save_interval`: checkpoint-saving interval, measured in epochs.
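+
+A minimal training call, assuming the `train_reader` and `trainer` objects defined in the steps above (the epoch, batch-size, and interval values are placeholders to adapt to your task):
+
+```python
+trainer.train(
+    train_dataset=train_reader,  # dataset from Step2
+    epochs=10,                   # placeholder: tune for your task
+    batch_size=4,                # placeholder: adjust to GPU memory
+    eval_dataset=OpticDiscSeg(transform, mode='val'),
+    log_interval=10,
+    save_interval=1)
+```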
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
+We use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured, run the script with `python predict.py`.
+
+**Args**
+* `images`: image paths or images in BGR format;
+* `visualization`: whether to visualize the results, default is True;
+* `save_path`: path for saving the results, default is 'seg_result'.
+
+**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for Fine-tuning.
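+
+For example, to write the visualized results somewhere other than the default directory, `save_path` can be passed explicitly (the directory name below is only an illustration):
+
+```python
+model.predict(images=[img], visualization=True, save_path='cityscapes_result')
+```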
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw48_cityscapes
+```
+
+This deploys an image segmentation API service, listening on port 8866 by default.
+
+**NOTE:** To serve predictions on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the following few lines of code send a prediction request and fetch the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send the HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw48_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Source Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..72d29357247626cc38c07e586b2f4dffc067513c
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py
@@ -0,0 +1,528 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw48_cityscapes.layers as layers
+
+
+class HRNet_W48(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default [4].
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default [64].
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default [4, 4].
+        stage2_num_channels (list, optional): Number of channels per branch for stage2. Default [48, 96].
+        stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+        stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default [4, 4, 4].
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [48, 96, 192].
+        stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+        stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default [4, 4, 4, 4].
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default [48, 96, 192, 384].
+        has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+    """
+
+    def __init__(self,
+                 stage1_num_modules: int = 1,
+                 stage1_num_blocks: List[int] = [4],
+                 stage1_num_channels: List[int] = [64],
+                 stage2_num_modules: int = 1,
+                 stage2_num_blocks: List[int] = [4, 4],
+                 stage2_num_channels: List[int] = [48, 96],
+                 stage3_num_modules: int = 4,
+                 stage3_num_blocks: List[int] = [4, 4, 4],
+                 stage3_num_channels: List[int] = [48, 96, 192],
+                 stage4_num_modules: int = 3,
+                 stage4_num_blocks: List[int] = [4, 4, 4, 4],
+                 stage4_num_channels: List[int] = [48, 96, 192, 384],
+                 has_se=False,
+                 align_corners=False):
+        super(HRNet_W48, self).__init__()
+        self.stage1_num_modules = stage1_num_modules
+        self.stage1_num_blocks = stage1_num_blocks
+        self.stage1_num_channels = stage1_num_channels
+        self.stage2_num_modules = stage2_num_modules
+        self.stage2_num_blocks = stage2_num_blocks
+        self.stage2_num_channels = stage2_num_channels
+        self.stage3_num_modules = stage3_num_modules
+        self.stage3_num_blocks = stage3_num_blocks
+        self.stage3_num_channels = stage3_num_channels
+        self.stage4_num_modules = stage4_num_modules
+        self.stage4_num_blocks = stage4_num_blocks
+        self.stage4_num_channels = stage4_num_channels
+        self.has_se = has_se
+        self.align_corners = align_corners
+        self.feat_channels = [sum(stage4_num_channels)]
+
+        self.conv_layer1_1 = layers.ConvBNReLU(
+            in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.conv_layer1_2 = layers.ConvBNReLU(
+            in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.la1 = Layer1(
+            num_channels=64,
+            num_blocks=self.stage1_num_blocks[0],
+            num_filters=self.stage1_num_channels[0],
+            has_se=has_se,
+            name="layer2")
+
+        self.tr1 = TransitionLayer(
+            in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1")
+
+        self.st2 = Stage(
+            num_channels=self.stage2_num_channels,
+            num_modules=self.stage2_num_modules,
+            num_blocks=self.stage2_num_blocks,
+            num_filters=self.stage2_num_channels,
+            has_se=self.has_se,
+            name="st2",
+            align_corners=align_corners)
+
+        self.tr2 = TransitionLayer(
+            in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2")
+        self.st3 = Stage(
+            num_channels=self.stage3_num_channels,
+            num_modules=self.stage3_num_modules,
+            num_blocks=self.stage3_num_blocks,
+            num_filters=self.stage3_num_channels,
+            has_se=self.has_se,
+            name="st3",
+            
align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + size = paddle.shape(st4[0])[2:] + x1 = F.interpolate(st4[1], size, mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], size, mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], size, mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name: str = None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + 
str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = layers.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: float, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, 
self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + layers.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + 
bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = paddle.shape(residual)[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..09fd7d68e8a34a84c921dbe230749869040308c3 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py @@ -0,0 +1,297 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + 
"_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..c7ff6d98c465fd6bc7ffed34c9142d1bdb89c60f --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw48_cityscapes.hrnet import HRNet_W48 +import fcn_hrnetw48_cityscapes.layers as layers + + +@moduleinfo( + name="fcn_hrnetw48_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw48 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W48() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int] = (-1, ), + backbone_channels: Tuple[int] = (270, ), + channels: int = None): + super(FCNHead, self).__init__() + + self.num_classes = num_classes + self.backbone_indices = backbone_indices + if channels is None: + channels = backbone_channels[0] + + self.conv_1 = layers.ConvBNReLU( + in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1) + self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0) + + def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[0]] + x = self.conv_1(x) + logit = self.cls(x) + logit_list.append(logit) + return logit_list diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1c42e162681a358b0a72e4aa2ac053cd7303a7ae --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md @@ -0,0 +1,174 @@ +# PaddleHub 图像分割 + +## 模型预测 + + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='fcn_hrnetw48_voc') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用fcn_hrnetw48_voc模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='fcn_hrnetw48_voc', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
+
+我们使用该模型来进行预测。predict.py脚本如下:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_voc', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+参数配置正确后,请执行脚本`python predict.py`。
+
+**Args**
+* `images`:原始图像路径或BGR格式图片;
+* `visualization`: 是否可视化,默认为True;
+* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 服务部署
+
+PaddleHub Serving可以部署一个在线图像分割服务。
+
+### Step1: 启动PaddleHub Serving
+
+运行启动命令:
+
+```shell
+$ hub serving start -m fcn_hrnetw48_voc
+```
+
+这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+### Step2: 发送预测请求
+
+配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    # 将BGR图像编码为jpg,再转为base64字符串
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    # 将base64字符串解码回BGR图像
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# 发送HTTP请求
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw48_voc"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### 查看代码
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### 依赖
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..421d70392370a2b962627cc5bcf6f25d775dc454
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py
@@ -0,0 +1,528 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw48_voc.layers as layers
+
+
+class HRNet_W48(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+        stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (48, 96).
+        stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+        stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (48, 96, 192).
+        stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+        stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (48, 96, 192, 384).
+        has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+    """
+
+    def __init__(self,
+                 stage1_num_modules: int = 1,
+                 stage1_num_blocks: List[int] = [4],
+                 stage1_num_channels: List[int] = [64],
+                 stage2_num_modules: int = 1,
+                 stage2_num_blocks: List[int] = [4, 4],
+                 stage2_num_channels: List[int] = [48, 96],
+                 stage3_num_modules: int = 4,
+                 stage3_num_blocks: List[int] = [4, 4, 4],
+                 stage3_num_channels: List[int] = [48, 96, 192],
+                 stage4_num_modules: int = 3,
+                 stage4_num_blocks: List[int] = [4, 4, 4, 4],
+                 stage4_num_channels: List[int] = [48, 96, 192, 384],
+                 has_se=False,
+                 align_corners=False):
+        super(HRNet_W48, self).__init__()
+        self.stage1_num_modules = stage1_num_modules
+        self.stage1_num_blocks = stage1_num_blocks
+        self.stage1_num_channels = stage1_num_channels
+        self.stage2_num_modules = stage2_num_modules
+        self.stage2_num_blocks = stage2_num_blocks
+        self.stage2_num_channels = stage2_num_channels
+        self.stage3_num_modules = stage3_num_modules
+        self.stage3_num_blocks = stage3_num_blocks
+        self.stage3_num_channels = stage3_num_channels
+        self.stage4_num_modules = stage4_num_modules
+        self.stage4_num_blocks = stage4_num_blocks
+        self.stage4_num_channels = stage4_num_channels
+        self.has_se = has_se
+        self.align_corners = align_corners
+        self.feat_channels = [sum(stage4_num_channels)]
+
+        self.conv_layer1_1 = layers.ConvBNReLU(
+            in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.conv_layer1_2 = layers.ConvBNReLU(
+            in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.la1 = Layer1(
+            num_channels=64,
+            num_blocks=self.stage1_num_blocks[0],
+            num_filters=self.stage1_num_channels[0],
+            has_se=has_se,
+            name="layer2")
+
+        self.tr1 = TransitionLayer(
+            in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1")
+
+        self.st2 = Stage(
+            num_channels=self.stage2_num_channels,
+            num_modules=self.stage2_num_modules,
+            num_blocks=self.stage2_num_blocks,
+            num_filters=self.stage2_num_channels,
+            has_se=self.has_se,
+            name="st2",
+            align_corners=align_corners)
+
+        self.tr2 = TransitionLayer(
+            in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2")
+        self.st3 = Stage(
+            num_channels=self.stage3_num_channels,
+            num_modules=self.stage3_num_modules,
+            num_blocks=self.stage3_num_blocks,
+            num_filters=self.stage3_num_channels,
+            has_se=self.has_se,
+            name="st3",
+            align_corners=align_corners)
+
+        self.tr3 = TransitionLayer(
+            in_channels=self.stage3_num_channels,
out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + size = paddle.shape(st4[0])[2:] + x1 = F.interpolate(st4[1], size, mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], size, mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], size, mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name: str = None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) 
-> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = layers.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: float, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = 
self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + layers.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + 
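+    # NOTE: the sublayers were appended above in the same nested (i, j, k) order
+    # in which forward() consumes them, so the single running index
+    # residual_func_idx below can walk the flattened residual_func_list in sync
+    # with the loop structure.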
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = paddle.shape(residual)[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..aca5e911382235cb96d385091f1db261060bad7d --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py @@ -0,0 +1,298 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + 
in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+    Examples:
+        from paddleseg.models.common.activation import Activation
+        relu = Activation("relu")
+        print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+        sigmoid = Activation("sigmoid")
+        print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+        not_exit_one = Activation("not_exit_one")
+        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+        # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+        # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+    """
+
+    def __init__(self, act: str = None):
+        super(Activation, self).__init__()
+
+        self._act = act
+        upper_act_names = nn.layer.activation.__dict__.keys()
+        lower_act_names = [act.lower() for act in upper_act_names]
+        act_dict = dict(zip(lower_act_names, upper_act_names))
+
+        if act is not None:
+            if act in act_dict.keys():
+                act_name = act_dict[act]
+                # Look up the activation class by name (e.g. 'relu' -> ReLU) and instantiate it.
+                self.act_func = getattr(nn.layer.activation, act_name)()
+            else:
+                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        if self._act is not None:
+            return self.act_func(x)
+        else:
+            return x
+
+
+class ASPPModule(nn.Layer):
+    """
+    Atrous Spatial Pyramid Pooling.
+
+    Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+        image_pooling (bool, optional): If augmented with image-level features. Default: False
+    """
+
+    def __init__(self,
+                 aspp_ratios: Tuple[int],
+                 in_channels: int,
+                 out_channels: int,
+                 align_corners: bool,
+                 use_sep_conv: bool = False,
+                 image_pooling: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+        self.aspp_blocks = nn.LayerList()
+
+        for ratio in aspp_ratios:
+            if use_sep_conv and ratio > 1:
+                conv_func = SeparableConvBNReLU
+            else:
+                conv_func = ConvBNReLU
+
+            block = conv_func(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1 if ratio == 1 else 3,
+                dilation=ratio,
+                padding=0 if ratio == 1 else ratio)
+            self.aspp_blocks.append(block)
+
+        out_size = len(self.aspp_blocks)
+
+        if image_pooling:
+            self.global_avg_pool = nn.Sequential(
+                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+            out_size += 1
+        self.image_pooling = image_pooling
+
+        self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # drop rate
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        outputs = []
+        for block in self.aspp_blocks:
+            y = block(x)
+            y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
+            outputs.append(y)
+
+        if self.image_pooling:
+            img_avg = self.global_avg_pool(x)
+            img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
+            outputs.append(img_avg)
+
+        x = paddle.concat(outputs, axis=1)
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+
+        return x
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py
new file mode 100644
index
0000000000000000000000000000000000000000..b0a77b381cc224f0cb2f9f598d787a0a141c3d01 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw48_voc.hrnet import HRNet_W48 +import fcn_hrnetw48_voc.layers as layers + + +@moduleinfo( + name="fcn_hrnetw48_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw48 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W48() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int] = (-1, ), + backbone_channels: Tuple[int] = (270, ), + channels: int = None): + super(FCNHead, self).__init__() + + self.num_classes = num_classes + self.backbone_indices = backbone_indices + if channels is None: + channels = backbone_channels[0] + + self.conv_1 = layers.ConvBNReLU( + in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1) + self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0) + + def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[0]] + x = self.conv_1(x) + logit = self.cls(x) + logit_list.append(logit) + return logit_list diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/README.md b/modules/image/semantic_segmentation/hardnet_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..75a44dd551187029409ae788ad736fbb713f0e84 --- /dev/null +++ b/modules/image/semantic_segmentation/hardnet_cityscapes/README.md @@ -0,0 +1,173 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='hardnet_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用hardnet_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='hardnet_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
+
+我们使用该模型来进行预测。predict.py脚本如下:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='hardnet_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+参数配置正确后,请执行脚本`python predict.py`。
+
+**Args**
+* `images`:原始图像路径或BGR格式图片;
+* `visualization`: 是否可视化,默认为True;
+* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 服务部署
+
+PaddleHub Serving可以部署一个在线图像分割服务。
+
+### Step1: 启动PaddleHub Serving
+
+运行启动命令:
+
+```shell
+$ hub serving start -m hardnet_cityscapes
+```
+
+这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+### Step2: 发送预测请求
+
+配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    # 将BGR图像编码为jpg,再转为base64字符串
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    # 将base64字符串解码回BGR图像
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# 发送HTTP请求
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/hardnet_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### 查看代码
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### 依赖
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py b/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..cbcb7ad830fa82a87e1fbd86b1e59a63cc4ef579
--- /dev/null
+++ b/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py
@@ -0,0 +1,185 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
+        return nn.BatchNorm2D(*args, **kwargs)
+    else:
+        return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNReLU(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+
+        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        x = F.relu(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+    """Basic conv bn layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        return x
+
+
+class ConvReLUPool(nn.Layer):
+    """Basic conv relu pool layer."""
+
+    def __init__(self, in_channels: int, out_channels: int):
+        super().__init__()
+        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.conv(x)
+        x = F.relu(x)
+        # The legacy fluid-style F.pool2d API does not exist in paddle.nn.functional
+        # in Paddle 2.x; max_pool2d is the equivalent call.
+        x = F.max_pool2d(x, kernel_size=2, stride=2)
+        return x
+
+
+class SeparableConvBNReLU(nn.Layer):
+    """Basic separable conv bn relu layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+        self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.piontwise_conv(x)
+        return x
+
+
+class DepthwiseConvBN(nn.Layer):
+    """Basic depthwise conv bn layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=out_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        return x
+
+
+class AuxLayer(nn.Layer):
+    """
+    The auxiliary layer implementation for auxiliary loss.
+
+    Args:
+        in_channels (int): The number of input channels.
+        inter_channels (int): The intermediate channels.
+        out_channels (int): The number of output channels, and usually it is num_classes.
+        dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/module.py b/modules/image/semantic_segmentation/hardnet_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..3923bff5ae20dfd69433d46dcedfd6851d5f40ee --- /dev/null +++ b/modules/image/semantic_segmentation/hardnet_cityscapes/module.py @@ -0,0 +1,291 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import os
+from typing import Union, Tuple, List
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+import hardnet_cityscapes.layers as layers
+
+
+@moduleinfo(
+    name="hardnet_cityscapes",
+    type="CV/semantic_segmentation",
+    author="paddlepaddle",
+    author_email="",
+    summary="HarDNet is a segmentation model pretrained on Cityscapes.",
+    version="1.0.0",
+    meta=ImageSegmentationModule)
+class HarDNet(nn.Layer):
+    """
+    [Real Time] The FC-HardDNet 70 implementation based on PaddlePaddle.
+    The original article refers to
+    Chao, Ping, et al. "HarDNet: A Low Memory Traffic Network"
+    (https://arxiv.org/pdf/1909.00948.pdf)
+
+    Args:
+        num_classes (int): The unique number of target classes.
+        stem_channels (tuple|list, optional): The number of channels before the encoder. Default: (16, 24, 32, 48).
+        ch_list (tuple|list, optional): The number of channels at each block in the encoder. Default: (64, 96, 160, 224, 320).
+        grmul (float, optional): The channel multiplying factor in HarDBlock, which is m in the paper. Default: 1.7.
+        gr (tuple|list, optional): The growth rate in each HarDBlock, which is k in the paper. Default: (10, 16, 18, 24, 32).
+        n_layers (tuple|list, optional): The number of layers in each HarDBlock. Default: (4, 4, 8, 8, 8).
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+        pretrained (str, optional): The path or url of pretrained model. Default: None.
+    """
+
+    def __init__(self,
+                 num_classes: int = 19,
+                 stem_channels: Tuple[int] = (16, 24, 32, 48),
+                 ch_list: Tuple[int] = (64, 96, 160, 224, 320),
+                 grmul: float = 1.7,
+                 gr: Tuple[int] = (10, 16, 18, 24, 32),
+                 n_layers: Tuple[int] = (4, 4, 8, 8, 8),
+                 align_corners: bool = False,
+                 pretrained: str = None):
+
+        super(HarDNet, self).__init__()
+        self.align_corners = align_corners
+        self.pretrained = pretrained
+        encoder_blks_num = len(n_layers)
+        decoder_blks_num = encoder_blks_num - 1
+        encoder_in_channels = stem_channels[3]
+
+        self.stem = nn.Sequential(
+            layers.ConvBNReLU(3, stem_channels[0], kernel_size=3, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[0], stem_channels[1], kernel_size=3, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[1], stem_channels[2], kernel_size=3, stride=2, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[2], stem_channels[3], kernel_size=3, bias_attr=False))
+
+        self.encoder = Encoder(encoder_blks_num, encoder_in_channels, ch_list, gr, grmul, n_layers)
+
+        skip_connection_channels = self.encoder.get_skip_channels()
+        decoder_in_channels = self.encoder.get_out_channels()
+
+        self.decoder = Decoder(decoder_blks_num, decoder_in_channels, skip_connection_channels, gr, grmul, n_layers,
+                               align_corners)
+
+        self.cls_head = nn.Conv2D(in_channels=self.decoder.get_out_channels(), out_channels=num_classes, kernel_size=1)
+
+        self.transforms = T.Compose([T.Normalize()])
+
+        if pretrained is not None:
+            model_dict = paddle.load(pretrained)
+            self.set_dict(model_dict)
+            print("load custom parameters success")
+
+        else:
+            checkpoint = os.path.join(self.directory, 'model.pdparams')
+            model_dict = paddle.load(checkpoint)
+            self.set_dict(model_dict)
+            print("load pretrained parameters success")
+
+    def transform(self, img: Union[np.ndarray, str]) ->
Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + input_shape = paddle.shape(x)[2:] + x = self.stem(x) + x, skip_connections = self.encoder(x) + x = self.decoder(x, skip_connections) + logit = self.cls_head(x) + logit = F.interpolate(logit, size=input_shape, mode="bilinear", align_corners=self.align_corners) + return [logit] + + +class Encoder(nn.Layer): + """The Encoder implementation of FC-HardDNet 70. + + Args: + n_blocks (int): The number of blocks in the Encoder module. + in_channels (int): The number of input channels. + ch_list (tuple|list): The number of channels at each block in the encoder. + grmul (float): The channel multiplying factor in HarDBlock, which is m in the paper. + gr (tuple|list): The growth rate in each HarDBlock, which is k in the paper. + n_layers (tuple|list): The number of layers in each HarDBlock. + """ + + def __init__(self, n_blocks: int, in_channels: int, ch_list: List[int], gr: List[int], grmul: float, + n_layers: List[int]): + super().__init__() + self.skip_connection_channels = [] + self.shortcut_layers = [] + self.blks = nn.LayerList() + ch = in_channels + for i in range(n_blocks): + blk = HarDBlock(ch, gr[i], grmul, n_layers[i]) + ch = blk.get_out_ch() + self.skip_connection_channels.append(ch) + self.blks.append(blk) + if i < n_blocks - 1: + self.shortcut_layers.append(len(self.blks) - 1) + self.blks.append(layers.ConvBNReLU(ch, ch_list[i], kernel_size=1, bias_attr=False)) + + ch = ch_list[i] + if i < n_blocks - 1: + self.blks.append(nn.AvgPool2D(kernel_size=2, stride=2)) + self.out_channels = ch + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + skip_connections = [] + for i in range(len(self.blks)): + x = self.blks[i](x) + if i in self.shortcut_layers: + skip_connections.append(x) + return x, skip_connections + + def get_skip_channels(self): + return self.skip_connection_channels + + def get_out_channels(self): + return self.out_channels + + +class Decoder(nn.Layer): + """The Decoder implementation of FC-HardDNet 70. + + Args: + n_blocks (int): The number of blocks in the Encoder module. + in_channels (int): The number of input channels. + skip_connection_channels (tuple|list): The channels of shortcut layers in encoder. + grmul (float): The channel multiplying factor in HarDBlock, which is m in the paper. + gr (tuple|list): The growth rate in each HarDBlock, which is k in the paper. + n_layers (tuple|list): The number of layers in each HarDBlock. 
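+        align_corners (bool, optional): An argument of F.interpolate used when upsampling before concatenating with
+            each skip connection. Default: False.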
+ """ + + def __init__(self, + n_blocks: int, + in_channels: int, + skip_connection_channels: List[paddle.Tensor], + gr: List[int], + grmul: float, + n_layers: List[int], + align_corners: bool = False): + super().__init__() + prev_block_channels = in_channels + self.n_blocks = n_blocks + self.dense_blocks_up = nn.LayerList() + self.conv1x1_up = nn.LayerList() + + for i in range(n_blocks - 1, -1, -1): + cur_channels_count = prev_block_channels + skip_connection_channels[i] + conv1x1 = layers.ConvBNReLU(cur_channels_count, cur_channels_count // 2, kernel_size=1, bias_attr=False) + blk = HarDBlock(base_channels=cur_channels_count // 2, growth_rate=gr[i], grmul=grmul, n_layers=n_layers[i]) + + self.conv1x1_up.append(conv1x1) + self.dense_blocks_up.append(blk) + + prev_block_channels = blk.get_out_ch() + + self.out_channels = prev_block_channels + self.align_corners = align_corners + + def forward(self, x: paddle.Tensor, skip_connections: List[paddle.Tensor]) -> paddle.Tensor: + for i in range(self.n_blocks): + skip = skip_connections.pop() + x = F.interpolate(x, size=paddle.shape(skip)[2:], mode="bilinear", align_corners=self.align_corners) + x = paddle.concat([x, skip], axis=1) + x = self.conv1x1_up[i](x) + x = self.dense_blocks_up[i](x) + return x + + def get_out_channels(self): + return self.out_channels + + +class HarDBlock(nn.Layer): + """The HarDBlock implementation + + Args: + base_channels (int): The base channels. + growth_rate (tuple|list): The growth rate. + grmul (float): The channel multiplying factor. + n_layers (tuple|list): The number of layers. + keepBase (bool, optional): A bool value indicates whether concatenating the first layer. Default: False. + """ + + def __init__(self, + base_channels: int, + growth_rate: List[int], + grmul: float, + n_layers: List[int], + keepBase: bool = False): + super().__init__() + self.keepBase = keepBase + self.links = [] + layers_ = [] + self.out_channels = 0 + for i in range(n_layers): + outch, inch, link = get_link(i + 1, base_channels, growth_rate, grmul) + + self.links.append(link) + layers_.append(layers.ConvBNReLU(inch, outch, kernel_size=3, bias_attr=False)) + if (i % 2 == 0) or (i == n_layers - 1): + self.out_channels += outch + self.layers = nn.LayerList(layers_) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + layers_ = [x] + for layer in range(len(self.layers)): + link = self.links[layer] + tin = [] + for i in link: + tin.append(layers_[i]) + if len(tin) > 1: + x = paddle.concat(tin, axis=1) + else: + x = tin[0] + out = self.layers[layer](x) + layers_.append(out) + + t = len(layers_) + out_ = [] + for i in range(t): + if (i == 0 and self.keepBase) or \ + (i == t - 1) or (i % 2 == 1): + out_.append(layers_[i]) + out = paddle.concat(out_, 1) + + return out + + def get_out_ch(self): + return self.out_channels + + +def get_link(layer: int, base_ch: int, growth_rate: List[int], grmul: float) -> Tuple: + if layer == 0: + return base_ch, 0, [] + out_channels = growth_rate + link = [] + for i in range(10): + dv = 2**i + if layer % dv == 0: + k = layer - dv + link.insert(0, k) + if i > 0: + out_channels *= grmul + out_channels = int(int(out_channels + 1) / 2) * 2 + in_channels = 0 + for i in link: + ch, _, _ = get_link(i, base_ch, growth_rate, grmul) + in_channels += ch + return out_channels, in_channels, link diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e5a557d39ea40b17e67c2711db4a38fe212f5a50 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md @@ -0,0 +1,173 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='ocrnet_hrnetw18_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ocrnet_hrnetw18_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='ocrnet_hrnetw18_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + +当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。 + +我们使用该模型来进行预测。predict.py脚本如下: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='ocrnet_hrnetw18_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +参数配置正确后,请执行脚本`python predict.py`。 +**Args** +* `images`:原始图像路径或BGR格式图片; +* `visualization`: 是否可视化,默认为True; +* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + +**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 服务部署 + +PaddleHub Serving可以部署一个在线图像分割服务。 + +### Step1: 启动PaddleHub Serving + +运行启动命令: + +```shell +$ hub serving start -m ocrnet_hrnetw18_cityscapes +``` + +这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + +**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +### Step2: 发送预测请求 + +配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + 
+```python +import requests +import json +import cv2 +import base64 + +import numpy as np + + +def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + +# 发送HTTP请求 +org_im = cv2.imread('/PATH/TO/IMAGE') +data = {'images':[cv2_to_base64(org_im)]} +headers = {"Content-type": "application/json"} +url = "http://127.0.0.1:8866/predict/ocrnet_hrnetw18_cityscapes" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) +mask = base64_to_cv2(r.json()["results"][0]) +``` + +### 查看代码 + +https://github.com/PaddlePaddle/PaddleSeg + +### 依赖 + +paddlepaddle >= 2.0.0 + +paddlehub >= 2.0.0 diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..82f396340cf4db9269a6f140ccdd3d60364035e4 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py @@ -0,0 +1,531 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +import ocrnet_hrnetw18_cityscapes.layers as L + + +class HRNet_W18(nn.Layer): + """ + The HRNet implementation based on PaddlePaddle. + + The original article refers to + Jingdong Wang, et, al. "HRNet:Deep High-Resolution Representation Learning for Visual Recognition" + (https://arxiv.org/pdf/1908.07919.pdf). + + Args: + stage1_num_modules (int, optional): Number of modules for stage1. Default 1. + stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4). + stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64). + stage2_num_modules (int, optional): Number of modules for stage2. Default 1. + stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4). + stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. 
It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', 
align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, 
padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = 
self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py new file mode 100644 index 
0000000000000000000000000000000000000000..27c5a68a7c725aacca231279aea7ecdd216b20a1 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py @@ -0,0 +1,297 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = 
self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. 
+ out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..2ebbfbef041133ce3014877bb91035b6a3e40ff7 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py @@ -0,0 +1,224 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
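+
+# Rough data flow of the OCR head defined below. Channel sizes are illustrative
+# and assume the defaults (HRNet-W18 backbone whose four branch outputs are
+# concatenated to 18 + 36 + 72 + 144 = 270 channels, ocr_mid_channels=512):
+#   backbone(x)        -> feats: (n, 270, h/4, w/4)
+#   aux_head(feats)    -> soft object regions: (n, num_classes, h/4, w/4)
+#   conv3x3_ocr(feats) -> pixel features: (n, 512, h/4, w/4)
+#   spatial_gather     -> per-region features: (n, 512, num_classes, 1)
+#   spatial_ocr        -> object-contextual features: (n, 512, h/4, w/4)
+#   cls_head           -> logits, bilinearly upsampled to the input size in forward()
+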
+import os +from typing import List + +import paddle +import numpy as np +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import ocrnet_hrnetw18_cityscapes.layers as L +from ocrnet_hrnetw18_cityscapes.hrnet import HRNet_W18 + + +@moduleinfo( + name="ocrnet_hrnetw18_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="OCRNetHRNetW18 is a segmentation model pretrained by pascal voc.", + version="1.0.0", + meta=ImageSegmentationModule) +class OCRNetHRNetW18(nn.Layer): + """ + The OCRNet implementation based on PaddlePaddle. + The original article refers to + Yuan, Yuhui, et al. "Object-Contextual Representations for Semantic Segmentation" + (https://arxiv.org/pdf/1909.11065.pdf) + Args: + num_classes (int): The unique number of target classes. + backbone_indices (list): A list indicates the indices of output of backbone. + It can be either one or two values, if two values, the first index will be taken as + a deep-supervision feature in auxiliary layer; the second one will be taken as + input of pixel representation. If one value, it is taken by both above. + ocr_mid_channels (int, optional): The number of middle channels in OCRHead. Default: 512. + ocr_key_channels (int, optional): The number of key channels in ObjectAttentionBlock. Default: 256. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: List[int] = [0], + ocr_mid_channels: int = 512, + ocr_key_channels: int = 256, + align_corners: bool = False, + pretrained: str = None): + super(OCRNetHRNetW18, self).__init__() + self.backbone = HRNet_W18() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = OCRHead( + num_classes=num_classes, + in_channels=in_channels, + ocr_mid_channels=ocr_mid_channels, + ocr_key_channels=ocr_key_channels) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: np.ndarray) -> np.ndarray: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [ + F.interpolate(logit, x.shape[2:], mode='bilinear', align_corners=self.align_corners) for logit in logit_list + ] + return logit_list + + +class OCRHead(nn.Layer): + """ + The Object contextual representation head. + Args: + num_classes(int): The unique number of target classes. + in_channels(tuple): The number of input channels. + ocr_mid_channels(int, optional): The number of middle channels in OCRHead. Default: 512. + ocr_key_channels(int, optional): The number of key channels in ObjectAttentionBlock. Default: 256. 
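+
+    The forward pass returns [logit, soft_regions]: the OCR logits plus the
+    auxiliary region logits used for deep supervision, both at the resolution
+    of the backbone features.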
+ """ + + def __init__(self, num_classes: int, in_channels: int, ocr_mid_channels: int = 512, ocr_key_channels: int = 256): + super().__init__() + + self.num_classes = num_classes + self.spatial_gather = SpatialGatherBlock() + self.spatial_ocr = SpatialOCRModule(ocr_mid_channels, ocr_key_channels, ocr_mid_channels) + + self.indices = [-2, -1] if len(in_channels) > 1 else [-1, -1] + + self.conv3x3_ocr = L.ConvBNReLU(in_channels[self.indices[1]], ocr_mid_channels, 3, padding=1) + self.cls_head = nn.Conv2D(ocr_mid_channels, self.num_classes, 1) + self.aux_head = nn.Sequential( + L.ConvBNReLU(in_channels[self.indices[0]], in_channels[self.indices[0]], 1), + nn.Conv2D(in_channels[self.indices[0]], self.num_classes, 1)) + + def forward(self, feat_list: List[paddle.Tensor]) -> paddle.Tensor: + feat_shallow, feat_deep = feat_list[self.indices[0]], feat_list[self.indices[1]] + + soft_regions = self.aux_head(feat_shallow) + pixels = self.conv3x3_ocr(feat_deep) + + object_regions = self.spatial_gather(pixels, soft_regions) + ocr = self.spatial_ocr(pixels, object_regions) + + logit = self.cls_head(ocr) + return [logit, soft_regions] + + +class SpatialGatherBlock(nn.Layer): + """Aggregation layer to compute the pixel-region representation.""" + + def forward(self, pixels: paddle.Tensor, regions: paddle.Tensor) -> paddle.Tensor: + n, c, h, w = pixels.shape + _, k, _, _ = regions.shape + + # pixels: from (n, c, h, w) to (n, h*w, c) + pixels = paddle.reshape(pixels, (n, c, h * w)) + pixels = paddle.transpose(pixels, [0, 2, 1]) + + # regions: from (n, k, h, w) to (n, k, h*w) + regions = paddle.reshape(regions, (n, k, h * w)) + regions = F.softmax(regions, axis=2) + + # feats: from (n, k, c) to (n, c, k, 1) + feats = paddle.bmm(regions, pixels) + feats = paddle.transpose(feats, [0, 2, 1]) + feats = paddle.unsqueeze(feats, axis=-1) + + return feats + + +class SpatialOCRModule(nn.Layer): + """Aggregate the global object representation to update the representation for each pixel.""" + + def __init__(self, in_channels: int, key_channels: int, out_channels: int, dropout_rate: float = 0.1): + super().__init__() + + self.attention_block = ObjectAttentionBlock(in_channels, key_channels) + self.conv1x1 = nn.Sequential(L.ConvBNReLU(2 * in_channels, out_channels, 1), nn.Dropout2D(dropout_rate)) + + def forward(self, pixels: paddle.Tensor, regions: paddle.Tensor) -> paddle.Tensor: + context = self.attention_block(pixels, regions) + feats = paddle.concat([context, pixels], axis=1) + feats = self.conv1x1(feats) + + return feats + + +class ObjectAttentionBlock(nn.Layer): + """A self-attention module.""" + + def __init__(self, in_channels: int, key_channels: int): + super().__init__() + + self.in_channels = in_channels + self.key_channels = key_channels + + self.f_pixel = nn.Sequential( + L.ConvBNReLU(in_channels, key_channels, 1), L.ConvBNReLU(key_channels, key_channels, 1)) + + self.f_object = nn.Sequential( + L.ConvBNReLU(in_channels, key_channels, 1), L.ConvBNReLU(key_channels, key_channels, 1)) + + self.f_down = L.ConvBNReLU(in_channels, key_channels, 1) + + self.f_up = L.ConvBNReLU(key_channels, in_channels, 1) + + def forward(self, x: paddle.Tensor, proxy: paddle.Tensor) -> paddle.Tensor: + n, _, h, w = x.shape + + # query : from (n, c1, h1, w1) to (n, h1*w1, key_channels) + query = self.f_pixel(x) + query = paddle.reshape(query, (n, self.key_channels, -1)) + query = paddle.transpose(query, [0, 2, 1]) + + # key : from (n, c2, h2, w2) to (n, key_channels, h2*w2) + key = self.f_object(proxy) + key = 
paddle.reshape(key, (n, self.key_channels, -1)) + + # value : from (n, c2, h2, w2) to (n, h2*w2, key_channels) + value = self.f_down(proxy) + value = paddle.reshape(value, (n, self.key_channels, -1)) + value = paddle.transpose(value, [0, 2, 1]) + + # sim_map (n, h1*w1, h2*w2) + sim_map = paddle.bmm(query, key) + sim_map = (self.key_channels**-.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + # context from (n, h1*w1, key_channels) to (n , out_channels, h1, w1) + context = paddle.bmm(sim_map, value) + context = paddle.transpose(context, [0, 2, 1]) + context = paddle.reshape(context, (n, self.key_channels, h, w)) + context = self.f_up(context) + + return context diff --git a/modules/image/semantic_segmentation/unet_cityscapes/README.md b/modules/image/semantic_segmentation/unet_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8510ac7fc2d313f4613a5ee70ccf80ba26b4ae30 --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/README.md @@ -0,0 +1,174 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='unet_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用unet_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='unet_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0rc提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,详细参见[策略](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/optimizer/optimizer/Optimizer_cn.html)。 + +其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。 + +我们使用该模型来进行预测。predict.py脚本如下: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='unet_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +参数配置正确后,请执行脚本`python predict.py`。 +**Args** +* `images`:原始图像路径或BGR格式图片; +* `visualization`: 是否可视化,默认为True; +* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + +**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 服务部署 + +PaddleHub Serving可以部署一个在线图像分割服务。 + +### Step1: 启动PaddleHub Serving + +运行启动命令: + +```shell +$ hub serving start -m unet_cityscapes +``` + +这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + +**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +### Step2: 发送预测请求 + +配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + +```python +import requests +import json +import cv2 +import base64 + +import numpy as np + + +def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + +# 发送HTTP请求 +org_im = cv2.imread('/PATH/TO/IMAGE') +data = {'images':[cv2_to_base64(org_im)]} +headers = {"Content-type": "application/json"} +url = "http://127.0.0.1:8866/predict/unet_cityscapes" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) +mask = base64_to_cv2(r.json()["results"][0]) +``` + +### 查看代码 + +https://github.com/PaddlePaddle/PaddleSeg + +### 依赖 + +paddlepaddle >= 2.0.0 + +paddlehub >= 2.0.0 diff --git a/modules/image/semantic_segmentation/unet_cityscapes/layers.py b/modules/image/semantic_segmentation/unet_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..e4f909588a88236e9f4f2d2aed9c9c4ea06fead3 --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/layers.py @@ -0,0 +1,185 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
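+
+# Note: the conv wrappers below default to padding='same', so e.g.
+# ConvBNReLU(64, 128, 3) preserves spatial size; the UNet module downsamples
+# explicitly with MaxPool2D rather than with strided convolutions.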
+ +import os + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'): + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + + self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvReLUPool(nn.Layer): + """Basic conv bn pool layer.""" + + def __init__(self, in_channels: int, out_channels: int): + super().__init__() + self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv(x) + x = F.relu(x) + x = F.pool2d(x, pool_size=2, pool_type="max", pool_stride=2) + return x + + +class SeparableConvBNReLU(nn.Layer): + """Basic separable Convolution layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class DepthwiseConvBN(nn.Layer): + """Depthwise Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. + out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. 
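+
+    For example, AuxLayer(in_channels=1024, inter_channels=256, out_channels=num_classes)
+    maps backbone features to per-class logits through a 3x3 ConvBNReLU,
+    dropout, and a final 1x1 conv (the channel numbers here are illustrative).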
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/unet_cityscapes/module.py b/modules/image/semantic_segmentation/unet_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..f2bcc19f5c7662858ecac7b9c2d89dbbc2f8628b --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/module.py @@ -0,0 +1,151 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import unet_cityscapes.layers as layers + + +@moduleinfo( + name="unet_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Unet is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class UNet(nn.Layer): + """ + The UNet implementation based on PaddlePaddle. + + The original article refers to + Olaf Ronneberger, et, al. "U-Net: Convolutional Networks for Biomedical Image Segmentation" + (https://arxiv.org/abs/1505.04597). + + Args: + num_classes (int): The unique number of target classes. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + use_deconv (bool, optional): A bool value indicates whether using deconvolution in upsampling. + If False, use resize_bilinear. Default: False. + pretrained (str, optional): The path or url of pretrained model for fine tuning. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + align_corners: bool = False, + use_deconv: bool = False, + pretrained: str = None): + super(UNet, self).__init__() + + self.encode = Encoder() + self.decode = Decoder(align_corners, use_deconv=use_deconv) + self.cls = self.conv = nn.Conv2D(in_channels=64, out_channels=num_classes, kernel_size=3, stride=1, padding=1) + + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + logit_list = [] + x, short_cuts = self.encode(x) + x = self.decode(x, short_cuts) + logit = self.cls(x) + logit_list.append(logit) + return logit_list + + +class Encoder(nn.Layer): + def __init__(self): + super().__init__() + + self.double_conv = nn.Sequential(layers.ConvBNReLU(3, 64, 3), layers.ConvBNReLU(64, 64, 3)) + down_channels = [[64, 128], [128, 256], [256, 512], [512, 512]] + self.down_sample_list = nn.LayerList([self.down_sampling(channel[0], channel[1]) for channel in down_channels]) + + def down_sampling(self, in_channels: int, out_channels: int) -> nn.Layer: + modules = [] + modules.append(nn.MaxPool2D(kernel_size=2, stride=2)) + modules.append(layers.ConvBNReLU(in_channels, out_channels, 3)) + modules.append(layers.ConvBNReLU(out_channels, out_channels, 3)) + return nn.Sequential(*modules) + + def forward(self, x: paddle.Tensor) -> Tuple: + short_cuts = [] + x = self.double_conv(x) + for down_sample in self.down_sample_list: + short_cuts.append(x) + x = down_sample(x) + return x, short_cuts + + +class Decoder(nn.Layer): + def __init__(self, align_corners: bool, use_deconv: bool = False): + super().__init__() + + up_channels = [[512, 256], [256, 128], [128, 64], [64, 64]] + self.up_sample_list = nn.LayerList( + [UpSampling(channel[0], channel[1], align_corners, use_deconv) for 
channel in up_channels])
+
+    def forward(self, x: paddle.Tensor, short_cuts: List[paddle.Tensor]) -> paddle.Tensor:
+        # Consume the encoder skips in reverse order (deepest first).
+        for i in range(len(short_cuts)):
+            x = self.up_sample_list[i](x, short_cuts[-(i + 1)])
+        return x
+
+
+class UpSampling(nn.Layer):
+    def __init__(self, in_channels: int, out_channels: int, align_corners: bool, use_deconv: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+
+        self.use_deconv = use_deconv
+        if self.use_deconv:
+            self.deconv = nn.Conv2DTranspose(in_channels, out_channels // 2, kernel_size=2, stride=2, padding=0)
+            in_channels = in_channels + out_channels // 2
+        else:
+            # Bilinear upsampling keeps the channel count, so after concatenating
+            # the skip tensor (which has matching width in this configuration)
+            # the double conv sees twice the input channels.
+            in_channels *= 2
+
+        self.double_conv = nn.Sequential(
+            layers.ConvBNReLU(in_channels, out_channels, 3), layers.ConvBNReLU(out_channels, out_channels, 3))
+
+    def forward(self, x: paddle.Tensor, short_cut: paddle.Tensor) -> paddle.Tensor:
+        if self.use_deconv:
+            x = self.deconv(x)
+        else:
+            x = F.interpolate(x, paddle.shape(short_cut)[2:], mode='bilinear', align_corners=self.align_corners)
+        x = paddle.concat([x, short_cut], axis=1)
+        x = self.double_conv(x)
+        return x
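+
+
+if __name__ == '__main__':
+    # A minimal smoke-test sketch: it assumes the module's pretrained weights
+    # are resolvable (e.g. after `hub install unet_cityscapes`), since the
+    # constructor loads model.pdparams from the module directory.
+    model = UNet(num_classes=19)
+    x = paddle.rand([1, 3, 512, 512])
+    logits = model(x)
+    print(logits[0].shape)  # expected: [1, 19, 512, 512]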