Commit 1e69d066 authored by haoyuying

add 10 segmentation models

Parent a25574b6
# ann_resnet50_cityscapes
|Module Name|ann_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ann_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|228MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ann_resnet50_cityscapes
```
- In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ann_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows (a complete assembly is shown after Step4):
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing as needed.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: dataset mode; the options are `train`, `test`, and `val`. Default is `train`.
- Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: name of the pre-trained model.
- `pretrained`: path to self-trained parameters; if None, the model's provided default parameters are loaded.
- Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
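- Putting Steps 1-4 together, a complete `train.py` might look like the following sketch, assembled from the snippets above:
- ```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
from paddlehub.datasets import OpticDiscSeg

if __name__ == '__main__':
    # Step1: define the data preprocessing pipeline
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    # Step2: download and load the dataset
    train_reader = OpticDiscSeg(transform, mode='train')
    # Step3: load the pre-trained model (OpticDiscSeg has 2 classes)
    model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
    # Step4: optimization strategy and runtime configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```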
- Model prediction
- When fine-tuning is complete, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: paths to the original images, or image data in BGR format;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: path for saving the results, default is 'seg_result'.
**NOTE:** When making predictions, the selected module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ann_resnet50_cityscapes
```
- This deploys an image segmentation service API, with the default port number 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ann_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
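- The decoded `mask` is an OpenCV image (a BGR `numpy.ndarray`); as a minimal follow-up (an illustrative addition, not part of the original script), it can be saved with:
- ```python
cv2.imwrite('seg_mask.png', mask)  # write the returned segmentation mask to disk
```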
## V. Release Note
* 1.0.0
  First release
# ann_resnet50_cityscapes
|Module Name|ann_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ann_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|228MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ann_resnet50_cityscapes
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the ann_resnet50_cityscapes model to fine-tune on datasets such as OpticDiscSeg by executing `python train.py` (a complete assembly is shown after Step4).
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: path to self-trained parameters; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
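- Putting Steps 1-4 together, a complete `train.py` might look like the following sketch, assembled from the snippets above:
- ```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
from paddlehub.datasets import OpticDiscSeg

if __name__ == '__main__':
    # Step1: define the data preprocessing pipeline
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    # Step2: download and load the dataset
    train_reader = OpticDiscSeg(transform, mode='train')
    # Step3: load the pre-trained model (OpticDiscSeg has 2 classes)
    model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
    # Step4: optimization strategy and runtime configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```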
- Model prediction
- When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: image paths, or ndarray data in [H, W, C] format, BGR.
* `visualization`: whether to visualize the results and save them as image files; default is True.
* `save_path`: save path of the results, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ann_resnet50_cityscapes
```
- This deploys the image segmentation service API, with the default port number 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ann_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
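- The decoded `mask` is an OpenCV image (a BGR `numpy.ndarray`); as a minimal follow-up (an illustrative addition, not part of the original script), it can be saved with:
- ```python
cv2.imwrite('seg_mask.png', mask)  # write the returned segmentation mask to disk
```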
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.pointwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
return paddle.add(x, y, name)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from ann_resnet50_cityscapes.resnet import ResNet50_vd
import ann_resnet50_cityscapes.layers as layers
@moduleinfo(
name="ann_resnet50_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="ANNResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class ANN(nn.Layer):
"""
The ANN implementation based on PaddlePaddle.
The original article refers to
Zhu, Zhen, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation"
(https://arxiv.org/pdf/1908.07678.pdf).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules.
Default: 256.
inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (2, 3),
key_value_channels: int = 256,
inter_channels: int = 512,
psp_size: Tuple[int] = (1, 3, 6, 8),
align_corners: bool = False,
pretrained: str = None):
super(ANN, self).__init__()
self.backbone = ResNet50_vd()
backbone_channels = [
self.backbone.feat_channels[i] for i in backbone_indices
]
self.head = ANNHead(num_classes, backbone_indices, backbone_channels,
key_value_channels, inter_channels, psp_size)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class ANNHead(nn.Layer):
"""
The ANNHead implementation.
It mainly consists of AFNB and APNB modules.
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
The first index will be taken as low-level features; the second one will be
taken as high-level features in AFNB module. Usually backbone consists of four
downsampling stage, such as ResNet, and return an output of each stage. If it is (2, 3),
it means taking feature map of the third stage and the fourth stage in backbone.
backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules.
inter_channels (int): Both input and output channels of APNB modules.
psp_size (tuple): The out size of pooled feature maps.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: False
"""
def __init__(self,
num_classes: int,
backbone_indices: Tuple[int],
backbone_channels: Tuple[int],
key_value_channels: int,
inter_channels: int,
psp_size: Tuple[int],
enable_auxiliary_loss: bool = False):
super().__init__()
low_in_channels = backbone_channels[0]
high_in_channels = backbone_channels[1]
self.fusion = AFNB(
low_in_channels=low_in_channels,
high_in_channels=high_in_channels,
out_channels=high_in_channels,
key_channels=key_value_channels,
value_channels=key_value_channels,
dropout_prob=0.05,
repeat_sizes=([1]),
psp_size=psp_size)
self.context = nn.Sequential(
layers.ConvBNReLU(
in_channels=high_in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1),
APNB(
in_channels=inter_channels,
out_channels=inter_channels,
key_channels=key_value_channels,
value_channels=key_value_channels,
dropout_prob=0.05,
repeat_sizes=([1]),
psp_size=psp_size))
self.cls = nn.Conv2D(
in_channels=inter_channels, out_channels=num_classes, kernel_size=1)
self.auxlayer = layers.AuxLayer(
in_channels=low_in_channels,
inter_channels=low_in_channels // 2,
out_channels=num_classes,
dropout_prob=0.05)
self.backbone_indices = backbone_indices
self.enable_auxiliary_loss = enable_auxiliary_loss
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
low_level_x = feat_list[self.backbone_indices[0]]
high_level_x = feat_list[self.backbone_indices[1]]
x = self.fusion(low_level_x, high_level_x)
x = self.context(x)
logit = self.cls(x)
logit_list.append(logit)
if self.enable_auxiliary_loss:
auxiliary_logit = self.auxlayer(low_level_x)
logit_list.append(auxiliary_logit)
return logit_list
class AFNB(nn.Layer):
"""
Asymmetric Fusion Non-local Block.
Args:
low_in_channels (int): Low-level-feature channels.
high_in_channels (int): High-level-feature channels.
out_channels (int): Out channels of AFNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
dropout_prob (float): The dropout rate of output.
repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
psp_size (tuple. optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
low_in_channels: int,
high_in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
dropout_prob: float,
repeat_sizes: Tuple[int] = ([1]),
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.psp_size = psp_size
self.stages = nn.LayerList([
SelfAttentionBlock_AFNB(low_in_channels, high_in_channels,
key_channels, value_channels, out_channels,
size) for size in repeat_sizes
])
self.conv_bn = layers.ConvBN(
in_channels=out_channels + high_in_channels,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=dropout_prob)
def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
priors = [stage(low_feats, high_feats) for stage in self.stages]
context = priors[0]
for i in range(1, len(priors)):
context += priors[i]
output = self.conv_bn(paddle.concat([context, high_feats], axis=1))
output = self.dropout(output)
return output
class APNB(nn.Layer):
"""
Asymmetric Pyramid Non-local Block.
Args:
in_channels (int): The input channels of APNB module.
out_channels (int): Out channels of APNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
dropout_prob (float): The dropout rate of output.
repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
dropout_prob: float,
repeat_sizes: Tuple[int] = ([1]),
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.psp_size = psp_size
self.stages = nn.LayerList([
SelfAttentionBlock_APNB(in_channels, out_channels, key_channels,
value_channels, size)
for size in repeat_sizes
])
self.conv_bn = layers.ConvBNReLU(
in_channels=in_channels * 2,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=dropout_prob)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
priors = [stage(x) for stage in self.stages]
context = priors[0]
for i in range(1, len(priors)):
context += priors[i]
output = self.conv_bn(paddle.concat([context, x], axis=1))
output = self.dropout(output)
return output
def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor:
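    # Pyramid pooling: adaptively average-pool the feature map to each size in
    # psp_size, flatten the spatial dimensions of each pooled map, and concatenate
    # them along the last axis -> shape (N, C, sum(s * s for s in psp_size)).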
n, c, h, w = x.shape
priors = []
for size in psp_size:
feat = F.adaptive_avg_pool2d(x, size)
feat = paddle.reshape(feat, shape=(0, c, -1))
priors.append(feat)
center = paddle.concat(priors, axis=-1)
return center
class SelfAttentionBlock_AFNB(nn.Layer):
"""
Self-Attention Block for AFNB module.
Args:
low_in_channels (int): Low-level-feature channels.
high_in_channels (int): High-level-feature channels.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
out_channels (int, optional): Out channels of AFNB module. Default: None.
scale (int, optional): Pooling size. Default: 1.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
low_in_channels: int,
high_in_channels: int,
key_channels: int,
value_channels: int,
out_channels: int = None,
scale: int = 1,
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.scale = scale
self.in_channels = low_in_channels
self.out_channels = out_channels
self.key_channels = key_channels
self.value_channels = value_channels
if out_channels is None:
self.out_channels = high_in_channels
self.pool = nn.MaxPool2D(scale)
self.f_key = layers.ConvBNReLU(
in_channels=low_in_channels,
out_channels=key_channels,
kernel_size=1)
self.f_query = layers.ConvBNReLU(
in_channels=high_in_channels,
out_channels=key_channels,
kernel_size=1)
self.f_value = nn.Conv2D(
in_channels=low_in_channels,
out_channels=value_channels,
kernel_size=1)
self.W = nn.Conv2D(
in_channels=value_channels,
out_channels=self.out_channels,
kernel_size=1)
self.psp_size = psp_size
def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
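        # Asymmetric attention: queries come from the high-level feature map, while
        # keys and values come from low-level features compressed by pyramid pooling
        # (_pp_module), which shrinks the attention matrix and the computation cost.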
batch_size, _, h, w = high_feats.shape
value = self.f_value(low_feats)
value = _pp_module(value, self.psp_size)
value = paddle.transpose(value, (0, 2, 1))
query = self.f_query(high_feats)
query = paddle.reshape(query, shape=(0, self.key_channels, -1))
query = paddle.transpose(query, perm=(0, 2, 1))
key = self.f_key(low_feats)
key = _pp_module(key, self.psp_size)
sim_map = paddle.matmul(query, key)
sim_map = (self.key_channels**-.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, perm=(0, 2, 1))
hf_shape = paddle.shape(high_feats)
context = paddle.reshape(
context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]])
context = self.W(context)
return context
class SelfAttentionBlock_APNB(nn.Layer):
"""
Self-Attention Block for APNB module.
Args:
in_channels (int): The input channels of APNB module.
out_channels (int): The out channels of APNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
scale (int, optional): Pooling size. Default: 1.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
scale: int = 1,
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.scale = scale
self.in_channels = in_channels
self.out_channels = out_channels
self.key_channels = key_channels
self.value_channels = value_channels
self.pool = nn.MaxPool2D(scale)
self.f_key = layers.ConvBNReLU(
in_channels=self.in_channels,
out_channels=self.key_channels,
kernel_size=1)
self.f_query = self.f_key
self.f_value = nn.Conv2D(
in_channels=self.in_channels,
out_channels=self.value_channels,
kernel_size=1)
self.W = nn.Conv2D(
in_channels=self.value_channels,
out_channels=self.out_channels,
kernel_size=1)
self.psp_size = psp_size
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
batch_size, _, h, w = x.shape
if self.scale > 1:
x = self.pool(x)
value = self.f_value(x)
value = _pp_module(value, self.psp_size)
value = paddle.transpose(value, perm=(0, 2, 1))
query = self.f_query(x)
query = paddle.reshape(query, shape=(0, self.key_channels, -1))
query = paddle.transpose(query, perm=(0, 2, 1))
key = self.f_key(x)
key = _pp_module(key, self.psp_size)
sim_map = paddle.matmul(query, key)
sim_map = (self.key_channels**-.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, perm=(0, 2, 1))
x_shape = paddle.shape(x)
context = paddle.reshape(
context, shape=[0, self.value_channels, x_shape[2], x_shape[3]])
context = self.W(context)
return context
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ann_resnet50_cityscapes.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
raise RuntimeError("When the dilation isn't 1, the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
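        # ResNet-D ('vd') trick: apply a 2x2 average pool before the convolution when
        # downsampling, instead of losing information to a strided 1x1 convolution.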
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
The original article refers to
Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually 'block' here is the stage, and 'i' is the block within the stage
# At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
# ann_resnet50_voc
|Module Name|ann_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ann_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|228MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ann_resnet50_voc
```
- In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ann_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows (a complete assembly is shown after Step4):
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing as needed.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: dataset mode; the options are `train`, `test`, and `val`. Default is `train`.
- Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
```
- `name`: name of the pre-trained model.
- `pretrained`: path to self-trained parameters; if None, the model's provided default parameters are loaded.
- Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
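- Putting Steps 1-4 together, a complete `train.py` might look like the following sketch, assembled from the snippets above:
- ```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
from paddlehub.datasets import OpticDiscSeg

if __name__ == '__main__':
    # Step1: define the data preprocessing pipeline
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    # Step2: download and load the dataset
    train_reader = OpticDiscSeg(transform, mode='train')
    # Step3: load the pre-trained model (OpticDiscSeg has 2 classes)
    model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
    # Step4: optimization strategy and runtime configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```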
- Model prediction
- When fine-tuning is complete, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: paths to the original images, or image data in BGR format;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: path for saving the results, default is 'seg_result'.
**NOTE:** When making predictions, the selected module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ann_resnet50_voc
```
- This deploys an image segmentation service API, with the default port number 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ann_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
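- The decoded `mask` is an OpenCV image (a BGR `numpy.ndarray`); as a minimal follow-up (an illustrative addition, not part of the original script), it can be saved with:
- ```python
cv2.imwrite('seg_mask.png', mask)  # write the returned segmentation mask to disk
```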
## V. Release Note
* 1.0.0
  First release
# ann_resnet50_voc
|Module Name|ann_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ann_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|228MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ann_resnet50_voc
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the ann_resnet50_voc model to fine-tune on datasets such as OpticDiscSeg by executing `python train.py` (a complete assembly is shown after Step4).
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: path to self-trained parameters; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
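- Putting Steps 1-4 together, a complete `train.py` might look like the following sketch, assembled from the snippets above:
- ```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
from paddlehub.datasets import OpticDiscSeg

if __name__ == '__main__':
    # Step1: define the data preprocessing pipeline
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    # Step2: download and load the dataset
    train_reader = OpticDiscSeg(transform, mode='train')
    # Step3: load the pre-trained model (OpticDiscSeg has 2 classes)
    model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
    # Step4: optimization strategy and runtime configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```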
- Model prediction
- When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: image paths, or ndarray data in [H, W, C] format, BGR.
* `visualization`: whether to visualize the results and save them as image files; default is True.
* `save_path`: save path of the results, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ann_resnet50_voc
```
- This deploys the image segmentation service API, with the default port number 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ann_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
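- The decoded `mask` is an OpenCV image (a BGR `numpy.ndarray`); as a minimal follow-up (an illustrative addition, not part of the original script), it can be saved with:
- ```python
cv2.imwrite('seg_mask.png', mask)  # write the returned segmentation mask to disk
```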
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.pointwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
return paddle.add(x, y, name)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from ann_resnet50_voc.resnet import ResNet50_vd
import ann_resnet50_voc.layers as layers
@moduleinfo(
name="ann_resnet50_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="ANNResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class ANN(nn.Layer):
"""
The ANN implementation based on PaddlePaddle.
The original article refers to
Zhu, Zhen, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation"
(https://arxiv.org/pdf/1908.07678.pdf).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules.
Default: 256.
inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int] = (2, 3),
key_value_channels: int = 256,
inter_channels: int = 512,
psp_size: Tuple[int] = (1, 3, 6, 8),
align_corners: bool = False,
pretrained: str = None):
super(ANN, self).__init__()
self.backbone = ResNet50_vd()
backbone_channels = [
self.backbone.feat_channels[i] for i in backbone_indices
]
self.head = ANNHead(num_classes, backbone_indices, backbone_channels,
key_value_channels, inter_channels, psp_size)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class ANNHead(nn.Layer):
"""
The ANNHead implementation.
It mainly consists of AFNB and APNB modules.
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
The first index will be taken as low-level features and the second one will be
taken as high-level features in the AFNB module. A backbone such as ResNet usually
consists of four downsampling stages and returns an output for each stage. If the tuple is (2, 3),
the feature maps of the third and the fourth stage of the backbone are taken.
backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules.
inter_channels (int): Both input and output channels of APNB modules.
psp_size (tuple): The out size of pooled feature maps.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: False
"""
def __init__(self,
num_classes: int,
backbone_indices: Tuple[int],
backbone_channels: Tuple[int],
key_value_channels: int,
inter_channels: int,
psp_size: Tuple[int],
enable_auxiliary_loss: bool = False):
super().__init__()
low_in_channels = backbone_channels[0]
high_in_channels = backbone_channels[1]
self.fusion = AFNB(
low_in_channels=low_in_channels,
high_in_channels=high_in_channels,
out_channels=high_in_channels,
key_channels=key_value_channels,
value_channels=key_value_channels,
dropout_prob=0.05,
repeat_sizes=([1]),
psp_size=psp_size)
self.context = nn.Sequential(
layers.ConvBNReLU(
in_channels=high_in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1),
APNB(
in_channels=inter_channels,
out_channels=inter_channels,
key_channels=key_value_channels,
value_channels=key_value_channels,
dropout_prob=0.05,
repeat_sizes=([1]),
psp_size=psp_size))
self.cls = nn.Conv2D(
in_channels=inter_channels, out_channels=num_classes, kernel_size=1)
self.auxlayer = layers.AuxLayer(
in_channels=low_in_channels,
inter_channels=low_in_channels // 2,
out_channels=num_classes,
dropout_prob=0.05)
self.backbone_indices = backbone_indices
self.enable_auxiliary_loss = enable_auxiliary_loss
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
low_level_x = feat_list[self.backbone_indices[0]]
high_level_x = feat_list[self.backbone_indices[1]]
x = self.fusion(low_level_x, high_level_x)
x = self.context(x)
logit = self.cls(x)
logit_list.append(logit)
if self.enable_auxiliary_loss:
auxiliary_logit = self.auxlayer(low_level_x)
logit_list.append(auxiliary_logit)
return logit_list
class AFNB(nn.Layer):
"""
Asymmetric Fusion Non-local Block.
Args:
low_in_channels (int): Low-level-feature channels.
high_in_channels (int): High-level-feature channels.
out_channels (int): Out channels of AFNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
dropout_prob (float): The dropout rate of output.
repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
low_in_channels: int,
high_in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
dropout_prob: float,
repeat_sizes: Tuple[int] = ([1]),
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.psp_size = psp_size
self.stages = nn.LayerList([
SelfAttentionBlock_AFNB(low_in_channels, high_in_channels,
key_channels, value_channels, out_channels,
size) for size in repeat_sizes
])
self.conv_bn = layers.ConvBN(
in_channels=out_channels + high_in_channels,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=dropout_prob)
def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
priors = [stage(low_feats, high_feats) for stage in self.stages]
context = priors[0]
for i in range(1, len(priors)):
context += priors[i]
output = self.conv_bn(paddle.concat([context, high_feats], axis=1))
output = self.dropout(output)
return output
class APNB(nn.Layer):
"""
Asymmetric Pyramid Non-local Block.
Args:
in_channels (int): The input channels of APNB module.
out_channels (int): Out channels of APNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
dropout_prob (float): The dropout rate of output.
repeat_sizes (tuple, optional): The number of APNB modules. Default: ([1]).
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
dropout_prob: float,
repeat_sizes: Tuple[int] = ([1]),
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.psp_size = psp_size
self.stages = nn.LayerList([
SelfAttentionBlock_APNB(in_channels, out_channels, key_channels,
value_channels, size)
for size in repeat_sizes
])
self.conv_bn = layers.ConvBNReLU(
in_channels=in_channels * 2,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=dropout_prob)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
priors = [stage(x) for stage in self.stages]
context = priors[0]
for i in range(1, len(priors)):
context += priors[i]
output = self.conv_bn(paddle.concat([context, x], axis=1))
output = self.dropout(output)
return output
def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor:
n, c, h, w = x.shape
priors = []
for size in psp_size:
feat = F.adaptive_avg_pool2d(x, size)
feat = paddle.reshape(feat, shape=(0, c, -1))
priors.append(feat)
center = paddle.concat(priors, axis=-1)
return center
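# --- Illustrative sketch added by the editor; not part of the original file ---
# _pp_module adaptively pools the input to each size in psp_size and flattens the
# results, so the spatial axis collapses to a fixed anchor count:
# 1*1 + 3*3 + 6*6 + 8*8 = 110, independent of the input H and W.
def _demo_pp_module():
    import paddle
    x = paddle.rand([2, 512, 97, 97])  # arbitrary spatial size
    print(_pp_module(x, psp_size=[1, 3, 6, 8]).shape)  # [2, 512, 110]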
class SelfAttentionBlock_AFNB(nn.Layer):
"""
Self-Attention Block for AFNB module.
Args:
low_in_channels (int): Low-level-feature channels.
high_in_channels (int): High-level-feature channels.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
out_channels (int, optional): Out channels of AFNB module. Default: None.
scale (int, optional): Pooling size. Default: 1.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
low_in_channels: int,
high_in_channels: int,
key_channels: int,
value_channels: int,
out_channels: int = None,
scale: int = 1,
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.scale = scale
self.in_channels = low_in_channels
self.out_channels = out_channels
self.key_channels = key_channels
self.value_channels = value_channels
if out_channels is None:
self.out_channels = high_in_channels
self.pool = nn.MaxPool2D(scale)
self.f_key = layers.ConvBNReLU(
in_channels=low_in_channels,
out_channels=key_channels,
kernel_size=1)
self.f_query = layers.ConvBNReLU(
in_channels=high_in_channels,
out_channels=key_channels,
kernel_size=1)
self.f_value = nn.Conv2D(
in_channels=low_in_channels,
out_channels=value_channels,
kernel_size=1)
self.W = nn.Conv2D(
in_channels=value_channels,
out_channels=self.out_channels,  # resolves to high_in_channels when out_channels is None
kernel_size=1)
self.psp_size = psp_size
def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
batch_size, _, h, w = high_feats.shape
value = self.f_value(low_feats)
value = _pp_module(value, self.psp_size)
value = paddle.transpose(value, (0, 2, 1))
query = self.f_query(high_feats)
query = paddle.reshape(query, shape=(0, self.key_channels, -1))
query = paddle.transpose(query, perm=(0, 2, 1))
key = self.f_key(low_feats)
key = _pp_module(key, self.psp_size)
sim_map = paddle.matmul(query, key)
sim_map = (self.key_channels**-.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, perm=(0, 2, 1))
hf_shape = paddle.shape(high_feats)
context = paddle.reshape(
context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]])
context = self.W(context)
return context
class SelfAttentionBlock_APNB(nn.Layer):
"""
Self-Attention Block for APNB module.
Args:
in_channels (int): The input channels of APNB module.
out_channels (int): The out channels of APNB module.
key_channels (int): The key channels in self-attention block.
value_channels (int): The value channels in self-attention block.
scale (int, optional): Pooling size. Default: 1.
psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
"""
def __init__(self,
in_channels: int,
out_channels: int,
key_channels: int,
value_channels: int,
scale: int = 1,
psp_size: Tuple[int] = (1, 3, 6, 8)):
super().__init__()
self.scale = scale
self.in_channels = in_channels
self.out_channels = out_channels
self.key_channels = key_channels
self.value_channels = value_channels
self.pool = nn.MaxPool2D(scale)
self.f_key = layers.ConvBNReLU(
in_channels=self.in_channels,
out_channels=self.key_channels,
kernel_size=1)
self.f_query = self.f_key
self.f_value = nn.Conv2D(
in_channels=self.in_channels,
out_channels=self.value_channels,
kernel_size=1)
self.W = nn.Conv2D(
in_channels=self.value_channels,
out_channels=self.out_channels,
kernel_size=1)
self.psp_size = psp_size
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
batch_size, _, h, w = x.shape
if self.scale > 1:
x = self.pool(x)
value = self.f_value(x)
value = _pp_module(value, self.psp_size)
value = paddle.transpose(value, perm=(0, 2, 1))
query = self.f_query(x)
query = paddle.reshape(query, shape=(0, self.key_channels, -1))
query = paddle.transpose(query, perm=(0, 2, 1))
key = self.f_key(x)
key = _pp_module(key, self.psp_size)
sim_map = paddle.matmul(query, key)
sim_map = (self.key_channels**-.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, perm=(0, 2, 1))
x_shape = paddle.shape(x)
context = paddle.reshape(
context, shape=[0, self.value_channels, x_shape[2], x_shape[3]])
context = self.W(context)
return context
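# --- Editor's note; not part of the original file ---
# The "asymmetric" trick: keys and values are pyramid-pooled by _pp_module, so each
# query attends over 110 anchors instead of all H*W positions.
def _demo_attention_cost():
    hw = 97 * 97   # positions in an illustrative 97x97 feature map
    print(hw * hw)   # 88529281 similarity entries for full non-local attention
    print(hw * 110)  # 1034990 entries with pooled keys/values, ~85x fewer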
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ann_resnet50_voc.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
raise RuntimeError("When the dilation isn't 1, "
"the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
The original article refers to
Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int]=(1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
# At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
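# --- Illustrative sketch added by the editor; not part of the original file ---
# With the default output_stride=8, stages 3 and 4 replace stride with dilation,
# so the last three stage outputs share one resolution.
def _demo_resnet50_vd():
    import paddle
    backbone = ResNet50_vd()
    feats = backbone(paddle.rand([1, 3, 512, 512]))
    print(backbone.feat_channels)    # [256, 512, 1024, 2048]
    print([f.shape for f in feats])  # strides 4, 8, 8, 8 relative to the input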
# danet_resnet50_cityscapes
|Module Name|danet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|danet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|272MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to finetune a pre-trained model and complete prediction.
- For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
## II. Installation
- ### 1. Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install danet_resnet50_cityscapes
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to start Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the danet_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing as needed.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing method.
- `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
- Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: the name of the pre-trained model.
- `pretrained`: the path of a self-trained checkpoint; if it is None, the provided default parameters are loaded.
- Step4: Choose the optimization strategy and run configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: image path or image data in BGR format;
* `visualization`: whether to visualize the result; default is True;
* `save_path`: the path for saving results; the default is 'seg_result'.
**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m danet_resnet50_cityscapes
```
- This deploys an image segmentation service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
First release
# danet_resnet50_cityscapes
|Module Name|danet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|danet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|272MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install danet_resnet50_cityscapes
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the danet_resnet50_cityscapes model on datasets such as OpticDiscSeg by executing `python train.py`.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data. Users can replace them with their own preprocessing methods as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: the path of a self-trained checkpoint; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: image path or ndarray data with format [H, W, C], BGR.
* `visualization`: whether to save the visualized segmentation results as image files. Default: True.
* `save_path`: save path of the results. Default: 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m danet_resnet50_cityscapes
```
- This deploys the image segmentation service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
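- The decoded `mask` is an ordinary BGR `ndarray`. As an illustrative follow-up (not part of the original script), it can be written to disk with OpenCV:
```python
import cv2
cv2.imwrite('mask.png', mask)  # persist the segmentation mask returned by the service
```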
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
# NOTE: the misspelled attribute name is kept as-is; renaming it would invalidate
# the parameter keys in released checkpoints loaded via set_dict.
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
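# --- Editor's note; not part of the original file ---
# Why separable: a depthwise 3x3 plus a pointwise 1x1 needs far fewer weights than a
# dense 3x3 convolution (BN parameters ignored; 256 -> 256 channels assumed).
def _demo_separable_params():
    k, cin, cout = 3, 256, 256
    print(cin * cout * k * k)        # 589824 weights for a standard 3x3 conv
    print(cin * k * k + cin * cout)  # 67840 weights for the separable version, ~8.7x fewer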
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
map is even, e.g. 1024x512; otherwise set it to True, e.g. 769x769.
use_sep_conv (bool, optional): Whether to use separable convolutions in the ASPP module. Default: False.
image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from danet_resnet50_cityscapes.resnet import ResNet50_vd
import danet_resnet50_cityscapes.layers as L
@moduleinfo(
name="danet_resnet50_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="DANetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class DANet(nn.Layer):
"""
The DANet implementation based on PaddlePaddle.
The original article refers to
Fu, Jun, et al. "Dual Attention Network for Scene Segmentation"
(https://arxiv.org/pdf/1809.02983.pdf)
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): The values in the tuple indicate the indices of
output of backbone.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (2, 3),
align_corners: bool = False,
pretrained: str = None):
super(DANet, self).__init__()
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = DAHead(num_classes=num_classes, in_channels=in_channels)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feats = self.backbone(x)
feats = [feats[i] for i in self.backbone_indices]
logit_list = self.head(feats)
if not self.training:
logit_list = [logit_list[0]]
logit_list = [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners,
align_mode=1) for logit in logit_list
]
return logit_list
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
class DAHead(nn.Layer):
"""
The Dual attention head.
Args:
num_classes (int): The unique number of target classes.
in_channels (tuple): The numbers of input channels; the last element is taken as the head input.
"""
def __init__(self, num_classes: int, in_channels: Tuple[int]):
super().__init__()
in_channels = in_channels[-1]
inter_channels = in_channels // 4
self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
self.pam = PAM(inter_channels)
self.cam = CAM(inter_channels)
self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3)
self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3)
self.aux_head = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1))
self.aux_head_pam = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.aux_head_cam = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.cls_head = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
feats = feat_list[-1]
channel_feats = self.channel_conv(feats)
channel_feats = self.cam(channel_feats)
channel_feats = self.conv1(channel_feats)
position_feats = self.position_conv(feats)
position_feats = self.pam(position_feats)
position_feats = self.conv2(position_feats)
feats_sum = position_feats + channel_feats
logit = self.cls_head(feats_sum)
if not self.training:
return [logit]
cam_logit = self.aux_head_cam(channel_feats)
pam_logit = self.aux_head_pam(position_feats)
aux_logit = self.aux_head(feats)
return [logit, cam_logit, pam_logit, aux_logit]
class PAM(nn.Layer):
"""Position attention module."""
def __init__(self, in_channels: int):
super().__init__()
mid_channels = in_channels // 8
self.mid_channels = mid_channels
self.in_channels = in_channels
self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1)
self.gamma = self.create_parameter(
shape=[1],
dtype='float32',
default_initializer=nn.initializer.Constant(0))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x_shape = paddle.shape(x)
# query: n, h * w, c1
query = self.query_conv(x)
query = paddle.reshape(query, (0, self.mid_channels, -1))
query = paddle.transpose(query, (0, 2, 1))
# key: n, c1, h * w
key = self.key_conv(x)
key = paddle.reshape(key, (0, self.mid_channels, -1))
# sim: n, h * w, h * w
sim = paddle.bmm(query, key)
sim = F.softmax(sim, axis=-1)
value = self.value_conv(x)
value = paddle.reshape(value, (0, self.in_channels, -1))
sim = paddle.transpose(sim, (0, 2, 1))
# feat: from (n, c2, h * w) -> (n, c2, h, w)
feat = paddle.bmm(value, sim)
feat = paddle.reshape(feat,
(0, self.in_channels, x_shape[2], x_shape[3]))
out = self.gamma * feat + x
return out
class CAM(nn.Layer):
"""Channel attention module."""
def __init__(self, channels: int):
super().__init__()
self.channels = channels
self.gamma = self.create_parameter(
shape=[1],
dtype='float32',
default_initializer=nn.initializer.Constant(0))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x_shape = paddle.shape(x)
# query: n, c, h * w
query = paddle.reshape(x, (0, self.channels, -1))
# key: n, h * w, c
key = paddle.reshape(x, (0, self.channels, -1))
key = paddle.transpose(key, (0, 2, 1))
# sim: n, c, c
sim = paddle.bmm(query, key)
# The DANet authors claim that this trick can avoid gradient divergence
sim = paddle.max(
sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim
sim = F.softmax(sim, axis=-1)
# feat: from (n, c, h * w) to (n, c, h, w)
value = paddle.reshape(x, (0, self.channels, -1))
feat = paddle.bmm(sim, value)
feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3]))
out = self.gamma * feat + x
return out
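# --- Illustrative sketch added by the editor; not part of the original file ---
# Both attention modules are residual with a zero-initialized gamma, so at the start
# of training they behave as identity mappings (out = 0 * feat + x).
def _demo_pam_cam_identity():
    import paddle
    x = paddle.rand([1, 64, 32, 32])
    print(bool(paddle.allclose(PAM(64)(x), x)))  # True
    print(bool(paddle.allclose(CAM(64)(x), x)))  # True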
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import danet_resnet50_cityscapes.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
raise RuntimeError("When the dilation isn't 1, "
"the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
The original article refers to
Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
# At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
# danet_resnet50_voc
|模型名称|danet_resnet50_voc|
| :--- | :---: |
|类别|图像-图像分割|
|网络|danet_resnet50vd|
|数据集|PascalVOC2012|
|是否支持Fine-tuning|是|
|模型大小|273MB|
|指标|-|
|最新更新日期|2022-03-21|
## 一、模型基本信息
- 样例结果示例:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - This example shows how to use PaddleHub to finetune the pre-trained model and complete prediction.
  - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
  - ```shell
    $ hub install danet_resnet50_voc
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、How to Start Fine-tuning
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the danet_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Steps:
    - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
  - The `segmentation_transforms` data augmentation module provides a rich set of preprocessing methods for image segmentation data; users can replace them with the preprocessing methods they need.
  - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
  - `transforms`: data preprocessing methods.
  - `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
  - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
  - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None)
```
  - `name`: the name of the pre-trained model.
  - `pretrained`: whether to load your own trained model; if None, the provided default model parameters are loaded.
  - Step4: Choose the optimization strategy and training configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
  - When fine-tuning is completed, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. The following `predict.py` script uses this model for prediction:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- With the parameters configured, run the script `python predict.py`.
- **Args**
    * `images`: path of the input image or image data in BGR format;
    * `visualization`: whether to visualize the results, default is True;
    * `save_path`: save path of the results, default is 'seg_result'.
- **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
- ```shell
$ hub serving start -m danet_resnet50_voc
```
  - The image segmentation service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
    # Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# danet_resnet50_voc
|Module Name|danet_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|danet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|273MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install danet_resnet50_voc
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、Fine-tune and Encapsulation
  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the danet_resnet50_voc model to fine-tune on datasets such as OpticDiscSeg.
  - Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
  - `segmentation_transforms`: The data augmentation module provides many preprocessing methods for image segmentation data. Users can replace them according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
    * Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
  - `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
  - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the visualized segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m danet_resnet50_voc
```
  - The image segmentation service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/danet_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
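# Hedged illustration (added): the helper above keeps layer definitions
# device-agnostic.
#     bn = SyncBatchNorm(64)  # nn.SyncBatchNorm on GPU, nn.BatchNorm2D on CPU
#     y = bn(paddle.rand([4, 64, 32, 32]))  # same call either way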
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
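# Note (added): is_vd_mode implements the ResNet-D downsampling trick from
# "Bag of Tricks for Image Classification with Convolutional Neural Networks":
# the shortcut branch first average-pools with stride 2 and then applies a
# 1x1 conv with stride 1, instead of a strided 1x1 conv that would discard
# three quarters of the activations.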
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
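        # conv1 is built with padding=0 whenever dilation > 1 (see ConvBNLayer
        # above), so pad by the dilation here to keep the 3x3 dilated conv
        # from shrinking the feature map.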
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identity transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
                 use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
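# Hedged usage sketch (values are illustrative, not fixed by this file): a
# DeepLabV3-style configuration of ASPPModule with the commonly used dilation
# rates (1, 6, 12, 18) and an image-pooling branch.
if __name__ == '__main__':
    aspp = ASPPModule(aspp_ratios=(1, 6, 12, 18), in_channels=2048,
                      out_channels=256, align_corners=False,
                      image_pooling=True)
    y = aspp(paddle.rand([1, 2048, 64, 64]))
    print(y.shape)  # [1, 256, 64, 64]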
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from danet_resnet50_voc.resnet import ResNet50_vd
import danet_resnet50_voc.layers as L
@moduleinfo(
name="danet_resnet50_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="DeepLabV3PResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class DANet(nn.Layer):
"""
The DANet implementation based on PaddlePaddle.
The original article refers to
    Fu, Jun, et al. "Dual Attention Network for Scene Segmentation"
(https://arxiv.org/pdf/1809.02983.pdf)
    Args:
        num_classes (int): The unique number of target classes. Default: 21.
        backbone_indices (tuple): The values in the tuple indicate the indices of
            the output of the backbone. Default: (2, 3).
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
        pretrained (str, optional): The path or url of pretrained model. Default: None.
    """
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int] = (2, 3),
align_corners: bool = False,
pretrained: str = None):
super(DANet, self).__init__()
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = DAHead(num_classes=num_classes, in_channels=in_channels)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feats = self.backbone(x)
feats = [feats[i] for i in self.backbone_indices]
logit_list = self.head(feats)
if not self.training:
logit_list = [logit_list[0]]
logit_list = [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners,
align_mode=1) for logit in logit_list
]
return logit_list
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
class DAHead(nn.Layer):
"""
The Dual attention head.
Args:
num_classes (int): The unique number of target classes.
in_channels (tuple): The number of input channels.
"""
    def __init__(self, num_classes: int, in_channels: Tuple[int]):
super().__init__()
in_channels = in_channels[-1]
inter_channels = in_channels // 4
self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
self.pam = PAM(inter_channels)
self.cam = CAM(inter_channels)
self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3)
self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3)
self.aux_head = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1))
self.aux_head_pam = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.aux_head_cam = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.cls_head = nn.Sequential(
nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
feats = feat_list[-1]
channel_feats = self.channel_conv(feats)
channel_feats = self.cam(channel_feats)
channel_feats = self.conv1(channel_feats)
position_feats = self.position_conv(feats)
position_feats = self.pam(position_feats)
position_feats = self.conv2(position_feats)
feats_sum = position_feats + channel_feats
logit = self.cls_head(feats_sum)
if not self.training:
return [logit]
cam_logit = self.aux_head_cam(channel_feats)
        pam_logit = self.aux_head_pam(position_feats)
aux_logit = self.aux_head(feats)
return [logit, cam_logit, pam_logit, aux_logit]
class PAM(nn.Layer):
"""Position attention module."""
def __init__(self, in_channels: int):
super().__init__()
mid_channels = in_channels // 8
self.mid_channels = mid_channels
self.in_channels = in_channels
self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1)
self.gamma = self.create_parameter(
shape=[1],
dtype='float32',
default_initializer=nn.initializer.Constant(0))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x_shape = paddle.shape(x)
# query: n, h * w, c1
query = self.query_conv(x)
query = paddle.reshape(query, (0, self.mid_channels, -1))
query = paddle.transpose(query, (0, 2, 1))
# key: n, c1, h * w
key = self.key_conv(x)
key = paddle.reshape(key, (0, self.mid_channels, -1))
# sim: n, h * w, h * w
sim = paddle.bmm(query, key)
sim = F.softmax(sim, axis=-1)
value = self.value_conv(x)
value = paddle.reshape(value, (0, self.in_channels, -1))
sim = paddle.transpose(sim, (0, 2, 1))
# feat: from (n, c2, h * w) -> (n, c2, h, w)
feat = paddle.bmm(value, sim)
feat = paddle.reshape(feat,
(0, self.in_channels, x_shape[2], x_shape[3]))
out = self.gamma * feat + x
return out
class CAM(nn.Layer):
"""Channel attention module."""
def __init__(self, channels: int):
super().__init__()
self.channels = channels
self.gamma = self.create_parameter(
shape=[1],
dtype='float32',
default_initializer=nn.initializer.Constant(0))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x_shape = paddle.shape(x)
# query: n, c, h * w
query = paddle.reshape(x, (0, self.channels, -1))
# key: n, h * w, c
key = paddle.reshape(x, (0, self.channels, -1))
key = paddle.transpose(key, (0, 2, 1))
# sim: n, c, c
sim = paddle.bmm(query, key)
        # The DANet authors claim that this helps avoid gradient divergence.
sim = paddle.max(
sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim
sim = F.softmax(sim, axis=-1)
# feat: from (n, c, h * w) to (n, c, h, w)
value = paddle.reshape(x, (0, self.channels, -1))
feat = paddle.bmm(sim, value)
feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3]))
out = self.gamma * feat + x
return out
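# Hedged shape check for the attention modules above: both PAM and CAM are
# residual, out = gamma * feat + x with gamma initialised to 0, so they start
# as identity mappings and always preserve the input shape. Assumes the
# module's imports resolve in the current environment.
if __name__ == '__main__':
    t = paddle.rand([2, 64, 32, 32])
    print(PAM(64)(t).shape)  # [2, 64, 32, 32]
    print(CAM(64)(t).shape)  # [2, 64, 32, 32]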
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import ann_resnet50_voc.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
        if dilation != 1 and kernel_size != 3:
            raise RuntimeError(
                "When the dilation isn't 1, the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
    The original article refers to
    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
                    # At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
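# Hedged sketch of how output_stride maps onto the dilation schedule above:
# with output_stride=8, stages 3 and 4 use dilations 2 and 4 and keep stride 1,
# so the deepest feature map stays at 1/8 of the input resolution; with
# output_stride=16 only stage 4 is dilated and the deepest map is 1/16.
if __name__ == '__main__':
    backbone = ResNet50_vd(output_stride=16)
    feats = backbone(paddle.rand([1, 3, 512, 512]))
    print([f.shape for f in feats])  # last map is 32x32 here, 64x64 at stride 8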
# isanet_resnet50_cityscapes
|Module Name|isanet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|isanet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|217MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - This example shows how to use PaddleHub to finetune the pre-trained model and complete prediction.
  - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
  - ```shell
    $ hub install isanet_resnet50_cityscapes
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、How to Start Fine-tuning
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the isanet_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Steps:
    - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
  - The `segmentation_transforms` data augmentation module provides a rich set of preprocessing methods for image segmentation data; users can replace them with the preprocessing methods they need.
  - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
  - `transforms`: data preprocessing methods.
  - `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
  - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
  - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
  - `name`: the name of the pre-trained model.
  - `pretrained`: whether to load your own trained model; if None, the provided default model parameters are loaded.
  - Step4: Choose the optimization strategy and training configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
  - When fine-tuning is completed, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. The following `predict.py` script uses this model for prediction:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- With the parameters configured, run the script `python predict.py`.
- **Args**
    * `images`: path of the input image or image data in BGR format;
    * `visualization`: whether to visualize the results, default is True;
    * `save_path`: save path of the results, default is 'seg_result'.
- **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
- ```shell
$ hub serving start -m isanet_resnet50_cityscapes
```
  - The image segmentation service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
    # Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# isanet_resnet50_cityscapes
|Module Name|isanet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|isanet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|217MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install isanet_resnet50_cityscapes
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、Fine-tune and Encapsulation
  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the isanet_resnet50_cityscapes model to fine-tune on datasets such as OpticDiscSeg.
  - Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
  - `segmentation_transforms`: The data augmentation module provides many preprocessing methods for image segmentation data. Users can replace them according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
    * Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
  - `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
  - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the visualized segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m isanet_resnet50_cityscapes
```
  - The image segmentation service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/isanet_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identity transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
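# Hedged usage sketch (added): AuxLayer is the usual head for an auxiliary loss
# branch, mapping an intermediate backbone feature straight to class logits.
# The channel numbers below are illustrative.
#     aux_head = AuxLayer(in_channels=1024, inter_channels=256, out_channels=19)
#     aux_logit = aux_head(paddle.rand([1, 1024, 64, 64]))  # [1, 19, 64, 64]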
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
return paddle.add(x, y, name)
class AttentionBlock(nn.Layer):
"""General self-attention block/non-local block.
    The original article refers to https://arxiv.org/abs/1706.03762.
Args:
key_in_channels (int): Input channels of key feature.
query_in_channels (int): Input channels of query feature.
channels (int): Output channels of key/query transform.
out_channels (int): Output channels.
        share_key_query (bool): Whether to share projection weights between the
            key and query projections.
        query_downsample (nn.Layer): Query downsample module.
        key_downsample (nn.Layer): Key downsample module.
        key_query_num_convs (int): Number of convs for key/query projection.
        value_out_num_convs (int): Number of convs for value projection.
        key_query_norm (bool): Whether to use BN for key/query projection.
        value_out_norm (bool): Whether to use BN for value projection.
        matmul_norm (bool): Whether to normalize the attention map by the square
            root of channels.
        with_out (bool): Whether to use the out projection.
    """
def __init__(self, key_in_channels, query_in_channels, channels,
out_channels, share_key_query, query_downsample,
key_downsample, key_query_num_convs, value_out_num_convs,
key_query_norm, value_out_norm, matmul_norm, with_out):
super(AttentionBlock, self).__init__()
if share_key_query:
assert key_in_channels == query_in_channels
self.with_out = with_out
self.key_in_channels = key_in_channels
self.query_in_channels = query_in_channels
self.out_channels = out_channels
self.channels = channels
self.share_key_query = share_key_query
self.key_project = self.build_project(
key_in_channels,
channels,
num_convs=key_query_num_convs,
use_conv_module=key_query_norm)
if share_key_query:
self.query_project = self.key_project
else:
self.query_project = self.build_project(
query_in_channels,
channels,
num_convs=key_query_num_convs,
use_conv_module=key_query_norm)
self.value_project = self.build_project(
key_in_channels,
channels if self.with_out else out_channels,
num_convs=value_out_num_convs,
use_conv_module=value_out_norm)
if self.with_out:
self.out_project = self.build_project(
channels,
out_channels,
num_convs=value_out_num_convs,
use_conv_module=value_out_norm)
else:
self.out_project = None
self.query_downsample = query_downsample
self.key_downsample = key_downsample
self.matmul_norm = matmul_norm
    def build_project(self, in_channels: int, channels: int, num_convs: int, use_conv_module: bool):
if use_conv_module:
convs = [
ConvBNReLU(
in_channels=in_channels,
out_channels=channels,
kernel_size=1,
bias_attr=False)
]
for _ in range(num_convs - 1):
convs.append(
ConvBNReLU(
in_channels=channels,
out_channels=channels,
kernel_size=1,
bias_attr=False))
else:
convs = [nn.Conv2D(in_channels, channels, 1)]
for _ in range(num_convs - 1):
convs.append(nn.Conv2D(channels, channels, 1))
if len(convs) > 1:
convs = nn.Sequential(*convs)
else:
convs = convs[0]
return convs
def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor:
query_shape = paddle.shape(query_feats)
query = self.query_project(query_feats)
if self.query_downsample is not None:
query = self.query_downsample(query)
query = query.flatten(2).transpose([0, 2, 1])
key = self.key_project(key_feats)
value = self.value_project(key_feats)
if self.key_downsample is not None:
key = self.key_downsample(key)
value = self.key_downsample(value)
key = key.flatten(2)
value = value.flatten(2).transpose([0, 2, 1])
sim_map = paddle.matmul(query, key)
if self.matmul_norm:
sim_map = (self.channels**-0.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, [0, 2, 1])
context = paddle.reshape(
context, [0, self.out_channels, query_shape[2], query_shape[3]])
if self.out_project is not None:
context = self.out_project(context)
return context
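# Hedged sketch (added): AttentionBlock configured as plain self-attention over
# a single feature map (query == key), roughly how the ISA head below uses it.
# All sizes are illustrative; with_out=False makes the value projection output
# out_channels directly, matching the reshape in forward().
if __name__ == '__main__':
    attn = AttentionBlock(
        key_in_channels=128, query_in_channels=128, channels=64,
        out_channels=128, share_key_query=False, query_downsample=None,
        key_downsample=None, key_query_num_convs=2, value_out_num_convs=1,
        key_query_norm=True, value_out_norm=True, matmul_norm=True,
        with_out=False)
    feat = paddle.rand([2, 128, 16, 16])
    print(attn(feat, feat).shape)  # [2, 128, 16, 16]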
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from isanet_resnet50_cityscapes.resnet import ResNet50_vd
import isanet_resnet50_cityscapes.layers as layers
@moduleinfo(
name="isanet_resnet50_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="ISANetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class ISANet(nn.Layer):
"""Interlaced Sparse Self-Attention for Semantic Segmentation.
The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation"
(https://arxiv.org/abs/1907.12273).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone.
isa_channels (int): The channels of ISA Module.
        down_factor (tuple): Divide the height and width dimensions into (P_h, P_w) groups. Default: (8, 8).
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (2, 3),
isa_channels: int = 256,
down_factor: Tuple[int] = (8, 8),
enable_auxiliary_loss: bool = True,
align_corners: bool = False,
pretrained: str = None):
super(ISANet, self).__init__()
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor,
enable_auxiliary_loss)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feats = self.backbone(x)
feats = [feats[i] for i in self.backbone_indices]
logit_list = self.head(feats)
logit_list = [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners,
align_mode=1) for logit in logit_list
]
return logit_list
class ISAHead(nn.Layer):
"""
The ISAHead.
Args:
num_classes (int): The unique number of target classes.
in_channels (tuple): The number of input channels.
isa_channels (int): The channels of ISA Module.
down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
"""
def __init__(self,
num_classes: int,
in_channels: Tuple[int],
isa_channels: int,
down_factor: Tuple[int],
enable_auxiliary_loss: bool):
super(ISAHead, self).__init__()
self.in_channels = in_channels[-1]
inter_channels = self.in_channels // 4
self.inter_channels = inter_channels
self.down_factor = down_factor
self.enable_auxiliary_loss = enable_auxiliary_loss
self.in_conv = layers.ConvBNReLU(
self.in_channels, inter_channels, 3, bias_attr=False)
self.global_relation = SelfAttentionBlock(inter_channels, isa_channels)
self.local_relation = SelfAttentionBlock(inter_channels, isa_channels)
self.out_conv = layers.ConvBNReLU(
inter_channels * 2, inter_channels, 1, bias_attr=False)
self.cls = nn.Sequential(
nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.aux = nn.Sequential(
layers.ConvBNReLU(
in_channels=1024,
out_channels=256,
kernel_size=3,
bias_attr=False), nn.Dropout2D(p=0.1),
nn.Conv2D(256, num_classes, 1))
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
C3, C4 = feat_list
x = self.in_conv(C4)
x_shape = paddle.shape(x)
P_h, P_w = self.down_factor
Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil(
x_shape[3] / P_w).astype('int32')
pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), (
Q_w * P_w - x_shape[3]).astype('int32')
if pad_h > 0 or pad_w > 0:
padding = paddle.concat([
pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
],
axis=0)
feat = F.pad(x, padding)
else:
feat = x
feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w])
feat = feat.transpose([0, 3, 5, 1, 2,
4]).reshape([-1, self.inter_channels, Q_h, Q_w])
feat = self.global_relation(feat)
feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w])
feat = feat.transpose([0, 4, 5, 3, 1,
2]).reshape([-1, self.inter_channels, P_h, P_w])
feat = self.local_relation(feat)
feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w])
feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape(
[0, self.inter_channels, P_h * Q_h, P_w * Q_w])
if pad_h > 0 or pad_w > 0:
feat = paddle.slice(
feat,
axes=[2, 3],
starts=[pad_h // 2, pad_w // 2],
ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]])
feat = self.out_conv(paddle.concat([feat, x], axis=1))
output = self.cls(feat)
if self.enable_auxiliary_loss:
auxout = self.aux(C3)
return [output, auxout]
else:
return [output]
class SelfAttentionBlock(layers.AttentionBlock):
"""General self-attention block/non-local block.
Args:
in_channels (int): Input channels of key/query feature.
channels (int): Output channels of key/query transform.
"""
def __init__(self, in_channels: int, channels: int):
super(SelfAttentionBlock, self).__init__(
key_in_channels=in_channels,
query_in_channels=in_channels,
channels=channels,
out_channels=in_channels,
share_key_query=False,
query_downsample=None,
key_downsample=None,
key_query_num_convs=2,
key_query_norm=True,
value_out_num_convs=1,
value_out_norm=False,
matmul_norm=True,
with_out=False)
self.output_project = self.build_project(
in_channels, in_channels, num_convs=1, use_conv_module=True)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
context = super(SelfAttentionBlock, self).forward(x, x)
return self.output_project(context)
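# --- Editorial worked example (not part of the original file) ---------------
# The interlacing arithmetic used by ISAHead.forward above: with
# down_factor (P_h, P_w) = (8, 8) and a 64x128 input feature map, the
# long-range step attends within Q_h x Q_w = 8x16 grids of pixels sampled at
# stride (P_h, P_w), and the short-range step then attends within each 8x8
# cell. Padding is only needed when H or W is not divisible by P_h or P_w.
if __name__ == '__main__':
    H, W, P_h, P_w = 64, 128, 8, 8
    Q_h, Q_w = -(-H // P_h), -(-W // P_w)  # ceil division
    pad_h, pad_w = Q_h * P_h - H, Q_w * P_w - W
    print(Q_h, Q_w, pad_h, pad_w)  # expected: 8 16 0 0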
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import isanet_resnet50_cityscapes.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
raise RuntimeError("When the dilation isn't 1, " \
"the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
The original article refers to Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
# At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
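# --- Editorial shape sketch (not part of the original file) -----------------
# With the default output_stride=8, the four stage outputs of ResNet50_vd for
# a 512x512 input should have 256/512/1024/2048 channels at spatial sizes
# 128/64/64/64 (stages 3 and 4 use dilation instead of stride). ISANet reads
# indices (2, 3) of this list, matching the 1024-channel auxiliary head and
# the 2048-channel main head.
if __name__ == '__main__':
    backbone = ResNet50_vd()
    feats = backbone(paddle.rand([1, 3, 512, 512]))
    print([f.shape for f in feats])
    # expected: [1, 256, 128, 128], [1, 512, 64, 64],
    #           [1, 1024, 64, 64], [1, 2048, 64, 64]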
# isanet_resnet50_voc
|Module Name|isanet_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|isanet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|217MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install isanet_resnet50_voc
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='isanet_resnet50_voc')
    img = cv2.imread("/PATH/TO/IMAGE")
    result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the isanet_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: Select the data mode, the options are `train`, `test` and `val`. Default is `train`.
- For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub

model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None)
```
- `name`: the name of the pre-trained model.
- `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
- Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer

scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script `python predict.py`.
- **Args**
* `images`: image path or image data in BGR format;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: save path of the results, default is 'seg_result'.
**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m isanet_resnet50_voc
```
- The image segmentation service API is now deployed, and the default port number is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/isanet_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# isanet_resnet50_voc
|Module Name|isanet_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|isanet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|217MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install isanet_resnet50_voc
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the isanet_resnet50_voc model to fine-tune on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: Whether to load a self-trained model; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
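- The `PolynomialDecay` scheduler above anneals the learning rate from 0.01 toward `end_lr=0.0001` over `decay_steps=1000` steps with exponent `power=0.9`. A minimal editorial sketch to inspect the resulting curve (it only reuses the scheduler configuration from the snippet above):
- ```python
  import paddle

  scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
  for step in range(1001):
      if step % 250 == 0:
          # get_lr() reads the current value; step() advances the schedule.
          print(step, scheduler.get_lr())
      scheduler.step()
  ```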
- Model prediction
- When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
* `visualization`: Whether to save the segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m isanet_resnet50_voc
```
- The image segmentation service API is now deployed, and the default port number is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/isanet_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
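- The decoded `mask` is an ordinary BGR `numpy` array, so standard OpenCV calls apply to it. A minimal editorial follow-up (the output file name is arbitrary and `mask` comes from the block above):
- ```python
  import cv2

  # Persist the segmentation mask returned by the service.
  cv2.imwrite('seg_mask.png', mask)
  ```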
## V. Release Note
- 1.0.0
First release
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.pointwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
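# Editorial note: the depthwise + pointwise split above replaces one dense
# k x k convolution with a per-channel k x k convolution followed by a 1x1
# mixing convolution. For in_channels=256, out_channels=256, kernel_size=3,
# that is roughly 256*3*3 + 256*256 = 67,840 conv weights instead of
# 256*256*3*3 = 589,824 (about 8.7x fewer, ignoring BN parameters).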
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
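# --- Editorial usage sketch (not part of the original file) -----------------
# Every ASPP branch keeps the spatial size (1x1 conv for ratio 1, dilated 3x3
# otherwise); the branch outputs plus the optional image-pooling branch are
# concatenated, and a 1x1 conv fuses them back to `out_channels`:
#
#   aspp = ASPPModule((1, 12, 24, 36), in_channels=2048, out_channels=256,
#                     align_corners=False, image_pooling=True)
#   y = aspp(paddle.rand([1, 2048, 64, 64]))  # y.shape == [1, 256, 64, 64]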
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
return paddle.add(x, y, name)
class AttentionBlock(nn.Layer):
"""General self-attention block/non-local block.
The original article refers to https://arxiv.org/abs/1706.03762.
Args:
key_in_channels (int): Input channels of key feature.
query_in_channels (int): Input channels of query feature.
channels (int): Output channels of key/query transform.
out_channels (int): Output channels.
share_key_query (bool): Whether to share the projection weight between the
key and the query projection.
query_downsample (nn.Module): Query downsample module.
key_downsample (nn.Module): Key downsample module.
key_query_num_convs (int): Number of convs for key/query projection.
value_out_num_convs (int): Number of convs for value projection.
key_query_norm (bool): Whether to use BN for key/query projection.
value_out_norm (bool): Whether to use BN for value projection.
matmul_norm (bool): Whether to normalize the attention map by the square
root of channels.
with_out (bool): Whether to use the out projection.
"""
def __init__(self, key_in_channels, query_in_channels, channels,
out_channels, share_key_query, query_downsample,
key_downsample, key_query_num_convs, value_out_num_convs,
key_query_norm, value_out_norm, matmul_norm, with_out):
super(AttentionBlock, self).__init__()
if share_key_query:
assert key_in_channels == query_in_channels
self.with_out = with_out
self.key_in_channels = key_in_channels
self.query_in_channels = query_in_channels
self.out_channels = out_channels
self.channels = channels
self.share_key_query = share_key_query
self.key_project = self.build_project(
key_in_channels,
channels,
num_convs=key_query_num_convs,
use_conv_module=key_query_norm)
if share_key_query:
self.query_project = self.key_project
else:
self.query_project = self.build_project(
query_in_channels,
channels,
num_convs=key_query_num_convs,
use_conv_module=key_query_norm)
self.value_project = self.build_project(
key_in_channels,
channels if self.with_out else out_channels,
num_convs=value_out_num_convs,
use_conv_module=value_out_norm)
if self.with_out:
self.out_project = self.build_project(
channels,
out_channels,
num_convs=value_out_num_convs,
use_conv_module=value_out_norm)
else:
self.out_project = None
self.query_downsample = query_downsample
self.key_downsample = key_downsample
self.matmul_norm = matmul_norm
def build_project(self, in_channels: int, channels: int, num_convs: int, use_conv_module: bool):
if use_conv_module:
convs = [
ConvBNReLU(
in_channels=in_channels,
out_channels=channels,
kernel_size=1,
bias_attr=False)
]
for _ in range(num_convs - 1):
convs.append(
ConvBNReLU(
in_channels=channels,
out_channels=channels,
kernel_size=1,
bias_attr=False))
else:
convs = [nn.Conv2D(in_channels, channels, 1)]
for _ in range(num_convs - 1):
convs.append(nn.Conv2D(channels, channels, 1))
if len(convs) > 1:
convs = nn.Sequential(*convs)
else:
convs = convs[0]
return convs
def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor:
query_shape = paddle.shape(query_feats)
query = self.query_project(query_feats)
if self.query_downsample is not None:
query = self.query_downsample(query)
query = query.flatten(2).transpose([0, 2, 1])
key = self.key_project(key_feats)
value = self.value_project(key_feats)
if self.key_downsample is not None:
key = self.key_downsample(key)
value = self.key_downsample(value)
key = key.flatten(2)
value = value.flatten(2).transpose([0, 2, 1])
sim_map = paddle.matmul(query, key)
if self.matmul_norm:
sim_map = (self.channels**-0.5) * sim_map
sim_map = F.softmax(sim_map, axis=-1)
context = paddle.matmul(sim_map, value)
context = paddle.transpose(context, [0, 2, 1])
context = paddle.reshape(
context, [0, self.out_channels, query_shape[2], query_shape[3]])
if self.out_project is not None:
context = self.out_project(context)
return context
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from isanet_resnet50_voc.resnet import ResNet50_vd
import isanet_resnet50_voc.layers as layers
@moduleinfo(
name="isanet_resnet50_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="ISANetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class ISANet(nn.Layer):
"""Interlaced Sparse Self-Attention for Semantic Segmentation.
The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation"
(https://arxiv.org/abs/1907.12273).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone.
isa_channels (int): The channels of ISA Module.
down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int] = (2, 3),
isa_channels: int = 256,
down_factor: Tuple[int] = (8, 8),
enable_auxiliary_loss: bool = True,
align_corners: bool = False,
pretrained: str = None):
super(ISANet, self).__init__()
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor,
enable_auxiliary_loss)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feats = self.backbone(x)
feats = [feats[i] for i in self.backbone_indices]
logit_list = self.head(feats)
logit_list = [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners,
align_mode=1) for logit in logit_list
]
return logit_list
class ISAHead(nn.Layer):
"""
The ISAHead.
Args:
num_classes (int): The unique number of target classes.
in_channels (tuple): The number of input channels.
isa_channels (int): The channels of ISA Module.
down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
"""
def __init__(self,
num_classes: int,
in_channels: Tuple[int],
isa_channels: int,
down_factor: Tuple[int],
enable_auxiliary_loss: bool):
super(ISAHead, self).__init__()
self.in_channels = in_channels[-1]
inter_channels = self.in_channels // 4
self.inter_channels = inter_channels
self.down_factor = down_factor
self.enable_auxiliary_loss = enable_auxiliary_loss
self.in_conv = layers.ConvBNReLU(
self.in_channels, inter_channels, 3, bias_attr=False)
self.global_relation = SelfAttentionBlock(inter_channels, isa_channels)
self.local_relation = SelfAttentionBlock(inter_channels, isa_channels)
self.out_conv = layers.ConvBNReLU(
inter_channels * 2, inter_channels, 1, bias_attr=False)
self.cls = nn.Sequential(
nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1))
self.aux = nn.Sequential(
layers.ConvBNReLU(
in_channels=1024,
out_channels=256,
kernel_size=3,
bias_attr=False), nn.Dropout2D(p=0.1),
nn.Conv2D(256, num_classes, 1))
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
C3, C4 = feat_list
x = self.in_conv(C4)
x_shape = paddle.shape(x)
P_h, P_w = self.down_factor
Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil(
x_shape[3] / P_w).astype('int32')
pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), (
Q_w * P_w - x_shape[3]).astype('int32')
if pad_h > 0 or pad_w > 0:
padding = paddle.concat([
pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
],
axis=0)
feat = F.pad(x, padding)
else:
feat = x
feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w])
feat = feat.transpose([0, 3, 5, 1, 2,
4]).reshape([-1, self.inter_channels, Q_h, Q_w])
feat = self.global_relation(feat)
feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w])
feat = feat.transpose([0, 4, 5, 3, 1,
2]).reshape([-1, self.inter_channels, P_h, P_w])
feat = self.local_relation(feat)
feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w])
feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape(
[0, self.inter_channels, P_h * Q_h, P_w * Q_w])
if pad_h > 0 or pad_w > 0:
feat = paddle.slice(
feat,
axes=[2, 3],
starts=[pad_h // 2, pad_w // 2],
ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]])
feat = self.out_conv(paddle.concat([feat, x], axis=1))
output = self.cls(feat)
if self.enable_auxiliary_loss:
auxout = self.aux(C3)
return [output, auxout]
else:
return [output]
class SelfAttentionBlock(layers.AttentionBlock):
"""General self-attention block/non-local block.
Args:
in_channels (int): Input channels of key/query feature.
channels (int): Output channels of key/query transform.
"""
def __init__(self, in_channels: int, channels: int):
super(SelfAttentionBlock, self).__init__(
key_in_channels=in_channels,
query_in_channels=in_channels,
channels=channels,
out_channels=in_channels,
share_key_query=False,
query_downsample=None,
key_downsample=None,
key_query_num_convs=2,
key_query_norm=True,
value_out_num_convs=1,
value_out_norm=False,
matmul_norm=True,
with_out=False)
self.output_project = self.build_project(
in_channels, in_channels, num_convs=1, use_conv_module=True)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
context = super(SelfAttentionBlock, self).forward(x, x)
return self.output_project(context)
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import isanet_resnet50_voc.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
raise RuntimeError("When the dilation isn't 1, " \
"the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
The original article refers to Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
# At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
# pspnet_resnet50_cityscapes
|Module Name|pspnet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|pspnet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|390MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install pspnet_resnet50_cityscapes
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='pspnet_resnet50_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the pspnet_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: Select the data mode, the options are `train`, `test` and `val`. Default is `train`.
- For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub

model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
- `name`: the name of the pre-trained model.
- `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
- Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer

scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script `python predict.py`.
- **Args**
* `images`: image path or image data in BGR format;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: save path of the results, default is 'seg_result'.
**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m pspnet_resnet50_cityscapes
```
- The image segmentation service API is now deployed, and the default port number is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# pspnet_resnet50_cityscapes
|Module Name|pspnet_resnet50_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|pspnet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|390MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install pspnet_resnet50_cityscapes
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the pspnet_resnet50_cityscapes model to fine-tune on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
    * `transforms`: data preprocessing methods.
    * `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train` (see the sketch below for loading another split).
    * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
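  - A minimal sketch of loading the validation split with the same transforms (the name `val_reader` is arbitrary):
  - ```python
    val_reader = OpticDiscSeg(transform, mode='val')
    ```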
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None)
```
    - `name`: model name.
    - `pretrained`: whether to load a self-trained checkpoint; if None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
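  - For reference, the `PolynomialDecay` schedule above anneals the learning rate from 0.01 to 0.0001 over 1000 steps. A minimal sketch of the formula, assuming the default non-cyclic behaviour of `paddle.optimizer.lr.PolynomialDecay` (`poly_lr` is a hypothetical helper for illustration):
  - ```python
    def poly_lr(step, base_lr=0.01, end_lr=0.0001, decay_steps=1000, power=0.9):
        # lr = (base_lr - end_lr) * (1 - step/decay_steps)^power + end_lr
        step = min(step, decay_steps)
        return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

    print(poly_lr(0))     # 0.01
    print(poly_lr(500))   # ~0.0054
    print(poly_lr(1000))  # 0.0001
    ```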
- Model prediction
    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the visualized segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m pspnet_resnet50_cityscapes
```
  - The image segmentation API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        self.pointwise_conv = ConvBNReLU(
            in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
return paddle.add(x, y, name)
class PPModule(nn.Layer):
"""
Pyramid pooling module originally in PSPNet.
Args:
        in_channels (int): The number of input channels to the pyramid pooling module.
out_channels (int): The number of output channels after pyramid pooling module.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self,
in_channels: int,
out_channels: int,
                 bin_sizes: tuple,
dim_reduction: bool,
align_corners: bool):
super().__init__()
self.bin_sizes = bin_sizes
inter_channels = in_channels
if dim_reduction:
inter_channels = in_channels // len(bin_sizes)
# we use dimension reduction after pooling mentioned in original implementation.
self.stages = nn.LayerList([
self._make_stage(in_channels, inter_channels, size)
for size in bin_sizes
])
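        # The original input is concatenated with every pooled branch in forward(),
        # so the fusion conv sees in_channels + inter_channels * len(bin_sizes) channels.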
self.conv_bn_relu2 = ConvBNReLU(
in_channels=in_channels + inter_channels * len(bin_sizes),
out_channels=out_channels,
kernel_size=3,
padding=1)
self.align_corners = align_corners
def _make_stage(self, in_channels: int, out_channels: int, size: int):
"""
Create one pooling layer.
        In our implementation, we adopt the same dimension reduction as the original paper, which might be
        slightly different from other implementations.
        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other
        implementations keep the number of channels the same.
        Args:
            in_channels (int): The number of input channels to the pyramid pooling module.
            out_channels (int): The number of output channels of the pooled branch.
            size (int): The output size of the pooled layer.
        Returns:
            nn.Sequential: The pooling branch, an adaptive average pooling followed by a 1x1 ConvBNReLU.
        """
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = ConvBNReLU(
in_channels=in_channels, out_channels=out_channels, kernel_size=1)
return nn.Sequential(prior, conv)
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
cat_layers = []
for stage in self.stages:
x = stage(input)
x = F.interpolate(
x,
paddle.shape(input)[2:],
mode='bilinear',
align_corners=self.align_corners)
cat_layers.append(x)
cat_layers = [input] + cat_layers[::-1]
cat = paddle.concat(cat_layers, axis=1)
out = self.conv_bn_relu2(cat)
return out
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from pspnet_resnet50_cityscapes.resnet import ResNet50_vd
import pspnet_resnet50_cityscapes.layers as layers
@moduleinfo(
name="pspnet_resnet50_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="PSPNetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class PSPNet(nn.Layer):
"""
The PSPNet implementation based on PaddlePaddle.
The original article refers to
Zhao, Hengshuang, et al. "Pyramid scene parsing network"
(https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6).
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (2, 3),
pp_out_channels: int = 1024,
bin_sizes: Tuple[int] = (1, 2, 3, 6),
enable_auxiliary_loss: bool = True,
align_corners: bool = False,
pretrained: str = None):
super(PSPNet, self).__init__()
self.backbone = ResNet50_vd()
backbone_channels = [
self.backbone.feat_channels[i] for i in backbone_indices
]
self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels,
pp_out_channels, bin_sizes,
enable_auxiliary_loss, align_corners)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class PSPNetHead(nn.Layer):
"""
The PSPNetHead implementation.
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
The first index will be taken as a deep-supervision feature in auxiliary layer;
the second one will be taken as input of Pyramid Pooling Module (PPModule).
Usually backbone consists of four downsampling stage, and return an output of
each stage. If we set it as (2, 3) in ResNet, that means taking feature map of the third
stage (res4b22) in backbone, and feature map of the fourth stage (res5c) as input of PPModule.
backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
pp_out_channels (int): The output channels after Pyramid Pooling Module.
bin_sizes (tuple): The out size of pooled feature maps.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, num_classes, backbone_indices, backbone_channels,
pp_out_channels, bin_sizes, enable_auxiliary_loss,
align_corners):
super().__init__()
self.backbone_indices = backbone_indices
self.psp_module = layers.PPModule(
in_channels=backbone_channels[1],
out_channels=pp_out_channels,
bin_sizes=bin_sizes,
dim_reduction=True,
align_corners=align_corners)
self.dropout = nn.Dropout(p=0.1) # dropout_prob
self.conv = nn.Conv2D(
in_channels=pp_out_channels,
out_channels=num_classes,
kernel_size=1)
if enable_auxiliary_loss:
self.auxlayer = layers.AuxLayer(
in_channels=backbone_channels[0],
inter_channels=backbone_channels[0] // 4,
out_channels=num_classes)
self.enable_auxiliary_loss = enable_auxiliary_loss
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
x = feat_list[self.backbone_indices[1]]
x = self.psp_module(x)
x = self.dropout(x)
logit = self.conv(x)
logit_list.append(logit)
if self.enable_auxiliary_loss:
auxiliary_feat = feat_list[self.backbone_indices[0]]
auxiliary_logit = self.auxlayer(auxiliary_feat)
logit_list.append(auxiliary_logit)
return logit_list
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Tuple

import paddle
import paddle.nn as nn

import pspnet_resnet50_cityscapes.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
            raise RuntimeError("When the dilation isn't 1, "
                               "the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
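            # ResNet-vd trick: when downsampling, average-pool first (is_vd_mode)
            # so the 1x1 projection conv can keep stride 1 (arXiv:1812.01187).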
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
    The original article refers to
    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
dilation_dict = None
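        # Map output_stride to per-stage dilations: output_stride=8 dilates the
        # last two stages (x2, x4); output_stride=16 dilates only the last stage (x2).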
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
                    # At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
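        # Keep the stem output for models that need a shape stream (e.g. GSCNN);
        # PSPNet itself does not consume conv1_logit.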
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
# pspnet_resnet50_voc
|Module Name|pspnet_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|pspnet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|390MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - This example shows how to use PaddleHub to finetune a pre-trained model and complete prediction.
  - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
## II. Installation
- ### 1、Environment Dependencies
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
  - ```shell
    $ hub install pspnet_resnet50_voc
    ```
  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to start Fine-tune
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the pspnet_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Code steps
    - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing as needed.
    - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
      - `transforms`: data preprocessing methods.
      - `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
      - For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
    - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None)
```
      - `name`: name of the pre-trained model.
      - `pretrained`: whether to load a self-trained checkpoint; if None, the provided default parameters are loaded.
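      - Per the module implementation in this commit, `pretrained` accepts a path to saved parameters, which are loaded with `paddle.load`. A minimal sketch (the path is a placeholder):
      - ```python
        model = hub.Module(name='pspnet_resnet50_voc',
                           num_classes=2,
                           pretrained='/PATH/TO/CHECKPOINT/model.pdparams')
        ```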
    - Step4: Choose the optimization strategy and running configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
  - Model prediction
    - When Fine-tune is completed, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
  - After the parameters are configured correctly, run the script with `python predict.py`.
  - **Args**
    * `images`: image path or BGR image data;
    * `visualization`: whether to visualize the results, default is True;
    * `save_path`: path for saving the results, default is 'seg_result'.
  **NOTE:** When making predictions, the module, checkpoint_dir and dataset must be the same as those used for Fine-tune.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
- ```shell
$ hub serving start -m pspnet_resnet50_voc
```
  - This deploys the image segmentation API service; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following few lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# pspnet_resnet50_voc
|Module Name|pspnet_resnet50_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|pspnet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|370MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
## II. Installation
- ### 1、Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install pspnet_resnet50_voc
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the pspnet_resnet50_voc model on datasets such as OpticDiscSeg by running `python train.py`.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
    * `transforms`: data preprocessing methods.
    * `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
    * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None)
```
    - `name`: model name.
    - `pretrained`: whether to load a self-trained checkpoint; if None, the provided default parameters are loaded.
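  - Note: per the module code in this commit, `pspnet_resnet50_voc` defaults to `num_classes=21` (the PASCAL VOC label set), so `num_classes=2` above adapts the head to OpticDiscSeg. A minimal sketch of keeping the default:
  - ```python
    model = hub.Module(name='pspnet_resnet50_voc')  # 21 VOC classes
    ```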
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the visualized segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m pspnet_resnet50_voc
```
  - The image segmentation API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        self.pointwise_conv = ConvBNReLU(
            in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
return paddle.add(x, y, name)
class PPModule(nn.Layer):
"""
Pyramid pooling module originally in PSPNet.
Args:
        in_channels (int): The number of input channels to the pyramid pooling module.
out_channels (int): The number of output channels after pyramid pooling module.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, in_channels: int, out_channels: int, bin_sizes: tuple, dim_reduction: bool,
align_corners: bool):
super().__init__()
self.bin_sizes = bin_sizes
inter_channels = in_channels
if dim_reduction:
inter_channels = in_channels // len(bin_sizes)
# we use dimension reduction after pooling mentioned in original implementation.
self.stages = nn.LayerList([
self._make_stage(in_channels, inter_channels, size)
for size in bin_sizes
])
self.conv_bn_relu2 = ConvBNReLU(
in_channels=in_channels + inter_channels * len(bin_sizes),
out_channels=out_channels,
kernel_size=3,
padding=1)
self.align_corners = align_corners
def _make_stage(self, in_channels: int, out_channels: int, size: int):
"""
Create one pooling layer.
        In our implementation, we adopt the same dimension reduction as the original paper, which might be
        slightly different from other implementations.
        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other
        implementations keep the number of channels the same.
        Args:
            in_channels (int): The number of input channels to the pyramid pooling module.
            out_channels (int): The number of output channels of the pooled branch.
            size (int): The output size of the pooled layer.
        Returns:
            nn.Sequential: The pooling branch, an adaptive average pooling followed by a 1x1 ConvBNReLU.
        """
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = ConvBNReLU(
in_channels=in_channels, out_channels=out_channels, kernel_size=1)
return nn.Sequential(prior, conv)
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
cat_layers = []
for stage in self.stages:
x = stage(input)
x = F.interpolate(
x,
paddle.shape(input)[2:],
mode='bilinear',
align_corners=self.align_corners)
cat_layers.append(x)
cat_layers = [input] + cat_layers[::-1]
cat = paddle.concat(cat_layers, axis=1)
out = self.conv_bn_relu2(cat)
return out
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from pspnet_resnet50_voc.resnet import ResNet50_vd
import pspnet_resnet50_voc.layers as layers
@moduleinfo(
name="pspnet_resnet50_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="PSPNetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class PSPNet(nn.Layer):
"""
The PSPNet implementation based on PaddlePaddle.
The original article refers to
Zhao, Hengshuang, et al. "Pyramid scene parsing network"
(https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6).
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int] = (2, 3),
pp_out_channels: int = 1024,
bin_sizes: Tuple[int] = (1, 2, 3, 6),
enable_auxiliary_loss: bool = True,
align_corners: bool = False,
pretrained: str = None):
super(PSPNet, self).__init__()
self.backbone = ResNet50_vd()
backbone_channels = [
self.backbone.feat_channels[i] for i in backbone_indices
]
self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels,
pp_out_channels, bin_sizes,
enable_auxiliary_loss, align_corners)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(
logit,
paddle.shape(x)[2:],
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class PSPNetHead(nn.Layer):
"""
The PSPNetHead implementation.
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
The first index will be taken as a deep-supervision feature in auxiliary layer;
the second one will be taken as input of Pyramid Pooling Module (PPModule).
Usually backbone consists of four downsampling stage, and return an output of
each stage. If we set it as (2, 3) in ResNet, that means taking feature map of the third
stage (res4b22) in backbone, and feature map of the fourth stage (res5c) as input of PPModule.
backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
pp_out_channels (int): The output channels after Pyramid Pooling Module.
bin_sizes (tuple): The out size of pooled feature maps.
enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, num_classes, backbone_indices, backbone_channels,
pp_out_channels, bin_sizes, enable_auxiliary_loss,
align_corners):
super().__init__()
self.backbone_indices = backbone_indices
self.psp_module = layers.PPModule(
in_channels=backbone_channels[1],
out_channels=pp_out_channels,
bin_sizes=bin_sizes,
dim_reduction=True,
align_corners=align_corners)
self.dropout = nn.Dropout(p=0.1) # dropout_prob
self.conv = nn.Conv2D(
in_channels=pp_out_channels,
out_channels=num_classes,
kernel_size=1)
if enable_auxiliary_loss:
self.auxlayer = layers.AuxLayer(
in_channels=backbone_channels[0],
inter_channels=backbone_channels[0] // 4,
out_channels=num_classes)
self.enable_auxiliary_loss = enable_auxiliary_loss
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
x = feat_list[self.backbone_indices[1]]
x = self.psp_module(x)
x = self.dropout(x)
logit = self.conv(x)
logit_list.append(logit)
if self.enable_auxiliary_loss:
auxiliary_feat = feat_list[self.backbone_indices[0]]
auxiliary_logit = self.auxlayer(auxiliary_feat)
logit_list.append(auxiliary_logit)
return logit_list
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Tuple

import paddle
import paddle.nn as nn

import pspnet_resnet50_voc.layers as layers
class ConvBNLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
data_format: str = 'NCHW'):
super(ConvBNLayer, self).__init__()
if dilation != 1 and kernel_size != 3:
            raise RuntimeError("When the dilation isn't 1, "
                               "the kernel_size should be 3.")
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2,
stride=2,
padding=0,
ceil_mode=True,
data_format=data_format)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 \
if dilation == 1 else dilation,
dilation=dilation,
groups=groups,
bias_attr=False,
data_format=data_format)
self._batch_norm = layers.SyncBatchNorm(
out_channels, data_format=data_format)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
data_format: str = 'NCHW'):
super(BottleneckBlock, self).__init__()
self.data_format = data_format
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
data_format=data_format)
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
data_format=data_format)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
# NOTE: Use the wrap layer for quantization training
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv2)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
dilation: int = 1,
shortcut: bool = True,
if_first: bool = False,
data_format: str = 'NCHW'):
super(BasicBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
dilation=dilation,
act='relu',
data_format=data_format)
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
dilation=dilation,
act=None,
data_format=data_format)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
data_format=data_format)
self.shortcut = shortcut
self.dilation = dilation
self.data_format = data_format
self.add = layers.Add()
self.relu = layers.Activation(act="relu")
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = self.add(short, conv1)
y = self.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
    The original article refers to
    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
Args:
layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
pretrained (str, optional): The path of pretrained model.
"""
def __init__(self,
layers: int = 50,
output_stride: int = 8,
multi_grid: Tuple[int] = (1, 1, 1),
pretrained: str = None,
data_format: str = 'NCHW'):
super(ResNet_vd, self).__init__()
self.data_format = data_format
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
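        # Note: output_stride sets the backbone's total downsampling. With
        # output_stride=8, stages 3 and 4 keep their spatial size and use
        # dilations of 2 and 4 instead of striding; with 16, only stage 4 is dilated.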
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
data_format=data_format)
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
data_format=data_format)
self.pool2d_max = nn.MaxPool2D(
kernel_size=3, stride=2, padding=1, data_format=data_format)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
                    # Here 'block' is actually the stage index, and 'i' is the block index within the stage.
                    # At stage 4, expand the dilation_rate if multi_grid is given.
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate,
data_format=data_format))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
dilation_rate = dilation_dict[block] \
if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 \
and dilation_rate == 1 else 1,
dilation=dilation_rate,
shortcut=shortcut,
if_first=block == i == 0,
data_format=data_format))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
self.conv1_logit = y.clone()
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
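# A minimal usage sketch (illustrative only; the input size is arbitrary):
#   backbone = ResNet50_vd(output_stride=8)
#   feats = backbone(paddle.rand([1, 3, 512, 512]))
#   # len(feats) == 4; channels follow backbone.feat_channels == [256, 512, 1024, 2048]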
# stdc1_seg_cityscapes
|Module Name|stdc1_seg_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|stdc1_seg|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|67MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install stdc1_seg_cityscapes
```
  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the stdc1_seg_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Steps:
    - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing methods as needed.
    - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
      - `transforms`: data preprocessing methods.
      - `mode`: Select the data mode; the options are `train`, `test` and `val`. Default is `train`.
      - Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it into the `$HOME/.paddlehub/dataset` directory.
    - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
```
      - `name`: the name of the pre-trained model.
      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
    - Step4: Choose the optimization strategy and running configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
    - Model prediction
      - When fine-tuning is completed, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen during fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
    - After the parameters are configured correctly, run the script with `python predict.py`.
    - **Args**
      * `images`: image paths or BGR-format images;
      * `visualization`: whether to visualize the result, default is True;
      * `save_path`: save path of the result, default is 'seg_result'.
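    - For example, a custom output directory can be passed through `save_path` (the directory name below is only an illustration):
    - ```python
      result = model.predict(images=[img], visualization=True, save_path='seg_output')
      ```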
    **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
- ```shell
$ hub serving start -m stdc1_seg_cityscapes
```
  - The image segmentation service API is now deployed, with the default port number 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
    # Send an HTTP request
    org_im = cv2.imread('/PATH/TO/IMAGE')
    data = {'images':[cv2_to_base64(org_im)]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    mask = base64_to_cv2(r.json()["results"][0])
```
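    - The decoded mask can then be written to disk; a minimal sketch (the file name is illustrative):
    ```python
    cv2.imwrite('mask.png', mask)
    ```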
## V. Release Note
* 1.0.0
  First release
# stdc1_seg_cityscapes
|Module Name|stdc1_seg_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|stdc1_seg|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|67MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212111-df341f2a-e994-45d7-92d6-2288d666079c.png" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212188-2db40b29-2943-47ce-9ad2-36a6fb85ba3e.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install stdc1_seg_cityscapes
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2.Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the stdc1_seg_cityscapes model to fine-tune datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for segmentation data; users can substitute their own preprocessing methods as needed.
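      - For example, a horizontal flip could be added for training; a sketch, assuming `RandomHorizontalFlip` is available in your paddlehub version:
      - ```python
        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
        ```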
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
      * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it into the `$HOME/.paddlehub/dataset` directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
      - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the segmentation results as picture files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m stdc1_seg_cityscapes
```
    - The image segmentation service API is now deployed, with the default port number 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
                self.act_func = getattr(activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
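# A minimal usage sketch (the ratios follow the common DeepLabV3 setting; the
# input shape is illustrative):
#   aspp = ASPPModule(aspp_ratios=(1, 6, 12, 18), in_channels=2048,
#                     out_channels=256, align_corners=False)
#   y = aspp(paddle.rand([1, 2048, 64, 64]))  # -> [1, 256, 64, 64]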
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name=None) -> paddle.Tensor:
return paddle.add(x, y, name)
class PPModule(nn.Layer):
"""
Pyramid pooling module originally in PSPNet.
Args:
        in_channels (int): The number of input channels to the pyramid pooling module.
out_channels (int): The number of output channels after pyramid pooling module.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self,
in_channels: int,
out_channels: int,
bin_sizes: tuple,
dim_reduction: bool,
align_corners: bool):
super().__init__()
self.bin_sizes = bin_sizes
inter_channels = in_channels
if dim_reduction:
inter_channels = in_channels // len(bin_sizes)
# we use dimension reduction after pooling mentioned in original implementation.
self.stages = nn.LayerList([
self._make_stage(in_channels, inter_channels, size)
for size in bin_sizes
])
self.conv_bn_relu2 = ConvBNReLU(
in_channels=in_channels + inter_channels * len(bin_sizes),
out_channels=out_channels,
kernel_size=3,
padding=1)
self.align_corners = align_corners
def _make_stage(self, in_channels: int, out_channels: int, size: int):
"""
Create one pooling layer.
        In our implementation, we adopt the same dimension reduction as the original paper, which may differ
        slightly from other implementations.
        After pooling, the channels are immediately reduced to 1/len(bin_sizes) of the input, while some other
        implementations keep the number of channels unchanged.
        Args:
            in_channels (int): The number of input channels to the pyramid pooling module.
            out_channels (int): The number of output channels of the pooling stage.
            size (int): The output size of the pooled layer.
        Returns:
            nn.Sequential: An adaptive-average-pooling layer followed by a 1x1 ConvBNReLU.
"""
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = ConvBNReLU(
in_channels=in_channels, out_channels=out_channels, kernel_size=1)
return nn.Sequential(prior, conv)
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
cat_layers = []
for stage in self.stages:
x = stage(input)
x = F.interpolate(
x,
paddle.shape(input)[2:],
mode='bilinear',
align_corners=self.align_corners)
cat_layers.append(x)
cat_layers = [input] + cat_layers[::-1]
cat = paddle.concat(cat_layers, axis=1)
out = self.conv_bn_relu2(cat)
return out
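# A minimal usage sketch (the bin sizes follow the original PSPNet setting; the
# input shape is illustrative):
#   ppm = PPModule(in_channels=2048, out_channels=512, bin_sizes=(1, 2, 3, 6),
#                  dim_reduction=True, align_corners=False)
#   y = ppm(paddle.rand([1, 2048, 64, 64]))  # -> [1, 512, 64, 64]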
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from stdc1_seg_cityscapes.stdcnet import STDC1
import stdc1_seg_cityscapes.layers as layers
@moduleinfo(
name="stdc1_seg_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="STDCSeg is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class STDCSeg(nn.Layer):
"""
The STDCSeg implementation based on PaddlePaddle.
    The original article refers to
Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
(https://arxiv.org/abs/2104.13188)
Args:
        num_classes (int, optional): The unique number of target classes. Default: 19.
        use_boundary_8 (bool, optional): Whether to use the detail loss; according to the paper it should be True for the best metric. Default: True.
            If you want to use boundary_2/boundary_4/boundary_16, you should also append the corresponding number of loss terms to DetailAggregateLoss; it should then work properly.
        use_conv_last (bool, optional): Determine ContextPath's inplanes variable according to whether to use the backbone's last conv. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
use_boundary_2: bool = False,
use_boundary_4: bool = False,
use_boundary_8: bool = True,
use_boundary_16: bool = False,
use_conv_last: bool = False,
pretrained: str = None):
super(STDCSeg, self).__init__()
self.use_boundary_2 = use_boundary_2
self.use_boundary_4 = use_boundary_4
self.use_boundary_8 = use_boundary_8
self.use_boundary_16 = use_boundary_16
self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last)
self.ffm = FeatureFusionModule(384, 256)
self.conv_out = SegHead(256, 256, num_classes)
self.conv_out8 = SegHead(128, 64, num_classes)
self.conv_out16 = SegHead(128, 64, num_classes)
self.conv_out_sp16 = SegHead(512, 64, 1)
self.conv_out_sp8 = SegHead(256, 64, 1)
self.conv_out_sp4 = SegHead(64, 64, 1)
self.conv_out_sp2 = SegHead(32, 64, 1)
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
x_hw = paddle.shape(x)[2:]
feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x)
logit_list = []
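        # In training mode the main head plus the auxiliary and boundary heads
        # are returned for the multi-term loss; in inference mode only the
        # fused main prediction is returned.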
if self.training:
feat_fuse = self.ffm(feat_res8, feat_cp8)
feat_out = self.conv_out(feat_fuse)
feat_out8 = self.conv_out8(feat_cp8)
feat_out16 = self.conv_out16(feat_cp16)
logit_list = [feat_out, feat_out8, feat_out16]
logit_list = [
F.interpolate(x, x_hw, mode='bilinear', align_corners=True)
for x in logit_list
]
if self.use_boundary_2:
feat_out_sp2 = self.conv_out_sp2(feat_res2)
logit_list.append(feat_out_sp2)
if self.use_boundary_4:
feat_out_sp4 = self.conv_out_sp4(feat_res4)
logit_list.append(feat_out_sp4)
if self.use_boundary_8:
feat_out_sp8 = self.conv_out_sp8(feat_res8)
logit_list.append(feat_out_sp8)
else:
feat_fuse = self.ffm(feat_res8, feat_cp8)
feat_out = self.conv_out(feat_fuse)
feat_out = F.interpolate(
feat_out, x_hw, mode='bilinear', align_corners=True)
logit_list = [feat_out]
return logit_list
class SegHead(nn.Layer):
    def __init__(self, in_chan: int, mid_chan: int, n_classes: int):
super(SegHead, self).__init__()
self.conv = layers.ConvBNReLU(
in_chan, mid_chan, kernel_size=3, stride=1, padding=1)
self.conv_out = nn.Conv2D(
mid_chan, n_classes, kernel_size=1, bias_attr=None)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv(x)
x = self.conv_out(x)
return x
class AttentionRefinementModule(nn.Layer):
def __init__(self, in_chan: int, out_chan: int):
super(AttentionRefinementModule, self).__init__()
self.conv = layers.ConvBNReLU(
in_chan, out_chan, kernel_size=3, stride=1, padding=1)
self.conv_atten = nn.Conv2D(
out_chan, out_chan, kernel_size=1, bias_attr=None)
self.bn_atten = nn.BatchNorm2D(out_chan)
self.sigmoid_atten = nn.Sigmoid()
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
feat = self.conv(x)
atten = F.adaptive_avg_pool2d(feat, 1)
atten = self.conv_atten(atten)
atten = self.bn_atten(atten)
atten = self.sigmoid_atten(atten)
out = paddle.multiply(feat, atten)
return out
class ContextPath(nn.Layer):
def __init__(self, backbone, use_conv_last: bool = False):
super(ContextPath, self).__init__()
self.backbone = backbone
self.arm16 = AttentionRefinementModule(512, 128)
inplanes = 1024
if use_conv_last:
inplanes = 1024
self.arm32 = AttentionRefinementModule(inplanes, 128)
self.conv_head32 = layers.ConvBNReLU(
128, 128, kernel_size=3, stride=1, padding=1)
self.conv_head16 = layers.ConvBNReLU(
128, 128, kernel_size=3, stride=1, padding=1)
self.conv_avg = layers.ConvBNReLU(
inplanes, 128, kernel_size=1, stride=1, padding=0)
    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
feat2, feat4, feat8, feat16, feat32 = self.backbone(x)
feat8_hw = paddle.shape(feat8)[2:]
feat16_hw = paddle.shape(feat16)[2:]
feat32_hw = paddle.shape(feat32)[2:]
avg = F.adaptive_avg_pool2d(feat32, 1)
avg = self.conv_avg(avg)
avg_up = F.interpolate(avg, feat32_hw, mode='nearest')
feat32_arm = self.arm32(feat32)
feat32_sum = feat32_arm + avg_up
feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest')
feat32_up = self.conv_head32(feat32_up)
feat16_arm = self.arm16(feat16)
feat16_sum = feat16_arm + feat32_up
feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest')
feat16_up = self.conv_head16(feat16_up)
return feat2, feat4, feat8, feat16, feat16_up, feat32_up # x8, x16
class FeatureFusionModule(nn.Layer):
    def __init__(self, in_chan: int, out_chan: int):
super(FeatureFusionModule, self).__init__()
self.convblk = layers.ConvBNReLU(
in_chan, out_chan, kernel_size=1, stride=1, padding=0)
self.conv1 = nn.Conv2D(
out_chan,
out_chan // 4,
kernel_size=1,
stride=1,
padding=0,
bias_attr=None)
self.conv2 = nn.Conv2D(
out_chan // 4,
out_chan,
kernel_size=1,
stride=1,
padding=0,
bias_attr=None)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()
def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor:
fcat = paddle.concat([fsp, fcp], axis=1)
feat = self.convblk(fcat)
atten = F.adaptive_avg_pool2d(feat, 1)
atten = self.conv1(atten)
atten = self.relu(atten)
atten = self.conv2(atten)
atten = self.sigmoid(atten)
feat_atten = paddle.multiply(feat, atten)
feat_out = feat_atten + feat
return feat_out
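# Note: FeatureFusionModule concatenates the spatial (stride-8) and context
# features, re-weights the fused map with a channel-attention vector, and adds
# the result back residually.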
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import math
import paddle
import paddle.nn as nn
import stdc1_seg_cityscapes.layers as L
__all__ = ["STDC1", "STDC2"]
class STDCNet(nn.Layer):
"""
The STDCNet implementation based on PaddlePaddle.
    The original article refers to
Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
(https://arxiv.org/abs/2104.13188)
Args:
        base (int, optional): Base channels. Default: 64.
        layers (list, optional): The layer configuration; it determines the number of STDC blocks in stages 3/4/5 of STDCNet. Default: [4, 5, 3].
        block_num (int, optional): The number of blocks in a features block. Default: 4.
        type (str, optional): The feature fusion method, "cat" or "add". Default: "cat".
        num_classes (int, optional): The class number for image classification. Default: 1000.
        dropout (float, optional): The dropout ratio; applied if > 0. Default: 0.20.
        use_conv_last (bool, optional): Whether to use the last ConvBNReLU layer. Default: False.
        pretrained (str, optional): The path of the pretrained model.
"""
def __init__(self,
base: int = 64,
layers: List[int] = [4, 5, 3],
block_num: int = 4,
type: str = "cat",
num_classes: int = 1000,
dropout: float = 0.20,
use_conv_last: bool = False):
super(STDCNet, self).__init__()
if type == "cat":
block = CatBottleneck
elif type == "add":
block = AddBottleneck
self.use_conv_last = use_conv_last
self.features = self._make_layers(base, layers, block_num, block)
self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1)
if (layers == [4, 5, 3]): #stdc1446
self.x2 = nn.Sequential(self.features[:1])
self.x4 = nn.Sequential(self.features[1:2])
self.x8 = nn.Sequential(self.features[2:6])
self.x16 = nn.Sequential(self.features[6:11])
self.x32 = nn.Sequential(self.features[11:])
elif (layers == [2, 2, 2]): #stdc813
self.x2 = nn.Sequential(self.features[:1])
self.x4 = nn.Sequential(self.features[1:2])
self.x8 = nn.Sequential(self.features[2:4])
self.x16 = nn.Sequential(self.features[4:6])
self.x32 = nn.Sequential(self.features[6:])
else:
raise NotImplementedError(
"model with layers:{} is not implemented!".format(layers))
    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
        """
        Forward function for feature extraction.
        """
feat2 = self.x2(x)
feat4 = self.x4(feat2)
feat8 = self.x8(feat4)
feat16 = self.x16(feat8)
feat32 = self.x32(feat16)
if self.use_conv_last:
feat32 = self.conv_last(feat32)
return feat2, feat4, feat8, feat16, feat32
def _make_layers(self, base, layers, block_num, block):
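        # Two stride-2 stem convs are followed by `layers[i]` STDC blocks per
        # stage; the first block of each stage downsamples (stride 2) while the
        # later blocks keep the spatial size.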
features = []
features += [ConvBNRelu(3, base // 2, 3, 2)]
features += [ConvBNRelu(base // 2, base, 3, 2)]
for i, layer in enumerate(layers):
for j in range(layer):
if i == 0 and j == 0:
features.append(block(base, base * 4, block_num, 2))
elif j == 0:
features.append(
block(base * int(math.pow(2, i + 1)),
base * int(math.pow(2, i + 2)), block_num, 2))
else:
features.append(
block(base * int(math.pow(2, i + 2)),
base * int(math.pow(2, i + 2)), block_num, 1))
return nn.Sequential(*features)
class ConvBNRelu(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1):
super(ConvBNRelu, self).__init__()
self.conv = nn.Conv2D(
in_planes,
out_planes,
kernel_size=kernel,
stride=stride,
padding=kernel // 2,
bias_attr=False)
self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW')
self.relu = nn.ReLU()
def forward(self, x):
out = self.relu(self.bn(self.conv(x)))
return out
class AddBottleneck(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
super(AddBottleneck, self).__init__()
assert block_num > 1, "block number should be larger than 1."
self.conv_list = nn.LayerList()
self.stride = stride
if stride == 2:
self.avd_layer = nn.Sequential(
nn.Conv2D(
out_planes // 2,
out_planes // 2,
kernel_size=3,
stride=2,
padding=1,
groups=out_planes // 2,
bias_attr=False),
nn.BatchNorm2D(out_planes // 2),
)
self.skip = nn.Sequential(
nn.Conv2D(
in_planes,
in_planes,
kernel_size=3,
stride=2,
padding=1,
groups=in_planes,
bias_attr=False),
nn.BatchNorm2D(in_planes),
nn.Conv2D(
in_planes, out_planes, kernel_size=1, bias_attr=False),
nn.BatchNorm2D(out_planes),
)
stride = 1
for idx in range(block_num):
if idx == 0:
self.conv_list.append(
ConvBNRelu(in_planes, out_planes // 2, kernel=1))
elif idx == 1 and block_num == 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
elif idx == 1 and block_num > 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
elif idx < block_num - 1:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx + 1))))
else:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx))))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out_list = []
out = x
for idx, conv in enumerate(self.conv_list):
if idx == 0 and self.stride == 2:
out = self.avd_layer(conv(out))
else:
out = conv(out)
out_list.append(out)
if self.stride == 2:
x = self.skip(x)
return paddle.concat(out_list, axis=1) + x
class CatBottleneck(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
super(CatBottleneck, self).__init__()
assert block_num > 1, "block number should be larger than 1."
self.conv_list = nn.LayerList()
self.stride = stride
if stride == 2:
self.avd_layer = nn.Sequential(
nn.Conv2D(
out_planes // 2,
out_planes // 2,
kernel_size=3,
stride=2,
padding=1,
groups=out_planes // 2,
bias_attr=False),
nn.BatchNorm2D(out_planes // 2),
)
self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1)
stride = 1
for idx in range(block_num):
if idx == 0:
self.conv_list.append(
ConvBNRelu(in_planes, out_planes // 2, kernel=1))
elif idx == 1 and block_num == 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
elif idx == 1 and block_num > 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
elif idx < block_num - 1:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx + 1))))
else:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx))))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
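        # Each successive conv halves the channel count; the per-conv outputs
        # (plus the skip of the first output) are concatenated so the channels
        # sum back to out_planes.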
out_list = []
out1 = self.conv_list[0](x)
for idx, conv in enumerate(self.conv_list[1:]):
if idx == 0:
if self.stride == 2:
out = conv(self.avd_layer(out1))
else:
out = conv(out1)
else:
out = conv(out)
out_list.append(out)
if self.stride == 2:
out1 = self.skip(out1)
out_list.insert(0, out1)
out = paddle.concat(out_list, axis=1)
return out
def STDC2(**kwargs):
model = STDCNet(base=64, layers=[4, 5, 3], **kwargs)
return model
def STDC1(**kwargs):
model = STDCNet(base=64, layers=[2, 2, 2], **kwargs)
return model
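# A minimal usage sketch (the input size is illustrative):
#   net = STDC1()
#   feat2, feat4, feat8, feat16, feat32 = net(paddle.rand([1, 3, 512, 512]))
#   # the five feature maps are at strides 2/4/8/16/32 relative to the input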
# stdc1_seg_voc
|Module Name|stdc1_seg_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|stdc1_seg|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|67MB|
|Data indicators|-|
|Latest update date|2022-03-21|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install stdc1_seg_voc
```
  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Fine-tune
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the stdc1_seg_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Steps:
    - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own preprocessing methods as needed.
    - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
      - `transforms`: data preprocessing methods.
      - `mode`: Select the data mode; the options are `train`, `test` and `val`. Default is `train`.
      - Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it into the `$HOME/.paddlehub/dataset` directory.
    - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
```
      - `name`: the name of the pre-trained model.
      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
    - Step4: Choose the optimization strategy and running configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
    - Model prediction
      - When fine-tuning is completed, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen during fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
    - After the parameters are configured correctly, run the script with `python predict.py`.
    - **Args**
      * `images`: image paths or BGR-format images;
      * `visualization`: whether to visualize the result, default is True;
      * `save_path`: save path of the result, default is 'seg_result'.
    **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
- ```shell
$ hub serving start -m stdc1_seg_voc
```
  - The image segmentation service API is now deployed, with the default port number 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
    # Send an HTTP request
    org_im = cv2.imread('/PATH/TO/IMAGE')
    data = {'images':[cv2_to_base64(org_im)]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# stdc1_seg_voc
|Module Name|stdc1_seg_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|stdc1_seg|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|67MB|
|Data indicators|-|
|Latest update date|2022-03-22|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/159212097-443a5a65-2f2e-4126-9c07-d7c3c220e55f.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/159212375-52e123af-4699-4c25-8f50-4240bbb714b4.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install stdc1_seg_voc
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2.Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start using the stdc1_seg_voc model to fine-tune datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - The `segmentation_transforms` module defines a rich set of preprocessing methods for segmentation data; users can substitute their own preprocessing methods as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
      * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it into the `$HOME/.paddlehub/dataset` directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
      - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
    * `visualization`: Whether to save the segmentation results as picture files.
* `save_path`: Save path of the result, default is 'seg_result'.
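    - For example, a custom output directory can be passed through `save_path` (the directory name below is only an illustration):
    - ```python
      model.predict(images=[img], visualization=True, save_path='seg_output')
      ```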
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m stdc1_seg_voc
```
    - The image segmentation service API is now deployed, with the default port number 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
                self.act_func = getattr(activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self,
in_channels: int,
inter_channels: int,
out_channels: int,
dropout_prob: float = 0.1,
**kwargs):
super().__init__()
self.conv_bn_relu = ConvBNReLU(
in_channels=in_channels,
out_channels=inter_channels,
kernel_size=3,
padding=1,
**kwargs)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(
in_channels=inter_channels,
out_channels=out_channels,
kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Add(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x: paddle.Tensor, y: paddle.Tensor, name=None) -> paddle.Tensor:
return paddle.add(x, y, name)
class PPModule(nn.Layer):
"""
Pyramid pooling module originally in PSPNet.
Args:
        in_channels (int): The number of input channels to the pyramid pooling module.
out_channels (int): The number of output channels after pyramid pooling module.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self,
in_channels: int,
out_channels: int,
bin_sizes: tuple,
dim_reduction: bool,
align_corners: bool):
super().__init__()
self.bin_sizes = bin_sizes
inter_channels = in_channels
if dim_reduction:
inter_channels = in_channels // len(bin_sizes)
# We use the dimension reduction after pooling mentioned in the original implementation.
self.stages = nn.LayerList([
self._make_stage(in_channels, inter_channels, size)
for size in bin_sizes
])
self.conv_bn_relu2 = ConvBNReLU(
in_channels=in_channels + inter_channels * len(bin_sizes),
out_channels=out_channels,
kernel_size=3,
padding=1)
self.align_corners = align_corners
def _make_stage(self, in_channels: int, out_channels: int, size: int):
"""
Create one pooling layer.
In our implementation, we adopt the same dimension reduction as the original paper, which might be
slightly different from other implementations.
After pooling, the channels are immediately reduced to 1/len(bin_sizes) of the input channels, while
some other implementations keep the number of channels unchanged.
Args:
in_channels (int): The number of input channels to the pooling branch.
out_channels (int): The number of output channels of the pooling branch.
size (int): The output size of the pooled layer.
Returns:
nn.Sequential: An adaptive average pooling layer followed by a 1x1 ConvBNReLU.
"""
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = ConvBNReLU(
in_channels=in_channels, out_channels=out_channels, kernel_size=1)
return nn.Sequential(prior, conv)
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
cat_layers = []
for stage in self.stages:
x = stage(input)
x = F.interpolate(
x,
paddle.shape(input)[2:],
mode='bilinear',
align_corners=self.align_corners)
cat_layers.append(x)
cat_layers = [input] + cat_layers[::-1]
cat = paddle.concat(cat_layers, axis=1)
out = self.conv_bn_relu2(cat)
return out
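# A minimal, hedged sketch of PPModule with the PSPNet-style defaults; the
# channel numbers below are illustrative only.
if __name__ == "__main__":
    ppm = PPModule(
        in_channels=2048,
        out_channels=512,
        bin_sizes=(1, 2, 3, 6),
        dim_reduction=True,
        align_corners=False)
    feats = paddle.rand([1, 2048, 32, 32])
    print(ppm(feats).shape)  # expected: [1, 512, 32, 32]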
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from stdc1_seg_voc.stdcnet import STDC1
import stdc1_seg_voc.layers as layers
@moduleinfo(
name="stdc1_seg_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="STDCSeg is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class STDCSeg(nn.Layer):
"""
The STDCSeg implementation based on PaddlePaddle.
The original article refers to Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic
Segmentation" (https://arxiv.org/abs/2104.13188), from Meituan.
Args:
num_classes(int, optional): The unique number of target classes. Default: 19.
use_boundary_8(bool, optional): Whether to use the detail loss on the 1/8 feature map. It should be True according to the paper for the best metric. Default: True.
If you also want to use use_boundary_2/use_boundary_4/use_boundary_16, append a matching number of loss terms for DetailAggregateLoss; it should work properly.
use_conv_last(bool, optional): Whether to use the backbone's last conv layer, which determines ContextPath's inplanes. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
use_boundary_2: bool = False,
use_boundary_4: bool = False,
use_boundary_8: bool = True,
use_boundary_16: bool = False,
use_conv_last: bool = False,
pretrained: str = None):
super(STDCSeg, self).__init__()
self.use_boundary_2 = use_boundary_2
self.use_boundary_4 = use_boundary_4
self.use_boundary_8 = use_boundary_8
self.use_boundary_16 = use_boundary_16
self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last)
self.ffm = FeatureFusionModule(384, 256)
self.conv_out = SegHead(256, 256, num_classes)
self.conv_out8 = SegHead(128, 64, num_classes)
self.conv_out16 = SegHead(128, 64, num_classes)
self.conv_out_sp16 = SegHead(512, 64, 1)
self.conv_out_sp8 = SegHead(256, 64, 1)
self.conv_out_sp4 = SegHead(64, 64, 1)
self.conv_out_sp2 = SegHead(32, 64, 1)
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
x_hw = paddle.shape(x)[2:]
feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x)
logit_list = []
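# Training returns several heads for deep supervision: the fused prediction plus
# auxiliary heads on the 1/8 and 1/16 context features and, optionally, the
# boundary heads enabled by the use_boundary_* flags. Inference returns only the
# fused prediction.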
if self.training:
feat_fuse = self.ffm(feat_res8, feat_cp8)
feat_out = self.conv_out(feat_fuse)
feat_out8 = self.conv_out8(feat_cp8)
feat_out16 = self.conv_out16(feat_cp16)
logit_list = [feat_out, feat_out8, feat_out16]
logit_list = [
F.interpolate(x, x_hw, mode='bilinear', align_corners=True)
for x in logit_list
]
if self.use_boundary_2:
feat_out_sp2 = self.conv_out_sp2(feat_res2)
logit_list.append(feat_out_sp2)
if self.use_boundary_4:
feat_out_sp4 = self.conv_out_sp4(feat_res4)
logit_list.append(feat_out_sp4)
if self.use_boundary_8:
feat_out_sp8 = self.conv_out_sp8(feat_res8)
logit_list.append(feat_out_sp8)
else:
feat_fuse = self.ffm(feat_res8, feat_cp8)
feat_out = self.conv_out(feat_fuse)
feat_out = F.interpolate(
feat_out, x_hw, mode='bilinear', align_corners=True)
logit_list = [feat_out]
return logit_list
class SegHead(nn.Layer):
def __init__(self, in_chan: int, mid_chan: int, n_classes: int):
super(SegHead, self).__init__()
self.conv = layers.ConvBNReLU(
in_chan, mid_chan, kernel_size=3, stride=1, padding=1)
self.conv_out = nn.Conv2D(
mid_chan, n_classes, kernel_size=1, bias_attr=None)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv(x)
x = self.conv_out(x)
return x
class AttentionRefinementModule(nn.Layer):
def __init__(self, in_chan: int, out_chan: int):
super(AttentionRefinementModule, self).__init__()
self.conv = layers.ConvBNReLU(
in_chan, out_chan, kernel_size=3, stride=1, padding=1)
self.conv_atten = nn.Conv2D(
out_chan, out_chan, kernel_size=1, bias_attr=None)
self.bn_atten = nn.BatchNorm2D(out_chan)
self.sigmoid_atten = nn.Sigmoid()
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
feat = self.conv(x)
atten = F.adaptive_avg_pool2d(feat, 1)
atten = self.conv_atten(atten)
atten = self.bn_atten(atten)
atten = self.sigmoid_atten(atten)
out = paddle.multiply(feat, atten)
return out
class ContextPath(nn.Layer):
def __init__(self, backbone, use_conv_last: bool = False):
super(ContextPath, self).__init__()
self.backbone = backbone
self.arm16 = AttentionRefinementModule(512, 128)
# For the STDC backbones used here, feat32 has 1024 channels, and conv_last also
# outputs max(1024, base * 16) = 1024 channels, so inplanes is 1024 either way.
inplanes = 1024
self.arm32 = AttentionRefinementModule(inplanes, 128)
self.conv_head32 = layers.ConvBNReLU(
128, 128, kernel_size=3, stride=1, padding=1)
self.conv_head16 = layers.ConvBNReLU(
128, 128, kernel_size=3, stride=1, padding=1)
self.conv_avg = layers.ConvBNReLU(
inplanes, 128, kernel_size=1, stride=1, padding=0)
def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
feat2, feat4, feat8, feat16, feat32 = self.backbone(x)
feat8_hw = paddle.shape(feat8)[2:]
feat16_hw = paddle.shape(feat16)[2:]
feat32_hw = paddle.shape(feat32)[2:]
avg = F.adaptive_avg_pool2d(feat32, 1)
avg = self.conv_avg(avg)
avg_up = F.interpolate(avg, feat32_hw, mode='nearest')
feat32_arm = self.arm32(feat32)
feat32_sum = feat32_arm + avg_up
feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest')
feat32_up = self.conv_head32(feat32_up)
feat16_arm = self.arm16(feat16)
feat16_sum = feat16_arm + feat32_up
feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest')
feat16_up = self.conv_head16(feat16_up)
return feat2, feat4, feat8, feat16, feat16_up, feat32_up  # feat16_up: 1/8-scale context, feat32_up: 1/16-scale context
class FeatureFusionModule(nn.Layer):
def __init__(self, in_chan: int, out_chan: int):
super(FeatureFusionModule, self).__init__()
self.convblk = layers.ConvBNReLU(
in_chan, out_chan, kernel_size=1, stride=1, padding=0)
self.conv1 = nn.Conv2D(
out_chan,
out_chan // 4,
kernel_size=1,
stride=1,
padding=0,
bias_attr=None)
self.conv2 = nn.Conv2D(
out_chan // 4,
out_chan,
kernel_size=1,
stride=1,
padding=0,
bias_attr=None)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()
def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor:
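# SE-style channel attention: global average pooling followed by two 1x1 convs
# and a sigmoid gate re-weights the fused feature, with a residual add at the end.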
fcat = paddle.concat([fsp, fcp], axis=1)
feat = self.convblk(fcat)
atten = F.adaptive_avg_pool2d(feat, 1)
atten = self.conv1(atten)
atten = self.relu(atten)
atten = self.conv2(atten)
atten = self.sigmoid(atten)
feat_atten = paddle.multiply(feat, atten)
feat_out = feat_atten + feat
return feat_out
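# A hedged end-to-end sketch: once installed via `hub install stdc1_seg_voc`,
# the module can be used like the other segmentation modules in this repo.
# "/PATH/TO/IMAGE" is a placeholder.
if __name__ == "__main__":
    import cv2
    import paddlehub as hub

    model = hub.Module(name="stdc1_seg_voc")
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)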
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from typing import List, Tuple
import paddle
import paddle.nn as nn
import stdc1_seg_voc.layers as L
__all__ = ["STDC1", "STDC2"]
class STDCNet(nn.Layer):
"""
The STDCNet implementation based on PaddlePaddle.
The original article refers to Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic
Segmentation" (https://arxiv.org/abs/2104.13188), from Meituan.
Args:
base(int, optional): The base number of channels. Default: 64.
layers(list, optional): The number of STDC blocks in stage3/4/5 of STDCNet. Default: [4, 5, 3].
block_num(int, optional): The number of conv units inside each STDC block. Default: 4.
type(str, optional): The feature fusion method, "cat" or "add". Default: "cat".
num_classes(int, optional): The class number for image classification. Default: 1000.
dropout(float, optional): The dropout ratio, applied if > 0. Default: 0.20.
use_conv_last(bool, optional): Whether to use the last ConvBNReLU layer. Default: False.
"""
def __init__(self,
base: int = 64,
layers: List[int] = [4, 5, 3],
block_num: int = 4,
type: str = "cat",
num_classes: int = 1000,
dropout: float = 0.20,
use_conv_last: bool = False):
super(STDCNet, self).__init__()
if type == "cat":
block = CatBottleneck
elif type == "add":
block = AddBottleneck
else:
raise ValueError("type should be 'cat' or 'add', but got {}".format(type))
self.use_conv_last = use_conv_last
self.features = self._make_layers(base, layers, block_num, block)
self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1)
if (layers == [4, 5, 3]): #stdc1446
self.x2 = nn.Sequential(self.features[:1])
self.x4 = nn.Sequential(self.features[1:2])
self.x8 = nn.Sequential(self.features[2:6])
self.x16 = nn.Sequential(self.features[6:11])
self.x32 = nn.Sequential(self.features[11:])
elif (layers == [2, 2, 2]): #stdc813
self.x2 = nn.Sequential(self.features[:1])
self.x4 = nn.Sequential(self.features[1:2])
self.x8 = nn.Sequential(self.features[2:4])
self.x16 = nn.Sequential(self.features[4:6])
self.x32 = nn.Sequential(self.features[6:])
else:
raise NotImplementedError(
"model with layers:{} is not implemented!".format(layers))
def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
"""
Forward function for feature extraction.
"""
feat2 = self.x2(x)
feat4 = self.x4(feat2)
feat8 = self.x8(feat4)
feat16 = self.x16(feat8)
feat32 = self.x32(feat16)
if self.use_conv_last:
feat32 = self.conv_last(feat32)
return feat2, feat4, feat8, feat16, feat32
def _make_layers(self, base, layers, block_num, block):
features = []
features += [ConvBNRelu(3, base // 2, 3, 2)]
features += [ConvBNRelu(base // 2, base, 3, 2)]
for i, layer in enumerate(layers):
for j in range(layer):
if i == 0 and j == 0:
features.append(block(base, base * 4, block_num, 2))
elif j == 0:
features.append(
block(base * int(math.pow(2, i + 1)),
base * int(math.pow(2, i + 2)), block_num, 2))
else:
features.append(
block(base * int(math.pow(2, i + 2)),
base * int(math.pow(2, i + 2)), block_num, 1))
return nn.Sequential(*features)
class ConvBNRelu(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1):
super(ConvBNRelu, self).__init__()
self.conv = nn.Conv2D(
in_planes,
out_planes,
kernel_size=kernel,
stride=stride,
padding=kernel // 2,
bias_attr=False)
self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW')
self.relu = nn.ReLU()
def forward(self, x):
out = self.relu(self.bn(self.conv(x)))
return out
class AddBottleneck(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
super(AddBottleneck, self).__init__()
assert block_num > 1, "block number should be larger than 1."
self.conv_list = nn.LayerList()
self.stride = stride
if stride == 2:
self.avd_layer = nn.Sequential(
nn.Conv2D(
out_planes // 2,
out_planes // 2,
kernel_size=3,
stride=2,
padding=1,
groups=out_planes // 2,
bias_attr=False),
nn.BatchNorm2D(out_planes // 2),
)
self.skip = nn.Sequential(
nn.Conv2D(
in_planes,
in_planes,
kernel_size=3,
stride=2,
padding=1,
groups=in_planes,
bias_attr=False),
nn.BatchNorm2D(in_planes),
nn.Conv2D(
in_planes, out_planes, kernel_size=1, bias_attr=False),
nn.BatchNorm2D(out_planes),
)
stride = 1
for idx in range(block_num):
if idx == 0:
self.conv_list.append(
ConvBNRelu(in_planes, out_planes // 2, kernel=1))
elif idx == 1 and block_num == 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
elif idx == 1 and block_num > 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
elif idx < block_num - 1:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx + 1))))
else:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx))))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out_list = []
out = x
for idx, conv in enumerate(self.conv_list):
if idx == 0 and self.stride == 2:
out = self.avd_layer(conv(out))
else:
out = conv(out)
out_list.append(out)
if self.stride == 2:
x = self.skip(x)
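# The concatenated branch channels sum back to out_planes (out_planes//2 +
# out_planes//4 + ... with the last two branches equal), matching the skip
# path, so the element-wise add below is shape-safe.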
return paddle.concat(out_list, axis=1) + x
class CatBottleneck(nn.Layer):
def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
super(CatBottleneck, self).__init__()
assert block_num > 1, "block number should be larger than 1."
self.conv_list = nn.LayerList()
self.stride = stride
if stride == 2:
self.avd_layer = nn.Sequential(
nn.Conv2D(
out_planes // 2,
out_planes // 2,
kernel_size=3,
stride=2,
padding=1,
groups=out_planes // 2,
bias_attr=False),
nn.BatchNorm2D(out_planes // 2),
)
self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1)
stride = 1
for idx in range(block_num):
if idx == 0:
self.conv_list.append(
ConvBNRelu(in_planes, out_planes // 2, kernel=1))
elif idx == 1 and block_num == 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
elif idx == 1 and block_num > 2:
self.conv_list.append(
ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
elif idx < block_num - 1:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx + 1))))
else:
self.conv_list.append(
ConvBNRelu(out_planes // int(math.pow(2, idx)),
out_planes // int(math.pow(2, idx))))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out_list = []
out1 = self.conv_list[0](x)
for idx, conv in enumerate(self.conv_list[1:]):
if idx == 0:
if self.stride == 2:
out = conv(self.avd_layer(out1))
else:
out = conv(out1)
else:
out = conv(out)
out_list.append(out)
if self.stride == 2:
out1 = self.skip(out1)
out_list.insert(0, out1)
out = paddle.concat(out_list, axis=1)
return out
def STDC2(**kwargs):
model = STDCNet(base=64, layers=[4, 5, 3], **kwargs)
return model
def STDC1(**kwargs):
model = STDCNet(base=64, layers=[2, 2, 2], **kwargs)
return model
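# A minimal, hedged sketch of the backbone as a multi-scale feature extractor;
# the input size is illustrative, and running this requires the package import
# (stdc1_seg_voc.layers) above to resolve.
if __name__ == "__main__":
    backbone = STDC1()
    x = paddle.rand([1, 3, 512, 512])
    for i, feat in enumerate(backbone(x)):
        # Features at strides 2, 4, 8, 16, 32 relative to the input.
        print("1/{} feature: {}".format(2 ** (i + 1), feat.shape))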