diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ddbc7cd6ace39e073884fe220e1a26cf74e196d6
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md
@@ -0,0 +1,182 @@
+# ann_resnet50_cityscapes
+
+|Module Name|ann_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ann_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|228MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction tasks.
+ - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install ann_resnet50_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. How to Start Fine-tuning
+
+ - After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ann_resnet50_cityscapes model on the OpticDiscSeg dataset. The contents of `train.py` are as follows:
+
+ - Code steps
+
+ - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute the preprocessing they need, as sketched below.
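+
+ - For instance, a minimal sketch of a customized pipeline, assuming `RandomHorizontalFlip` is available in `segmentation_transforms` (it is in recent PaddleHub releases; adjust to your version):
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+ # Apply a random horizontal flip before resizing and normalization
+ transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+ ```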
+
+ - Step2: Download and use the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: data preprocessing methods.
+ - `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
+
+ - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: name of the pre-trained model.
+ - `pretrained`: path of a self-trained checkpoint to load; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Choose the optimization strategy and runtime configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+ - When fine-tuning is completed, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - After the parameters are configured correctly, run the script with `python predict.py`.
+
+ - **Args**
+ * `images`: image path or image data in BGR format;
+ * `visualization`: whether to visualize the results, default is True;
+ * `save_path`: save path of the results, default is 'seg_result'.
+
+ **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ann_resnet50_cityscapes
+ ```
+
+ - The image segmentation API service is now deployed; the default port number is 8866.
+
+ - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
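+
+ - The decoded `mask` is an ordinary OpenCV image. A minimal follow-up sketch (the output file name is an arbitrary example) saves it for inspection:
+
+ ```python
+ # Save the segmentation mask returned by the serving API
+ cv2.imwrite('seg_server_result.png', mask)
+ ```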
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..43c29951a10009f8fc6dbca9cc39a92ead11c262
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md
@@ -0,0 +1,184 @@
+# ann_resnet50_cityscapes
+
+|Module Name|ann_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ann_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|228MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install ann_resnet50_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. Fine-tune and Encapsulation
+
+ - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ann_resnet50_cityscapes model on datasets such as OpticDiscSeg by running `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - The `segmentation_transforms` module defines many data preprocessing methods for image segmentation. Users can replace them according to their needs, as sketched below.
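+
+ - For instance, a minimal sketch of a customized pipeline, assuming `RandomHorizontalFlip` is available in `segmentation_transforms` (it is in recent PaddleHub releases; adjust to your version):
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+ # Apply a random horizontal flip before resizing and normalization
+ transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+ ```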
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+ * Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+ - `pretrained`: path of a self-trained checkpoint to load; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+ - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ann_resnet50_cityscapes
+ ```
+
+ - The image segmentation API service is now deployed, and the default port number is 8866.
+
+ - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ann_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
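+
+ - The decoded `mask` is an ordinary OpenCV image. A minimal follow-up sketch (the output file name is an arbitrary example) saves it for inspection:
+
+ ```python
+ # Save the segmentation mask returned by the serving API
+ cv2.imwrite('seg_server_result.png', mask)
+ ```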
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..083c8d2fa09fea0eb51af3d3c89b9aba84ae94db
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py
@@ -0,0 +1,275 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.pointwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+ aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
+ return paddle.add(x, y, name)
diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..d892c47c7dff84a269a0d0a52cb3c31da30e6cc9
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py
@@ -0,0 +1,452 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from ann_resnet50_cityscapes.resnet import ResNet50_vd
+import ann_resnet50_cityscapes.layers as layers
+
+@moduleinfo(
+ name="ann_resnet50_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="ANNResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class ANN(nn.Layer):
+ """
+ The ANN implementation based on PaddlePaddle.
+
+ The original article refers to
+ Zhen Zhu, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation"
+ (https://arxiv.org/pdf/1908.07678.pdf).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
+ key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules.
+ Default: 256.
+ inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512.
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int] = (2, 3),
+ key_value_channels: int = 256,
+ inter_channels: int = 512,
+ psp_size: Tuple[int] = (1, 3, 6, 8),
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(ANN, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ backbone_channels = [
+ self.backbone.feat_channels[i] for i in backbone_indices
+ ]
+
+ self.head = ANNHead(num_classes, backbone_indices, backbone_channels,
+ key_value_channels, inter_channels, psp_size)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ logit_list = self.head(feat_list)
+ return [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+
+class ANNHead(nn.Layer):
+ """
+ The ANNHead implementation.
+
+ It mainly consists of AFNB and APNB modules.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
+ The first index will be taken as low-level features; the second one will be
+ taken as high-level features in AFNB module. Usually backbone consists of four
+ downsampling stage, such as ResNet, and return an output of each stage. If it is (2, 3),
+ it means taking feature map of the third stage and the fourth stage in backbone.
+ backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
+ key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules.
+ inter_channels (int): Both input and output channels of APNB modules.
+ psp_size (tuple): The out size of pooled feature maps.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: False
+ """
+
+ def __init__(self,
+ num_classes: int,
+ backbone_indices: Tuple[int],
+ backbone_channels: Tuple[int],
+ key_value_channels: int,
+ inter_channels: int,
+ psp_size: Tuple[int],
+ enable_auxiliary_loss: bool = False):
+ super().__init__()
+
+ low_in_channels = backbone_channels[0]
+ high_in_channels = backbone_channels[1]
+
+ self.fusion = AFNB(
+ low_in_channels=low_in_channels,
+ high_in_channels=high_in_channels,
+ out_channels=high_in_channels,
+ key_channels=key_value_channels,
+ value_channels=key_value_channels,
+ dropout_prob=0.05,
+ repeat_sizes=([1]),
+ psp_size=psp_size)
+
+ self.context = nn.Sequential(
+ layers.ConvBNReLU(
+ in_channels=high_in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1),
+ APNB(
+ in_channels=inter_channels,
+ out_channels=inter_channels,
+ key_channels=key_value_channels,
+ value_channels=key_value_channels,
+ dropout_prob=0.05,
+ repeat_sizes=([1]),
+ psp_size=psp_size))
+
+ self.cls = nn.Conv2D(
+ in_channels=inter_channels, out_channels=num_classes, kernel_size=1)
+ self.auxlayer = layers.AuxLayer(
+ in_channels=low_in_channels,
+ inter_channels=low_in_channels // 2,
+ out_channels=num_classes,
+ dropout_prob=0.05)
+
+ self.backbone_indices = backbone_indices
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ logit_list = []
+ low_level_x = feat_list[self.backbone_indices[0]]
+ high_level_x = feat_list[self.backbone_indices[1]]
+ x = self.fusion(low_level_x, high_level_x)
+ x = self.context(x)
+ logit = self.cls(x)
+ logit_list.append(logit)
+
+ if self.enable_auxiliary_loss:
+ auxiliary_logit = self.auxlayer(low_level_x)
+ logit_list.append(auxiliary_logit)
+
+ return logit_list
+
+
+class AFNB(nn.Layer):
+ """
+ Asymmetric Fusion Non-local Block.
+
+ Args:
+ low_in_channels (int): Low-level-feature channels.
+ high_in_channels (int): High-level-feature channels.
+ out_channels (int): Out channels of AFNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ dropout_prob (float): The dropout rate of output.
+ repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
+ psp_size (tuple. optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ low_in_channels: int,
+ high_in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ dropout_prob: float,
+ repeat_sizes: Tuple[int] = ([1]),
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.psp_size = psp_size
+ self.stages = nn.LayerList([
+ SelfAttentionBlock_AFNB(low_in_channels, high_in_channels,
+ key_channels, value_channels, out_channels,
+ size) for size in repeat_sizes
+ ])
+ self.conv_bn = layers.ConvBN(
+ in_channels=out_channels + high_in_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor:
+ priors = [stage(low_feats, high_feats) for stage in self.stages]
+ context = priors[0]
+ for i in range(1, len(priors)):
+ context += priors[i]
+
+ output = self.conv_bn(paddle.concat([context, high_feats], axis=1))
+ output = self.dropout(output)
+
+ return output
+
+
+class APNB(nn.Layer):
+ """
+ Asymmetric Pyramid Non-local Block.
+
+ Args:
+ in_channels (int): The input channels of APNB module.
+ out_channels (int): Out channels of APNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ dropout_prob (float): The dropout rate of output.
+ repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ dropout_prob: float,
+ repeat_sizes: Tuple[int] = ([1]),
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.psp_size = psp_size
+ self.stages = nn.LayerList([
+ SelfAttentionBlock_APNB(in_channels, out_channels, key_channels,
+ value_channels, size)
+ for size in repeat_sizes
+ ])
+ self.conv_bn = layers.ConvBNReLU(
+ in_channels=in_channels * 2,
+ out_channels=out_channels,
+ kernel_size=1)
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ priors = [stage(x) for stage in self.stages]
+ context = priors[0]
+ for i in range(1, len(priors)):
+ context += priors[i]
+
+ output = self.conv_bn(paddle.concat([context, x], axis=1))
+ output = self.dropout(output)
+
+ return output
+
+
+def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor:
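+    """Pyramid pooling helper: adaptively average-pool x to each size in psp_size, flatten the spatial dims, and concatenate the results along the last axis."""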
+ n, c, h, w = x.shape
+ priors = []
+ for size in psp_size:
+ feat = F.adaptive_avg_pool2d(x, size)
+ feat = paddle.reshape(feat, shape=(0, c, -1))
+ priors.append(feat)
+ center = paddle.concat(priors, axis=-1)
+ return center
+
+
+class SelfAttentionBlock_AFNB(nn.Layer):
+ """
+ Self-Attention Block for AFNB module.
+
+ Args:
+ low_in_channels (int): Low-level-feature channels.
+ high_in_channels (int): High-level-feature channels.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ out_channels (int, optional): Out channels of AFNB module. Default: None.
+ scale (int, optional): Pooling size. Default: 1.
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ low_in_channels: int,
+ high_in_channels: int,
+ key_channels: int,
+ value_channels: int,
+ out_channels: int = None,
+ scale: int = 1,
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.scale = scale
+ self.in_channels = low_in_channels
+ self.out_channels = out_channels
+ self.key_channels = key_channels
+ self.value_channels = value_channels
+ if out_channels is None:
+ self.out_channels = high_in_channels
+ self.pool = nn.MaxPool2D(scale)
+ self.f_key = layers.ConvBNReLU(
+ in_channels=low_in_channels,
+ out_channels=key_channels,
+ kernel_size=1)
+ self.f_query = layers.ConvBNReLU(
+ in_channels=high_in_channels,
+ out_channels=key_channels,
+ kernel_size=1)
+ self.f_value = nn.Conv2D(
+ in_channels=low_in_channels,
+ out_channels=value_channels,
+ kernel_size=1)
+
+ self.W = nn.Conv2D(
+ in_channels=value_channels,
+ out_channels=self.out_channels,  # falls back to high_in_channels when out_channels is None
+ kernel_size=1)
+
+ self.psp_size = psp_size
+
+ def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor:
+ batch_size, _, h, w = high_feats.shape
+
+ value = self.f_value(low_feats)
+ value = _pp_module(value, self.psp_size)
+ value = paddle.transpose(value, (0, 2, 1))
+
+ query = self.f_query(high_feats)
+ query = paddle.reshape(query, shape=(0, self.key_channels, -1))
+ query = paddle.transpose(query, perm=(0, 2, 1))
+
+ key = self.f_key(low_feats)
+ key = _pp_module(key, self.psp_size)
+
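+ # Scaled dot-product attention: high-level features are the queries, pyramid-pooled low-level features the keys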
+ sim_map = paddle.matmul(query, key)
+ sim_map = (self.key_channels**-.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, perm=(0, 2, 1))
+ hf_shape = paddle.shape(high_feats)
+ context = paddle.reshape(
+ context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]])
+
+ context = self.W(context)
+
+ return context
+
+
+class SelfAttentionBlock_APNB(nn.Layer):
+ """
+ Self-Attention Block for APNB module.
+
+ Args:
+ in_channels (int): The input channels of APNB module.
+ out_channels (int): The out channels of APNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ scale (int, optional): Pooling size. Default: 1.
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ scale: int = 1,
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.scale = scale
+ self.in_channels = in_channels
+ self.out_channels = out_channels
+ self.key_channels = key_channels
+ self.value_channels = value_channels
+ self.pool = nn.MaxPool2D(scale)
+ self.f_key = layers.ConvBNReLU(
+ in_channels=self.in_channels,
+ out_channels=self.key_channels,
+ kernel_size=1)
+ self.f_query = self.f_key
+ self.f_value = nn.Conv2D(
+ in_channels=self.in_channels,
+ out_channels=self.value_channels,
+ kernel_size=1)
+ self.W = nn.Conv2D(
+ in_channels=self.value_channels,
+ out_channels=self.out_channels,
+ kernel_size=1)
+
+ self.psp_size = psp_size
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ batch_size, _, h, w = x.shape
+ if self.scale > 1:
+ x = self.pool(x)
+
+ value = self.f_value(x)
+ value = _pp_module(value, self.psp_size)
+ value = paddle.transpose(value, perm=(0, 2, 1))
+
+ query = self.f_query(x)
+ query = paddle.reshape(query, shape=(0, self.key_channels, -1))
+ query = paddle.transpose(query, perm=(0, 2, 1))
+
+ key = self.f_key(x)
+ key = _pp_module(key, self.psp_size)
+
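+ # Scaled dot-product attention over the pyramid-pooled keys and values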
+ sim_map = paddle.matmul(query, key)
+ sim_map = (self.key_channels**-.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, perm=(0, 2, 1))
+
+ x_shape = paddle.shape(x)
+ context = paddle.reshape(
+ context, shape=[0, self.value_channels, x_shape[2], x_shape[3]])
+ context = self.W(context)
+
+ return context
diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..efa7ba57045337ef5c6ee2f84dc9de0e72e73e32
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py
@@ -0,0 +1,361 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ann_resnet50_cityscapes.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
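+    """Conv + BN block with optional activation; when is_vd_mode is True, a 2x2 average pooling is applied before the convolution (the ResNet-vd trick)."""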
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+ if dilation != 1 and kernel_size != 3:
+ raise RuntimeError(
+ "When the dilation isn't 1, the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+ The original article refers to
+ Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+ multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+ # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/README.md b/modules/image/semantic_segmentation/ann_resnet50_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..bd91399116cd00d8c55177270ce06fe758ad7b5c
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_voc/README.md
@@ -0,0 +1,182 @@
+# ann_resnet50_voc
+
+|Module Name|ann_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ann_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|228MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - This example shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction tasks.
+ - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install ann_resnet50_voc
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. How to Start Fine-tuning
+
+ - After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ann_resnet50_voc model on the OpticDiscSeg dataset. The contents of `train.py` are as follows:
+
+ - Code steps
+
+ - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute the preprocessing they need, as sketched below.
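+
+ - For instance, a minimal sketch of a customized pipeline, assuming `RandomHorizontalFlip` is available in `segmentation_transforms` (it is in recent PaddleHub releases; adjust to your version):
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+ # Apply a random horizontal flip before resizing and normalization
+ transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+ ```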
+
+ - Step2: Download and use the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: data preprocessing methods.
+ - `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
+
+ - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: name of the pre-trained model.
+ - `pretrained`: path of a self-trained checkpoint to load; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Choose the optimization strategy and runtime configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+ - When fine-tuning is completed, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - After the parameters are configured correctly, run the script with `python predict.py`.
+
+ - **Args**
+ * `images`: image path or image data in BGR format;
+ * `visualization`: whether to visualize the results, default is True;
+ * `save_path`: save path of the results, default is 'seg_result'.
+
+ **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ann_resnet50_voc
+ ```
+
+ - The image segmentation API service is now deployed; the default port number is 8866.
+
+ - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # Send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
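+
+ - The decoded `mask` is an ordinary OpenCV image. A minimal follow-up sketch (the output file name is an arbitrary example) saves it for inspection:
+
+ ```python
+ # Save the segmentation mask returned by the serving API
+ cv2.imwrite('seg_server_result.png', mask)
+ ```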
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..2226a22d6039c5480ec097aa0dc491545bc2a8ec
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md
@@ -0,0 +1,182 @@
+# ann_resnet50_voc
+
+|Module Name|ann_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ann_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|228MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf)
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install ann_resnet50_voc
+ ```
+
+ - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. Fine-tune and Encapsulation
+
+ - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ann_resnet50_voc model on datasets such as OpticDiscSeg by running `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - The `segmentation_transforms` module defines many data preprocessing methods for image segmentation. Users can replace them according to their needs, as sketched below.
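+
+ - For instance, a minimal sketch of a customized pipeline, assuming `RandomHorizontalFlip` is available in `segmentation_transforms` (it is in recent PaddleHub releases; adjust to your version):
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+ # Apply a random horizontal flip before resizing and normalization
+ transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+ ```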
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+ * Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+ - `pretrained`: path of a self-trained checkpoint to load; if it is None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+ - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ann_resnet50_voc
+ ```
+
+ - The image segmentation API service is now deployed, and the default port number is 8866.
+
+ - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ann_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
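+
+ - The decoded `mask` is an ordinary OpenCV image. A minimal follow-up sketch (the output file name is an arbitrary example) saves it for inspection:
+
+ ```python
+ # Save the segmentation mask returned by the serving API
+ cv2.imwrite('seg_server_result.png', mask)
+ ```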
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py b/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..8060d63d280962a3e99f2dd7b910d0a8bf8445eb
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py
@@ -0,0 +1,276 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+ self.pointwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+ x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
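+    """Elementwise add wrapped as a layer so quantization-aware training passes can hook it."""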
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
+ return paddle.add(x, y, name)
diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/module.py b/modules/image/semantic_segmentation/ann_resnet50_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f0218dde73f6f09ef61040b55b02699829bdb7fb
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_voc/module.py
@@ -0,0 +1,452 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from ann_resnet50_voc.resnet import ResNet50_vd
+import ann_resnet50_voc.layers as layers
+
+@moduleinfo(
+ name="ann_resnet50_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="ANNResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class ANN(nn.Layer):
+ """
+ The ANN implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhu, Zhen, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation"
+ (https://arxiv.org/pdf/1908.07678.pdf).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
+ key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules.
+ Default: 256.
+ inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512.
+        psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int] = (2, 3),
+ key_value_channels: int = 256,
+ inter_channels: int = 512,
+ psp_size: Tuple[int] = (1, 3, 6, 8),
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(ANN, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ backbone_channels = [
+ self.backbone.feat_channels[i] for i in backbone_indices
+ ]
+
+ self.head = ANNHead(num_classes, backbone_indices, backbone_channels,
+ key_value_channels, inter_channels, psp_size)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ logit_list = self.head(feat_list)
+ return [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class ANNHead(nn.Layer):
+ """
+ The ANNHead implementation.
+
+ It mainly consists of AFNB and APNB modules.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
+ The first index will be taken as low-level features; the second one will be
+ taken as high-level features in AFNB module. Usually backbone consists of four
+ downsampling stage, such as ResNet, and return an output of each stage. If it is (2, 3),
+ it means taking feature map of the third stage and the fourth stage in backbone.
+ backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
+ key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules.
+ inter_channels (int): Both input and output channels of APNB modules.
+ psp_size (tuple): The out size of pooled feature maps.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: False
+ """
+
+ def __init__(self,
+ num_classes: int,
+ backbone_indices: Tuple[int],
+ backbone_channels: Tuple[int],
+ key_value_channels: int,
+ inter_channels: int,
+ psp_size: Tuple[int],
+ enable_auxiliary_loss: bool = False):
+ super().__init__()
+
+ low_in_channels = backbone_channels[0]
+ high_in_channels = backbone_channels[1]
+
+ self.fusion = AFNB(
+ low_in_channels=low_in_channels,
+ high_in_channels=high_in_channels,
+ out_channels=high_in_channels,
+ key_channels=key_value_channels,
+ value_channels=key_value_channels,
+ dropout_prob=0.05,
+ repeat_sizes=([1]),
+ psp_size=psp_size)
+
+ self.context = nn.Sequential(
+ layers.ConvBNReLU(
+ in_channels=high_in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1),
+ APNB(
+ in_channels=inter_channels,
+ out_channels=inter_channels,
+ key_channels=key_value_channels,
+ value_channels=key_value_channels,
+ dropout_prob=0.05,
+ repeat_sizes=([1]),
+ psp_size=psp_size))
+
+ self.cls = nn.Conv2D(
+ in_channels=inter_channels, out_channels=num_classes, kernel_size=1)
+ self.auxlayer = layers.AuxLayer(
+ in_channels=low_in_channels,
+ inter_channels=low_in_channels // 2,
+ out_channels=num_classes,
+ dropout_prob=0.05)
+
+ self.backbone_indices = backbone_indices
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ logit_list = []
+ low_level_x = feat_list[self.backbone_indices[0]]
+ high_level_x = feat_list[self.backbone_indices[1]]
+ x = self.fusion(low_level_x, high_level_x)
+ x = self.context(x)
+ logit = self.cls(x)
+ logit_list.append(logit)
+
+ if self.enable_auxiliary_loss:
+ auxiliary_logit = self.auxlayer(low_level_x)
+ logit_list.append(auxiliary_logit)
+
+ return logit_list
+
+
+class AFNB(nn.Layer):
+ """
+ Asymmetric Fusion Non-local Block.
+
+ Args:
+ low_in_channels (int): Low-level-feature channels.
+ high_in_channels (int): High-level-feature channels.
+ out_channels (int): Out channels of AFNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ dropout_prob (float): The dropout rate of output.
+        repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]).
+        psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ low_in_channels: int,
+ high_in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ dropout_prob: float,
+ repeat_sizes: Tuple[int] = ([1]),
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.psp_size = psp_size
+ self.stages = nn.LayerList([
+ SelfAttentionBlock_AFNB(low_in_channels, high_in_channels,
+ key_channels, value_channels, out_channels,
+ size) for size in repeat_sizes
+ ])
+ self.conv_bn = layers.ConvBN(
+ in_channels=out_channels + high_in_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+    def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
+ priors = [stage(low_feats, high_feats) for stage in self.stages]
+ context = priors[0]
+ for i in range(1, len(priors)):
+ context += priors[i]
+
+ output = self.conv_bn(paddle.concat([context, high_feats], axis=1))
+ output = self.dropout(output)
+
+ return output
+
+
+class APNB(nn.Layer):
+ """
+ Asymmetric Pyramid Non-local Block.
+
+ Args:
+ in_channels (int): The input channels of APNB module.
+ out_channels (int): Out channels of APNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ dropout_prob (float): The dropout rate of output.
+        repeat_sizes (tuple, optional): The number of APNB modules. Default: ([1]).
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ dropout_prob: float,
+ repeat_sizes: Tuple[int] = ([1]),
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.psp_size = psp_size
+ self.stages = nn.LayerList([
+ SelfAttentionBlock_APNB(in_channels, out_channels, key_channels,
+ value_channels, size)
+ for size in repeat_sizes
+ ])
+ self.conv_bn = layers.ConvBNReLU(
+ in_channels=in_channels * 2,
+ out_channels=out_channels,
+ kernel_size=1)
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ priors = [stage(x) for stage in self.stages]
+ context = priors[0]
+ for i in range(1, len(priors)):
+ context += priors[i]
+
+ output = self.conv_bn(paddle.concat([context, x], axis=1))
+ output = self.dropout(output)
+
+ return output
+
+
+def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor:
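+    """Pyramid pooling: adaptively average-pool x to every size in psp_size,
+    flatten each pooled map, and concatenate the results along the last axis.
+    """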
+ n, c, h, w = x.shape
+ priors = []
+ for size in psp_size:
+ feat = F.adaptive_avg_pool2d(x, size)
+ feat = paddle.reshape(feat, shape=(0, c, -1))
+ priors.append(feat)
+ center = paddle.concat(priors, axis=-1)
+ return center
+
+
+class SelfAttentionBlock_AFNB(nn.Layer):
+ """
+ Self-Attention Block for AFNB module.
+
+ Args:
+ low_in_channels (int): Low-level-feature channels.
+ high_in_channels (int): High-level-feature channels.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ out_channels (int, optional): Out channels of AFNB module. Default: None.
+ scale (int, optional): Pooling size. Default: 1.
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ low_in_channels: int,
+ high_in_channels: int,
+ key_channels: int,
+ value_channels: int,
+ out_channels: int = None,
+ scale: int = 1,
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.scale = scale
+ self.in_channels = low_in_channels
+ self.out_channels = out_channels
+ self.key_channels = key_channels
+ self.value_channels = value_channels
+        if out_channels is None:
+            self.out_channels = high_in_channels
+ self.pool = nn.MaxPool2D(scale)
+ self.f_key = layers.ConvBNReLU(
+ in_channels=low_in_channels,
+ out_channels=key_channels,
+ kernel_size=1)
+ self.f_query = layers.ConvBNReLU(
+ in_channels=high_in_channels,
+ out_channels=key_channels,
+ kernel_size=1)
+ self.f_value = nn.Conv2D(
+ in_channels=low_in_channels,
+ out_channels=value_channels,
+ kernel_size=1)
+
+        self.W = nn.Conv2D(
+            in_channels=value_channels,
+            out_channels=self.out_channels,
+            kernel_size=1)
+
+ self.psp_size = psp_size
+
+    def forward(self, low_feats: paddle.Tensor, high_feats: paddle.Tensor) -> paddle.Tensor:
+ batch_size, _, h, w = high_feats.shape
+
+ value = self.f_value(low_feats)
+ value = _pp_module(value, self.psp_size)
+ value = paddle.transpose(value, (0, 2, 1))
+
+ query = self.f_query(high_feats)
+ query = paddle.reshape(query, shape=(0, self.key_channels, -1))
+ query = paddle.transpose(query, perm=(0, 2, 1))
+
+ key = self.f_key(low_feats)
+ key = _pp_module(key, self.psp_size)
+
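+        # Scaled dot-product attention: high-level queries attend over pyramid-pooled low-level keys/values.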
+ sim_map = paddle.matmul(query, key)
+ sim_map = (self.key_channels**-.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, perm=(0, 2, 1))
+ hf_shape = paddle.shape(high_feats)
+ context = paddle.reshape(
+ context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]])
+
+ context = self.W(context)
+
+ return context
+
+
+class SelfAttentionBlock_APNB(nn.Layer):
+ """
+ Self-Attention Block for APNB module.
+
+ Args:
+ in_channels (int): The input channels of APNB module.
+ out_channels (int): The out channels of APNB module.
+ key_channels (int): The key channels in self-attention block.
+ value_channels (int): The value channels in self-attention block.
+ scale (int, optional): Pooling size. Default: 1.
+ psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ key_channels: int,
+ value_channels: int,
+ scale: int = 1,
+ psp_size: Tuple[int] = (1, 3, 6, 8)):
+ super().__init__()
+
+ self.scale = scale
+ self.in_channels = in_channels
+ self.out_channels = out_channels
+ self.key_channels = key_channels
+ self.value_channels = value_channels
+ self.pool = nn.MaxPool2D(scale)
+ self.f_key = layers.ConvBNReLU(
+ in_channels=self.in_channels,
+ out_channels=self.key_channels,
+ kernel_size=1)
+ self.f_query = self.f_key
+ self.f_value = nn.Conv2D(
+ in_channels=self.in_channels,
+ out_channels=self.value_channels,
+ kernel_size=1)
+ self.W = nn.Conv2D(
+ in_channels=self.value_channels,
+ out_channels=self.out_channels,
+ kernel_size=1)
+
+ self.psp_size = psp_size
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ batch_size, _, h, w = x.shape
+ if self.scale > 1:
+ x = self.pool(x)
+
+ value = self.f_value(x)
+ value = _pp_module(value, self.psp_size)
+ value = paddle.transpose(value, perm=(0, 2, 1))
+
+ query = self.f_query(x)
+ query = paddle.reshape(query, shape=(0, self.key_channels, -1))
+ query = paddle.transpose(query, perm=(0, 2, 1))
+
+ key = self.f_key(x)
+ key = _pp_module(key, self.psp_size)
+
+ sim_map = paddle.matmul(query, key)
+ sim_map = (self.key_channels**-.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, perm=(0, 2, 1))
+
+ x_shape = paddle.shape(x)
+ context = paddle.reshape(
+ context, shape=[0, self.value_channels, x_shape[2], x_shape[3]])
+ context = self.W(context)
+
+ return context
diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..949f180ce7b9a408f4583df714d0fc93271b8f99
--- /dev/null
+++ b/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py
@@ -0,0 +1,361 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ann_resnet50_voc.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+ if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+    (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int]=(1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
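+        # Map stage index -> dilation rate so later stages keep spatial resolution
+        # instead of striding: output_stride 8 dilates stages 2 and 3, 16 only stage 3.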
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+                    # Actually 'block' here is the stage index, and 'i' indexes blocks within the stage.
+                    # At stage 4, expand the dilation_rate if multi_grid is given.
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a88ce5a828e997bd5954d7d254a49b134b9859c3
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md
@@ -0,0 +1,182 @@
+# danet_resnet50_cityscapes
+
+|Module Name|danet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|danet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|272MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - This module shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction tasks.
+ - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install danet_resnet50_cityscapes
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. How to start Fine-tuning
+
+ - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the danet_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can replace them with their own as needed.
+
+ - Step2: Download and load the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: data preprocessing method.
+ - `mode`: select the data mode; options are `train`, `test` and `val`. Default is `train`.
+
+ - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it into the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: the name of the pre-trained model.
+ - `pretrained`: whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
+
+ - Step4: Choose the optimization strategy and runtime configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+ - After fine-tuning, the model that performs best on the validation set is saved in `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - Once the parameters are configured, run `python predict.py`.
+
+ - **Args**
+ * `images`: image path or BGR ndarray;
+ * `visualization`: whether to visualize the results. Default is True;
+ * `save_path`: path for saving results. Default is 'seg_result'.
+
+ **NOTE:** For prediction, the module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m danet_resnet50_cityscapes
+    ```
+
+ - The image segmentation API is now deployed; the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
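+
+ - For example, a minimal sketch (the device index below is an assumption; choose the GPU you want to expose):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0  # expose only the first GPU to the serving process
+    $ hub serving start -m danet_resnet50_cityscapes
+    ```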
+
+- ### Step 2: Send a prediction request
+
+ - With the server configured, the following lines of code send a prediction request and retrieve the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+    def cv2_to_base64(image):
+        # Encode an image ndarray as a base64 JPEG string.
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # Decode a base64 string back into a BGR ndarray.
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # Send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..9794b0f3aca16cda1c22a838caebb8b8a010b69b
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md
@@ -0,0 +1,182 @@
+# danet_resnet50_cityscapes
+
+|Module Name|danet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|danet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|272MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install danet_resnet50_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. Fine-tune and Encapsulation
+
+  - After installing PaddlePaddle and PaddleHub, you can start fine-tuning the danet_resnet50_cityscapes model on datasets such as OpticDiscSeg by running `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines a rich set of preprocessing methods for segmentation data. Users can replace these methods according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * Dataset preparation can be referred to in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset from the network and decompress it into the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+      - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m danet_resnet50_cityscapes
+ ```
+
+     - The image segmentation API is now deployed and the default port number is 8866.
+
+     - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
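+
+     - For example, a minimal sketch (the device index below is an assumption; choose the GPU you want to expose):
+
+    - ```shell
+      $ export CUDA_VISIBLE_DEVICES=0  # expose only the first GPU to the serving process
+      $ hub serving start -m danet_resnet50_cityscapes
+      ```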
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+    def cv2_to_base64(image):
+        # Encode an image ndarray as a base64 JPEG string.
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # Decode a base64 string back into a BGR ndarray.
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..b6d7c005ef6d498c70536c7e8db049d64ea3223f
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py
@@ -0,0 +1,349 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+ #
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bb6e562631292793de2ab1a5a933d207f9f95cf
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py
@@ -0,0 +1,239 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from danet_resnet50_cityscapes.resnet import ResNet50_vd
+import danet_resnet50_cityscapes.layers as L
+
+
+@moduleinfo(
+ name="danet_resnet50_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="DANetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class DANet(nn.Layer):
+ """
+ The DANet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Fu, Jun, et al. "Dual Attention Network for Scene Segmentation"
+ (https://arxiv.org/pdf/1809.02983.pdf)
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone (Paddle.nn.Layer): A backbone network.
+ backbone_indices (tuple): The values in the tuple indicate the indices of
+ output of backbone.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int] = (2, 3),
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(DANet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
+
+ self.head = DAHead(num_classes=num_classes, in_channels=in_channels)
+
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feats = self.backbone(x)
+ feats = [feats[i] for i in self.backbone_indices]
+ logit_list = self.head(feats)
+ if not self.training:
+ logit_list = [logit_list[0]]
+
+ logit_list = [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners,
+ align_mode=1) for logit in logit_list
+ ]
+ return logit_list
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+
+class DAHead(nn.Layer):
+ """
+ The Dual attention head.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ in_channels (tuple): The number of input channels.
+ """
+
+    def __init__(self, num_classes: int, in_channels: Tuple[int]):
+ super().__init__()
+ in_channels = in_channels[-1]
+ inter_channels = in_channels // 4
+
+ self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+ self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+ self.pam = PAM(inter_channels)
+ self.cam = CAM(inter_channels)
+ self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+ self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+
+ self.aux_head = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1))
+
+ self.aux_head_pam = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ self.aux_head_cam = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ self.cls_head = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ feats = feat_list[-1]
+ channel_feats = self.channel_conv(feats)
+ channel_feats = self.cam(channel_feats)
+ channel_feats = self.conv1(channel_feats)
+
+ position_feats = self.position_conv(feats)
+ position_feats = self.pam(position_feats)
+ position_feats = self.conv2(position_feats)
+
+ feats_sum = position_feats + channel_feats
+ logit = self.cls_head(feats_sum)
+
+ if not self.training:
+ return [logit]
+
+ cam_logit = self.aux_head_cam(channel_feats)
+        pam_logit = self.aux_head_pam(position_feats)
+ aux_logit = self.aux_head(feats)
+ return [logit, cam_logit, pam_logit, aux_logit]
+
+
+class PAM(nn.Layer):
+ """Position attention module."""
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ mid_channels = in_channels // 8
+ self.mid_channels = mid_channels
+ self.in_channels = in_channels
+
+ self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+ self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+ self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1)
+
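+        # Learnable residual scale, initialized to zero so the attention branch is blended in gradually.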
+ self.gamma = self.create_parameter(
+ shape=[1],
+ dtype='float32',
+ default_initializer=nn.initializer.Constant(0))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x_shape = paddle.shape(x)
+
+ # query: n, h * w, c1
+ query = self.query_conv(x)
+ query = paddle.reshape(query, (0, self.mid_channels, -1))
+ query = paddle.transpose(query, (0, 2, 1))
+
+ # key: n, c1, h * w
+ key = self.key_conv(x)
+ key = paddle.reshape(key, (0, self.mid_channels, -1))
+
+ # sim: n, h * w, h * w
+ sim = paddle.bmm(query, key)
+ sim = F.softmax(sim, axis=-1)
+
+ value = self.value_conv(x)
+ value = paddle.reshape(value, (0, self.in_channels, -1))
+ sim = paddle.transpose(sim, (0, 2, 1))
+
+ # feat: from (n, c2, h * w) -> (n, c2, h, w)
+ feat = paddle.bmm(value, sim)
+ feat = paddle.reshape(feat,
+ (0, self.in_channels, x_shape[2], x_shape[3]))
+
+ out = self.gamma * feat + x
+ return out
+
+
+class CAM(nn.Layer):
+ """Channel attention module."""
+
+ def __init__(self, channels: int):
+ super().__init__()
+
+ self.channels = channels
+ self.gamma = self.create_parameter(
+ shape=[1],
+ dtype='float32',
+ default_initializer=nn.initializer.Constant(0))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x_shape = paddle.shape(x)
+ # query: n, c, h * w
+ query = paddle.reshape(x, (0, self.channels, -1))
+ # key: n, h * w, c
+ key = paddle.reshape(x, (0, self.channels, -1))
+ key = paddle.transpose(key, (0, 2, 1))
+
+ # sim: n, c, c
+ sim = paddle.bmm(query, key)
+ # The danet author claims that this can avoid gradient divergence
+ sim = paddle.max(
+ sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim
+ sim = F.softmax(sim, axis=-1)
+
+ # feat: from (n, c, h * w) to (n, c, h, w)
+ value = paddle.reshape(x, (0, self.channels, -1))
+ feat = paddle.bmm(sim, value)
+ feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3]))
+
+ out = self.gamma * feat + x
+ return out
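+
+
+if __name__ == '__main__':
+    # Minimal sanity-check sketch (illustrative only, not part of the module API):
+    # PAM and CAM are residual attention blocks, so each preserves the shape of
+    # its input feature map.
+    x = paddle.rand([2, 64, 32, 32])
+    print(PAM(64)(x).shape)  # expected: [2, 64, 32, 32]
+    print(CAM(64)(x).shape)  # expected: [2, 64, 32, 32]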
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..12102a3fed8e810046e2d40a8796f93175d459fb
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py
@@ -0,0 +1,359 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import danet_resnet50_cityscapes.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+ if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/README.md b/modules/image/semantic_segmentation/danet_resnet50_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8ee72c8c86867420a0b739779f66eb12251131ef
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_voc/README.md
@@ -0,0 +1,182 @@
+# danet_resnet50_voc
+
+|模型名称|danet_resnet50_voc|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|danet_resnet50vd|
+|数据集|PascalVOC2012|
+|是否支持Fine-tuning|是|
+|模型大小|273MB|
+|指标|-|
+|最新更新日期|2022-03-21|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install danet_resnet50_voc
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用danet_resnet50_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m danet_resnet50_voc
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
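+
+  - 例如,如需使用GPU进行预测,可在启动服务前先设置环境变量(此处以0号GPU为例,设备编号仅供参考):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m danet_resnet50_voc
+    ```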
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/danet_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
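+
+  - 上述代码中的`mask`即为解码后的分割结果图像,如需保存到本地,可使用OpenCV写出(文件名仅为示例):
+
+    ```python
+    cv2.imwrite('mask.png', mask)
+    ```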
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..6fecdfc23c39fff1dd1d312ccd3b9fe6315f9618
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md
@@ -0,0 +1,181 @@
+# danet_resnet50_voc
+
+|Module Name|danet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|danet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|273MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install danet_resnet50_voc
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the danet_resnet50_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, please refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+        - `pretrained`: Whether to load a self-trained checkpoint. If None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+        * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m danet_resnet50_voc
+ ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
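+
+  - For example, to predict on GPU, set the environment variable before starting the service (GPU id 0 here is only illustrative):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m danet_resnet50_voc
+    ```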
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/danet_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
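+
+  - In the snippet above, `mask` is the decoded segmentation result; if needed, it can be written to disk with OpenCV (the file name is only illustrative):
+
+    ```python
+    cv2.imwrite('mask.png', mask)
+    ```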
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..96b307dc8750c5422837ebfd0382c198d7d49ff1
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py
@@ -0,0 +1,349 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
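+
+
+# NOTE: resnet.py in this module calls `layers.Add()`, which is not defined in
+# this file. The wrapper below is a minimal sketch, assumed to match the
+# quantization-friendly Add layer used by PaddleSeg, so that the ResNet
+# backbone remains runnable.
+class Add(nn.Layer):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
+        return paddle.add(x, y, name)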
diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/module.py b/modules/image/semantic_segmentation/danet_resnet50_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..2dd4c60b9f787d0462d1e7c53b0c50e425494872
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_voc/module.py
@@ -0,0 +1,245 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from danet_resnet50_voc.resnet import ResNet50_vd
+import danet_resnet50_voc.layers as L
+
+
+@moduleinfo(
+ name="danet_resnet50_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+    summary="DANet is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class DANet(nn.Layer):
+ """
+ The DANet implementation based on PaddlePaddle.
+
+ The original article refers to
+ Fu, jun, et al. "Dual Attention Network for Scene Segmentation"
+ (https://arxiv.org/pdf/1809.02983.pdf)
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone (Paddle.nn.Layer): A backbone network.
+ backbone_indices (tuple): The values in the tuple indicate the indices of
+ output of backbone.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int] = (2, 3),
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(DANet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
+
+ self.head = DAHead(num_classes=num_classes, in_channels=in_channels)
+
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feats = self.backbone(x)
+ feats = [feats[i] for i in self.backbone_indices]
+ logit_list = self.head(feats)
+ if not self.training:
+ logit_list = [logit_list[0]]
+
+ logit_list = [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners,
+ align_mode=1) for logit in logit_list
+ ]
+ return logit_list
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+
+
+class DAHead(nn.Layer):
+ """
+ The Dual attention head.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ in_channels (tuple): The number of input channels.
+ """
+
+    def __init__(self, num_classes: int, in_channels: Tuple[int]):
+ super().__init__()
+ in_channels = in_channels[-1]
+ inter_channels = in_channels // 4
+
+ self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+ self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+ self.pam = PAM(inter_channels)
+ self.cam = CAM(inter_channels)
+ self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+ self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+
+ self.aux_head = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1))
+
+ self.aux_head_pam = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ self.aux_head_cam = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ self.cls_head = nn.Sequential(
+ nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ feats = feat_list[-1]
+ channel_feats = self.channel_conv(feats)
+ channel_feats = self.cam(channel_feats)
+ channel_feats = self.conv1(channel_feats)
+
+ position_feats = self.position_conv(feats)
+ position_feats = self.pam(position_feats)
+ position_feats = self.conv2(position_feats)
+
+ feats_sum = position_feats + channel_feats
+ logit = self.cls_head(feats_sum)
+
+ if not self.training:
+ return [logit]
+
+ cam_logit = self.aux_head_cam(channel_feats)
+        pam_logit = self.aux_head_pam(position_feats)
+ aux_logit = self.aux_head(feats)
+ return [logit, cam_logit, pam_logit, aux_logit]
+
+
+class PAM(nn.Layer):
+ """Position attention module."""
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ mid_channels = in_channels // 8
+ self.mid_channels = mid_channels
+ self.in_channels = in_channels
+
+ self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+ self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+ self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1)
+
+ self.gamma = self.create_parameter(
+ shape=[1],
+ dtype='float32',
+ default_initializer=nn.initializer.Constant(0))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x_shape = paddle.shape(x)
+
+ # query: n, h * w, c1
+ query = self.query_conv(x)
+ query = paddle.reshape(query, (0, self.mid_channels, -1))
+ query = paddle.transpose(query, (0, 2, 1))
+
+ # key: n, c1, h * w
+ key = self.key_conv(x)
+ key = paddle.reshape(key, (0, self.mid_channels, -1))
+
+ # sim: n, h * w, h * w
+ sim = paddle.bmm(query, key)
+ sim = F.softmax(sim, axis=-1)
+
+ value = self.value_conv(x)
+ value = paddle.reshape(value, (0, self.in_channels, -1))
+ sim = paddle.transpose(sim, (0, 2, 1))
+
+ # feat: from (n, c2, h * w) -> (n, c2, h, w)
+ feat = paddle.bmm(value, sim)
+ feat = paddle.reshape(feat,
+ (0, self.in_channels, x_shape[2], x_shape[3]))
+
+ out = self.gamma * feat + x
+ return out
+
+
+class CAM(nn.Layer):
+ """Channel attention module."""
+
+ def __init__(self, channels: int):
+ super().__init__()
+
+ self.channels = channels
+ self.gamma = self.create_parameter(
+ shape=[1],
+ dtype='float32',
+ default_initializer=nn.initializer.Constant(0))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x_shape = paddle.shape(x)
+ # query: n, c, h * w
+ query = paddle.reshape(x, (0, self.channels, -1))
+ # key: n, h * w, c
+ key = paddle.reshape(x, (0, self.channels, -1))
+ key = paddle.transpose(key, (0, 2, 1))
+
+ # sim: n, c, c
+ sim = paddle.bmm(query, key)
+ # The danet author claims that this can avoid gradient divergence
+ sim = paddle.max(
+ sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim
+ sim = F.softmax(sim, axis=-1)
+
+ # feat: from (n, c, h * w) to (n, c, h, w)
+ value = paddle.reshape(x, (0, self.channels, -1))
+ feat = paddle.bmm(sim, value)
+ feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3]))
+
+ out = self.gamma * feat + x
+ return out
+
diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..12102a3fed8e810046e2d40a8796f93175d459fb
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py
@@ -0,0 +1,359 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import danet_resnet50_voc.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+ if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f4a52885d8e60a9c3569cf4515a20cf0a722d8c9
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md
@@ -0,0 +1,182 @@
+# isanet_resnet50_cityscapes
+
+|模型名称|isanet_resnet50_cityscapes|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|isanet_resnet50vd|
+|数据集|Cityscapes|
+|是否支持Fine-tuning|是|
+|模型大小|217MB|
+|指标|-|
+|最新更新日期|2022-03-21|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[isanet](https://arxiv.org/abs/1907.12273)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install isanet_resnet50_cityscapes
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用isanet_resnet50_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m isanet_resnet50_cityscapes
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
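+
+  - 例如,如需使用GPU进行预测,可在启动服务前先设置环境变量(此处以0号GPU为例,设备编号仅供参考):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m isanet_resnet50_cityscapes
+    ```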
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/isanet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
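+
+  - 上述代码中的`mask`即为解码后的分割结果图像,如需保存到本地,可使用OpenCV写出(文件名仅为示例):
+
+    ```python
+    cv2.imwrite('mask.png', mask)
+    ```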
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..ec784ba9f014e44bcef1a441a9a49ebdbcf2918b
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md
@@ -0,0 +1,181 @@
+# isanet_resnet50_cityscapes
+
+|Module Name|isanet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|isanet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|217MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install isanet_resnet50_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the isanet_resnet50_cityscapes model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, please refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+        - `pretrained`: Whether to load a self-trained checkpoint. If None, the provided pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+    * `visualization`: Whether to save the segmentation results as image files. Default is True.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m isanet_resnet50_cityscapes
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
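+  - For example, a minimal sketch assuming GPU card 0 should serve requests:
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m isanet_resnet50_cityscapes
+    ```
+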
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/isanet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e42fb7f2ec66e0daf9123e59464258f33cafc57
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py
@@ -0,0 +1,401 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
+ return paddle.add(x, y, name)
+
+class AttentionBlock(nn.Layer):
+ """General self-attention block/non-local block.
+
+    The original article refers to https://arxiv.org/abs/1706.03762.
+ Args:
+ key_in_channels (int): Input channels of key feature.
+ query_in_channels (int): Input channels of query feature.
+ channels (int): Output channels of key/query transform.
+ out_channels (int): Output channels.
+        share_key_query (bool): Whether to share projection weights between the
+            key and query projections.
+        query_downsample (nn.Layer): Query downsample module.
+        key_downsample (nn.Layer): Key downsample module.
+        key_query_num_convs (int): Number of convs for key/query projection.
+        value_out_num_convs (int): Number of convs for value projection.
+        key_query_norm (bool): Whether to use BN for key/query projection.
+        value_out_norm (bool): Whether to use BN for value projection.
+        matmul_norm (bool): Whether to normalize the attention map by the
+            square root of the channels.
+        with_out (bool): Whether to use the out projection.
+ """
+
+ def __init__(self, key_in_channels, query_in_channels, channels,
+ out_channels, share_key_query, query_downsample,
+ key_downsample, key_query_num_convs, value_out_num_convs,
+ key_query_norm, value_out_norm, matmul_norm, with_out):
+ super(AttentionBlock, self).__init__()
+ if share_key_query:
+ assert key_in_channels == query_in_channels
+ self.with_out = with_out
+ self.key_in_channels = key_in_channels
+ self.query_in_channels = query_in_channels
+ self.out_channels = out_channels
+ self.channels = channels
+ self.share_key_query = share_key_query
+ self.key_project = self.build_project(
+ key_in_channels,
+ channels,
+ num_convs=key_query_num_convs,
+ use_conv_module=key_query_norm)
+ if share_key_query:
+ self.query_project = self.key_project
+ else:
+ self.query_project = self.build_project(
+ query_in_channels,
+ channels,
+ num_convs=key_query_num_convs,
+ use_conv_module=key_query_norm)
+
+ self.value_project = self.build_project(
+ key_in_channels,
+ channels if self.with_out else out_channels,
+ num_convs=value_out_num_convs,
+ use_conv_module=value_out_norm)
+
+ if self.with_out:
+ self.out_project = self.build_project(
+ channels,
+ out_channels,
+ num_convs=value_out_num_convs,
+ use_conv_module=value_out_norm)
+ else:
+ self.out_project = None
+
+ self.query_downsample = query_downsample
+ self.key_downsample = key_downsample
+ self.matmul_norm = matmul_norm
+
+ def build_project(self, in_channels: int , channels: int, num_convs: int, use_conv_module: bool):
+ if use_conv_module:
+ convs = [
+ ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=channels,
+ kernel_size=1,
+ bias_attr=False)
+ ]
+ for _ in range(num_convs - 1):
+ convs.append(
+ ConvBNReLU(
+ in_channels=channels,
+ out_channels=channels,
+ kernel_size=1,
+ bias_attr=False))
+ else:
+ convs = [nn.Conv2D(in_channels, channels, 1)]
+ for _ in range(num_convs - 1):
+ convs.append(nn.Conv2D(channels, channels, 1))
+
+ if len(convs) > 1:
+ convs = nn.Sequential(*convs)
+ else:
+ convs = convs[0]
+ return convs
+
+ def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor:
+ query_shape = paddle.shape(query_feats)
+ query = self.query_project(query_feats)
+ if self.query_downsample is not None:
+ query = self.query_downsample(query)
+ query = query.flatten(2).transpose([0, 2, 1])
+
+ key = self.key_project(key_feats)
+ value = self.value_project(key_feats)
+
+ if self.key_downsample is not None:
+ key = self.key_downsample(key)
+ value = self.key_downsample(value)
+
+ key = key.flatten(2)
+ value = value.flatten(2).transpose([0, 2, 1])
+ sim_map = paddle.matmul(query, key)
+ if self.matmul_norm:
+ sim_map = (self.channels**-0.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, [0, 2, 1])
+
+ context = paddle.reshape(
+ context, [0, self.out_channels, query_shape[2], query_shape[3]])
+
+ if self.out_project is not None:
+ context = self.out_project(context)
+ return context
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..6b20ac094dad2e29de83a0f5ba374564509aad5c
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py
@@ -0,0 +1,221 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from isanet_resnet50_cityscapes.resnet import ResNet50_vd
+import isanet_resnet50_cityscapes.layers as layers
+
+
+@moduleinfo(
+ name="isanet_resnet50_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="ISANetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class ISANet(nn.Layer):
+ """Interlaced Sparse Self-Attention for Semantic Segmentation.
+
+ The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation"
+ (https://arxiv.org/abs/1907.12273).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone.
+ isa_channels (int): The channels of ISA Module.
+ down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int] = (2, 3),
+ isa_channels: int = 256,
+ down_factor: Tuple[int] = (8, 8),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(ISANet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
+ self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor,
+ enable_auxiliary_loss)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feats = self.backbone(x)
+ feats = [feats[i] for i in self.backbone_indices]
+ logit_list = self.head(feats)
+ logit_list = [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners,
+ align_mode=1) for logit in logit_list
+ ]
+
+ return logit_list
+
+
+class ISAHead(nn.Layer):
+ """
+ The ISAHead.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ in_channels (tuple): The number of input channels.
+ isa_channels (int): The channels of ISA Module.
+ down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ """
+
+ def __init__(self,
+ num_classes: int,
+                 in_channels: Tuple[int],
+ isa_channels: int,
+ down_factor: Tuple[int],
+ enable_auxiliary_loss: bool):
+ super(ISAHead, self).__init__()
+ self.in_channels = in_channels[-1]
+ inter_channels = self.in_channels // 4
+ self.inter_channels = inter_channels
+ self.down_factor = down_factor
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+ self.in_conv = layers.ConvBNReLU(
+ self.in_channels, inter_channels, 3, bias_attr=False)
+ self.global_relation = SelfAttentionBlock(inter_channels, isa_channels)
+ self.local_relation = SelfAttentionBlock(inter_channels, isa_channels)
+ self.out_conv = layers.ConvBNReLU(
+ inter_channels * 2, inter_channels, 1, bias_attr=False)
+ self.cls = nn.Sequential(
+ nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1))
+ self.aux = nn.Sequential(
+ layers.ConvBNReLU(
+ in_channels=1024,
+ out_channels=256,
+ kernel_size=3,
+ bias_attr=False), nn.Dropout2D(p=0.1),
+ nn.Conv2D(256, num_classes, 1))
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ C3, C4 = feat_list
+ x = self.in_conv(C4)
+ x_shape = paddle.shape(x)
+ P_h, P_w = self.down_factor
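+        # Pad H and W up to multiples of (P_h, P_w) below so the feature map
+        # splits evenly into Q_h x Q_w blocks of size P_h x P_w.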
+ Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil(
+ x_shape[3] / P_w).astype('int32')
+ pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), (
+ Q_w * P_w - x_shape[3]).astype('int32')
+ if pad_h > 0 or pad_w > 0:
+ padding = paddle.concat([
+ pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
+ ],
+ axis=0)
+ feat = F.pad(x, padding)
+ else:
+ feat = x
+
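+        # Long-range branch: gather pixels at stride (P_h, P_w), one from each
+        # block, and run self-attention over the resulting Q_h x Q_w grids.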
+ feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w])
+ feat = feat.transpose([0, 3, 5, 1, 2,
+ 4]).reshape([-1, self.inter_channels, Q_h, Q_w])
+ feat = self.global_relation(feat)
+
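+        # Short-range branch: regroup so self-attention runs within each
+        # contiguous P_h x P_w block.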
+ feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w])
+ feat = feat.transpose([0, 4, 5, 3, 1,
+ 2]).reshape([-1, self.inter_channels, P_h, P_w])
+ feat = self.local_relation(feat)
+
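+        # Restore the [N, C, H, W] layout and crop off any padding added above.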
+ feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w])
+ feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape(
+ [0, self.inter_channels, P_h * Q_h, P_w * Q_w])
+ if pad_h > 0 or pad_w > 0:
+ feat = paddle.slice(
+ feat,
+ axes=[2, 3],
+ starts=[pad_h // 2, pad_w // 2],
+ ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]])
+
+ feat = self.out_conv(paddle.concat([feat, x], axis=1))
+ output = self.cls(feat)
+
+ if self.enable_auxiliary_loss:
+ auxout = self.aux(C3)
+ return [output, auxout]
+ else:
+ return [output]
+
+
+class SelfAttentionBlock(layers.AttentionBlock):
+ """General self-attention block/non-local block.
+
+ Args:
+ in_channels (int): Input channels of key/query feature.
+ channels (int): Output channels of key/query transform.
+ """
+
+ def __init__(self, in_channels: int, channels: int):
+ super(SelfAttentionBlock, self).__init__(
+ key_in_channels=in_channels,
+ query_in_channels=in_channels,
+ channels=channels,
+ out_channels=in_channels,
+ share_key_query=False,
+ query_downsample=None,
+ key_downsample=None,
+ key_query_num_convs=2,
+ key_query_norm=True,
+ value_out_num_convs=1,
+ value_out_norm=False,
+ matmul_norm=True,
+ with_out=False)
+
+ self.output_project = self.build_project(
+ in_channels, in_channels, num_convs=1, use_conv_module=True)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ context = super(SelfAttentionBlock, self).forward(x, x)
+ return self.output_project(context)
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..f7de1ee294b6def1fea6dbaf6ef38e915f50a21e
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py
@@ -0,0 +1,359 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import isanet_resnet50_cityscapes.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+ if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to Tong He, et al. "Bag of Tricks for Image
+    Classification with Convolutional Neural Networks"
+    (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
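+        # Use dilated convolutions instead of strides in the later stages so
+        # the overall output stride stays at the requested 8 or 16.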
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md b/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e7e56aa3ef443e71c7da42e076e9c20244cfebec
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md
@@ -0,0 +1,182 @@
+# isanet_resnet50_voc
+
+|Module Name|isanet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|isanet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|217MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install isanet_resnet50_voc
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1. Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2. How to Start Fine-tuning
+
+  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the isanet_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - The `segmentation_transforms` data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these with their own preprocessing methods as needed.
+
+    - Step2: Download the dataset and use it
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+      - `transforms`: Data preprocessing methods.
+      - `mode`: Select the data mode; the options are `train`, `test`, and `val`. Default is `train`. A sketch of loading the validation split follows below.
+
+      - The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory.
+
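+      - For example, a validation reader can be created the same way, using the `mode` options listed above (a usage sketch):
+
+      - ```python
+        val_reader = OpticDiscSeg(transform, mode='val')
+        ```
+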
+    - Step3: Load the pre-trained model
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None)
+        ```
+      - `name`: Name of the pre-trained model.
+      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the module's default pre-trained parameters are loaded.
+
+    - Step4: Choose the optimization strategy and running configuration
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+    - Model prediction
+
+      - When fine-tuning is completed, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
+
+        ```python
+        import paddle
+        import cv2
+        import paddlehub as hub
+
+        if __name__ == '__main__':
+            model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+            img = cv2.imread("/PATH/TO/IMAGE")
+            model.predict(images=[img], visualization=True)
+        ```
+
+      - After the parameters are configured correctly, run the script with `python predict.py`.
+
+      - **Args**
+        * `images`: Path of the original image, or an image in BGR format.
+        * `visualization`: Whether to visualize the results. Default is True.
+        * `save_path`: Save path of the results. Default is 'seg_result'.
+
+  **NOTE:** For prediction, the selected module, checkpoint_dir, and dataset must be the same as those used during fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m isanet_resnet50_voc
+    ```
+
+  - This deploys an image segmentation serving API, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b80886911fcd931e6898983d85aa6213f62e1b34
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md
@@ -0,0 +1,181 @@
+# isanet_resnet50_voc
+
+|Module Name|isanet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|isanet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|217MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install isanet_resnet50_voc
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the isanet_resnet50_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these with their own preprocessing methods as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+    * The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+      - `name`: Model name.
+      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the module's default pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+    - When fine-tuning is completed, the model that performed best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+    * `visualization`: Whether to save the segmentation results as image files. Default is True.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m isanet_resnet50_voc
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/isanet_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
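+  - The decoded `mask` is an ordinary OpenCV image (a NumPy array), so it can be saved or inspected directly; for example (the output filename here is arbitrary):
+
+    ```python
+    cv2.imwrite('seg_mask.png', mask)
+    ```
+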
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7f6ee5748d15f632c5b35ac84b6262d1f00a7c72
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py
@@ -0,0 +1,401 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None):
+ return paddle.add(x, y, name)
+
+class AttentionBlock(nn.Layer):
+ """General self-attention block/non-local block.
+
+    The original article refers to https://arxiv.org/abs/1706.03762.
+ Args:
+ key_in_channels (int): Input channels of key feature.
+ query_in_channels (int): Input channels of query feature.
+ channels (int): Output channels of key/query transform.
+ out_channels (int): Output channels.
+        share_key_query (bool): Whether to share projection weights between the
+            key and query projections.
+        query_downsample (nn.Layer): Query downsample module.
+        key_downsample (nn.Layer): Key downsample module.
+        key_query_num_convs (int): Number of convs for key/query projection.
+        value_out_num_convs (int): Number of convs for value projection.
+        key_query_norm (bool): Whether to use BN for key/query projection.
+        value_out_norm (bool): Whether to use BN for value projection.
+        matmul_norm (bool): Whether to normalize the attention map by the
+            square root of the channels.
+        with_out (bool): Whether to use the out projection.
+ """
+
+ def __init__(self, key_in_channels, query_in_channels, channels,
+ out_channels, share_key_query, query_downsample,
+ key_downsample, key_query_num_convs, value_out_num_convs,
+ key_query_norm, value_out_norm, matmul_norm, with_out):
+ super(AttentionBlock, self).__init__()
+ if share_key_query:
+ assert key_in_channels == query_in_channels
+ self.with_out = with_out
+ self.key_in_channels = key_in_channels
+ self.query_in_channels = query_in_channels
+ self.out_channels = out_channels
+ self.channels = channels
+ self.share_key_query = share_key_query
+ self.key_project = self.build_project(
+ key_in_channels,
+ channels,
+ num_convs=key_query_num_convs,
+ use_conv_module=key_query_norm)
+ if share_key_query:
+ self.query_project = self.key_project
+ else:
+ self.query_project = self.build_project(
+ query_in_channels,
+ channels,
+ num_convs=key_query_num_convs,
+ use_conv_module=key_query_norm)
+
+ self.value_project = self.build_project(
+ key_in_channels,
+ channels if self.with_out else out_channels,
+ num_convs=value_out_num_convs,
+ use_conv_module=value_out_norm)
+
+ if self.with_out:
+ self.out_project = self.build_project(
+ channels,
+ out_channels,
+ num_convs=value_out_num_convs,
+ use_conv_module=value_out_norm)
+ else:
+ self.out_project = None
+
+ self.query_downsample = query_downsample
+ self.key_downsample = key_downsample
+ self.matmul_norm = matmul_norm
+
+ def build_project(self, in_channels: int, channels: int, num_convs: int, use_conv_module: bool):
+ if use_conv_module:
+ convs = [
+ ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=channels,
+ kernel_size=1,
+ bias_attr=False)
+ ]
+ for _ in range(num_convs - 1):
+ convs.append(
+ ConvBNReLU(
+ in_channels=channels,
+ out_channels=channels,
+ kernel_size=1,
+ bias_attr=False))
+ else:
+ convs = [nn.Conv2D(in_channels, channels, 1)]
+ for _ in range(num_convs - 1):
+ convs.append(nn.Conv2D(channels, channels, 1))
+
+ if len(convs) > 1:
+ convs = nn.Sequential(*convs)
+ else:
+ convs = convs[0]
+ return convs
+
+ def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor:
+ query_shape = paddle.shape(query_feats)
+ query = self.query_project(query_feats)
+ if self.query_downsample is not None:
+ query = self.query_downsample(query)
+ query = query.flatten(2).transpose([0, 2, 1])
+
+ key = self.key_project(key_feats)
+ value = self.value_project(key_feats)
+
+ if self.key_downsample is not None:
+ key = self.key_downsample(key)
+ value = self.key_downsample(value)
+
+ key = key.flatten(2)
+ value = value.flatten(2).transpose([0, 2, 1])
+ sim_map = paddle.matmul(query, key)
+ if self.matmul_norm:
+ sim_map = (self.channels**-0.5) * sim_map
+ sim_map = F.softmax(sim_map, axis=-1)
+
+ context = paddle.matmul(sim_map, value)
+ context = paddle.transpose(context, [0, 2, 1])
+
+ context = paddle.reshape(
+ context, [0, self.out_channels, query_shape[2], query_shape[3]])
+
+ if self.out_project is not None:
+ context = self.out_project(context)
+ return context
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..ed92c128629c74e23522c42b7efb0cc85425a05d
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py
@@ -0,0 +1,221 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from isanet_resnet50_voc.resnet import ResNet50_vd
+import isanet_resnet50_voc.layers as layers
+
+
+@moduleinfo(
+ name="isanet_resnet50_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="ISANetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class ISANet(nn.Layer):
+ """Interlaced Sparse Self-Attention for Semantic Segmentation.
+
+ The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation"
+ (https://arxiv.org/abs/1907.12273).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone.
+ isa_channels (int): The channels of ISA Module.
+ down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int] = (2, 3),
+ isa_channels: int = 256,
+ down_factor: Tuple[int] = (8, 8),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(ISANet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ in_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
+ self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor,
+ enable_auxiliary_loss)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feats = self.backbone(x)
+ feats = [feats[i] for i in self.backbone_indices]
+ logit_list = self.head(feats)
+ logit_list = [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners,
+ align_mode=1) for logit in logit_list
+ ]
+
+ return logit_list
+
+
+class ISAHead(nn.Layer):
+ """
+ The ISAHead.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ in_channels (tuple): The number of input channels.
+ isa_channels (int): The channels of ISA Module.
+        down_factor (tuple): Divide the height and width dimensions into (P_h, P_w) groups.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add auxiliary loss. Default: True.
+ """
+
+ def __init__(self,
+ num_classes: int,
+ in_channels: Tuple[int],
+ isa_channels: int,
+ down_factor: Tuple[int],
+ enable_auxiliary_loss: bool):
+ super(ISAHead, self).__init__()
+ self.in_channels = in_channels[-1]
+ inter_channels = self.in_channels // 4
+ self.inter_channels = inter_channels
+ self.down_factor = down_factor
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+ self.in_conv = layers.ConvBNReLU(
+ self.in_channels, inter_channels, 3, bias_attr=False)
+ self.global_relation = SelfAttentionBlock(inter_channels, isa_channels)
+ self.local_relation = SelfAttentionBlock(inter_channels, isa_channels)
+ self.out_conv = layers.ConvBNReLU(
+ inter_channels * 2, inter_channels, 1, bias_attr=False)
+ self.cls = nn.Sequential(
+ nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1))
+ self.aux = nn.Sequential(
+ layers.ConvBNReLU(
+ in_channels=1024,
+ out_channels=256,
+ kernel_size=3,
+ bias_attr=False), nn.Dropout2D(p=0.1),
+ nn.Conv2D(256, num_classes, 1))
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ C3, C4 = feat_list
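+        # C3 (backbone stage 3) feeds the auxiliary classifier below; C4
+        # (stage 4) goes through the interlaced sparse self-attention branch.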
+ x = self.in_conv(C4)
+ x_shape = paddle.shape(x)
+ P_h, P_w = self.down_factor
+ Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil(
+ x_shape[3] / P_w).astype('int32')
+ pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), (
+ Q_w * P_w - x_shape[3]).astype('int32')
+ if pad_h > 0 or pad_w > 0:
+ padding = paddle.concat([
+ pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2
+ ],
+ axis=0)
+ feat = F.pad(x, padding)
+ else:
+ feat = x
+
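+        # Long-range (global) relation: regroup so that each of the N*P_h*P_w
+        # sub-maps holds Q_h*Q_w positions sampled at stride (P_h, P_w) across
+        # the whole (padded) feature map, then attend within each sub-map.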
+ feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w])
+ feat = feat.transpose([0, 3, 5, 1, 2,
+ 4]).reshape([-1, self.inter_channels, Q_h, Q_w])
+ feat = self.global_relation(feat)
+
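+        # Short-range (local) relation: regroup so that each of the N*Q_h*Q_w
+        # sub-maps holds one P_h x P_w block of neighbouring positions.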
+ feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w])
+ feat = feat.transpose([0, 4, 5, 3, 1,
+ 2]).reshape([-1, self.inter_channels, P_h, P_w])
+ feat = self.local_relation(feat)
+
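+        # Merge the groups back into the padded (Q_h*P_h, Q_w*P_w) layout and
+        # crop off any padding added above.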
+ feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w])
+ feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape(
+ [0, self.inter_channels, P_h * Q_h, P_w * Q_w])
+ if pad_h > 0 or pad_w > 0:
+ feat = paddle.slice(
+ feat,
+ axes=[2, 3],
+ starts=[pad_h // 2, pad_w // 2],
+ ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]])
+
+ feat = self.out_conv(paddle.concat([feat, x], axis=1))
+ output = self.cls(feat)
+
+ if self.enable_auxiliary_loss:
+ auxout = self.aux(C3)
+ return [output, auxout]
+ else:
+ return [output]
+
+
+class SelfAttentionBlock(layers.AttentionBlock):
+ """General self-attention block/non-local block.
+
+ Args:
+ in_channels (int): Input channels of key/query feature.
+ channels (int): Output channels of key/query transform.
+ """
+
+ def __init__(self, in_channels, channels):
+ super(SelfAttentionBlock, self).__init__(
+ key_in_channels=in_channels,
+ query_in_channels=in_channels,
+ channels=channels,
+ out_channels=in_channels,
+ share_key_query=False,
+ query_downsample=None,
+ key_downsample=None,
+ key_query_num_convs=2,
+ key_query_norm=True,
+ value_out_num_convs=1,
+ value_out_norm=False,
+ matmul_norm=True,
+ with_out=False)
+
+ self.output_project = self.build_project(
+ in_channels, in_channels, num_convs=1, use_conv_module=True)
+
+ def forward(self, x):
+ context = super(SelfAttentionBlock, self).forward(x, x)
+ return self.output_project(context)
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..39327564de43c1999fe19e562a821a4f1eb8100b
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py
@@ -0,0 +1,359 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import isanet_resnet50_voc.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+        if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4e9f9b5138fdd72c72542a2bdf1955f48266ac94
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md
@@ -0,0 +1,182 @@
+# pspnet_resnet50_cityscapes
+
+|Module Name|pspnet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|390MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+  - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install pspnet_resnet50_cityscapes
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、How to Start Fine-tune
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can run `python train.py` to start fine-tuning the pspnet_resnet50_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines lots of preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+    - Step2: Download the dataset and use it
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+      - `transforms`: data preprocessing methods.
+      - `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      - Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+      - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint. If it is None, the default pre-trained parameters are loaded.
+
+    - Step4: Optimization strategy and training configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+    - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+    - After the arguments are configured correctly, run the script with `python predict.py`.
+
+    - **Args**
+      * `images`: image path or image data in BGR format;
+      * `visualization`: whether to visualize the result, default is True;
+      * `save_path`: save path of the result, default is 'seg_result'.
+
+    **NOTE:** The module, checkpoint_dir and dataset used for prediction must be the same as those used for Fine-tune.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m pspnet_resnet50_cityscapes
+ ```
+
+  - The image segmentation service API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
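+
+  - For example, the serving process can be pinned to a single card before startup (assuming GPU 0 is available on the machine):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    ```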
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+      data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
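+
+  - The decoded `mask` is an ordinary OpenCV image array, so it can, for instance, be written to disk (the file name below is only an example):
+
+  ```python
+  cv2.imwrite('seg_mask.png', mask)
+  ```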
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..975a846291dad727ca73918aa98644f4c70fa7a0
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md
@@ -0,0 +1,181 @@
+# pspnet_resnet50_cityscapes
+
+|Module Name|pspnet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|390MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install pspnet_resnet50_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the pspnet_resnet50_cityscapes model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines lots of preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+    * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None)
+ ```
+      - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint. If it is None, the default pre-trained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
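+
+      - As a side note, `PolynomialDecay` anneals the learning rate from 0.01 towards `end_lr` over `decay_steps` steps; the "poly" schedule with power 0.9 is a common default for semantic segmentation fine-tuning, and other schedulers from `paddle.optimizer.lr` can be swapped in the same way.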
+
+ - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m pspnet_resnet50_cityscapes
+ ```
+
+  - The image segmentation service API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
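+
+  - For example, the serving process can be pinned to a single card before startup (assuming GPU 0 is available on the machine):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    ```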
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+      data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
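+
+  - The decoded `mask` is an ordinary OpenCV image array, so it can, for instance, be written to disk (the file name below is only an example):
+
+  ```python
+  cv2.imwrite('seg_mask.png', mask)
+  ```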
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..af3c8765f8e26760cae5fad963d7949d3d1fdc3d
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py
@@ -0,0 +1,356 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+            # <class 'paddle.nn.layer.activation.ReLU'>
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+            # <class 'paddle.nn.layer.activation.Sigmoid'>
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
+ return paddle.add(x, y, name)
+
+class PPModule(nn.Layer):
+ """
+ Pyramid pooling module originally in PSPNet.
+
+ Args:
+        in_channels (int): The number of input channels to the pyramid pooling module.
+ out_channels (int): The number of output channels after pyramid pooling module.
+ bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
+ dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ bin_sizes: Tuple[int],
+ dim_reduction: bool,
+ align_corners: bool):
+ super().__init__()
+
+ self.bin_sizes = bin_sizes
+
+ inter_channels = in_channels
+ if dim_reduction:
+ inter_channels = in_channels // len(bin_sizes)
+
+ # we use dimension reduction after pooling mentioned in original implementation.
+ self.stages = nn.LayerList([
+ self._make_stage(in_channels, inter_channels, size)
+ for size in bin_sizes
+ ])
+
+ self.conv_bn_relu2 = ConvBNReLU(
+ in_channels=in_channels + inter_channels * len(bin_sizes),
+ out_channels=out_channels,
+ kernel_size=3,
+ padding=1)
+
+ self.align_corners = align_corners
+
+ def _make_stage(self, in_channels: int, out_channels: int, size: int):
+ """
+ Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which might be
+        slightly different from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other
+        implementations keep the channels the same.
+
+        Args:
+            in_channels (int): The number of input channels to the pooling branch.
+            out_channels (int): The number of output channels of the pooling branch.
+            size (int): The output size of the pooled feature map.
+
+        Returns:
+            nn.Sequential: A pooling branch consisting of AdaptiveAvgPool2D and ConvBNReLU.
+        """
+
+ prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+ conv = ConvBNReLU(
+ in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+ return nn.Sequential(prior, conv)
+
+ def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+ cat_layers = []
+ for stage in self.stages:
+ x = stage(input)
+ x = F.interpolate(
+ x,
+ paddle.shape(input)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ cat_layers.append(x)
+ cat_layers = [input] + cat_layers[::-1]
+ cat = paddle.concat(cat_layers, axis=1)
+ out = self.conv_bn_relu2(cat)
+
+ return out
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..8657af0d849b6d6141a1f4a533313858753c28aa
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py
@@ -0,0 +1,165 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from pspnet_resnet50_cityscapes.resnet import ResNet50_vd
+import pspnet_resnet50_cityscapes.layers as layers
+
+@moduleinfo(
+ name="pspnet_resnet50_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="PSPNetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class PSPNet(nn.Layer):
+ """
+ The PSPNet implementation based on PaddlePaddle.
+
+ The original article refers to
+ Zhao, Hengshuang, et al. "Pyramid scene parsing network"
+ (https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
+ pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024.
+ bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6).
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int] = (2, 3),
+ pp_out_channels: int = 1024,
+ bin_sizes: Tuple[int] = (1, 2, 3, 6),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(PSPNet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ backbone_channels = [
+ self.backbone.feat_channels[i] for i in backbone_indices
+ ]
+
+ self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels,
+ pp_out_channels, bin_sizes,
+ enable_auxiliary_loss, align_corners)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ logit_list = self.head(feat_list)
+ return [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class PSPNetHead(nn.Layer):
+ """
+ The PSPNetHead implementation.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
+ The first index will be taken as a deep-supervision feature in auxiliary layer;
+ the second one will be taken as input of Pyramid Pooling Module (PPModule).
+            Usually the backbone consists of four downsampling stages and returns an output of
+            each stage. If we set it as (2, 3) in ResNet, that means taking the feature map of the third
+            stage (res4b22) in the backbone, and the feature map of the fourth stage (res5c) as input of PPModule.
+ backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
+ pp_out_channels (int): The output channels after Pyramid Pooling Module.
+ bin_sizes (tuple): The out size of pooled feature maps.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+    def __init__(self, num_classes: int, backbone_indices: Tuple[int],
+                 backbone_channels: Tuple[int], pp_out_channels: int,
+                 bin_sizes: Tuple[int], enable_auxiliary_loss: bool,
+                 align_corners: bool):
+
+ super().__init__()
+
+ self.backbone_indices = backbone_indices
+
+ self.psp_module = layers.PPModule(
+ in_channels=backbone_channels[1],
+ out_channels=pp_out_channels,
+ bin_sizes=bin_sizes,
+ dim_reduction=True,
+ align_corners=align_corners)
+
+ self.dropout = nn.Dropout(p=0.1) # dropout_prob
+
+ self.conv = nn.Conv2D(
+ in_channels=pp_out_channels,
+ out_channels=num_classes,
+ kernel_size=1)
+
+ if enable_auxiliary_loss:
+ self.auxlayer = layers.AuxLayer(
+ in_channels=backbone_channels[0],
+ inter_channels=backbone_channels[0] // 4,
+ out_channels=num_classes)
+
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ logit_list = []
+ x = feat_list[self.backbone_indices[1]]
+ x = self.psp_module(x)
+ x = self.dropout(x)
+ logit = self.conv(x)
+ logit_list.append(logit)
+
+ if self.enable_auxiliary_loss:
+ auxiliary_feat = feat_list[self.backbone_indices[0]]
+ auxiliary_logit = self.auxlayer(auxiliary_feat)
+ logit_list.append(auxiliary_logit)
+
+ return logit_list
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..f70720eeccde8e26e94ff0a3abb555c17d5cc7c7
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py
@@ -0,0 +1,357 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+import paddle.nn as nn
+import pspnet_resnet50_cityscapes.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+        if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..97e4c156d0075787e0f80ab68655853437bf4cbf
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md
@@ -0,0 +1,182 @@
+# pspnet_resnet50_voc
+
+|Module Name|pspnet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|390MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+  - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install pspnet_resnet50_voc
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、How to Start Fine-tune
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can run `python train.py` to start fine-tuning the pspnet_resnet50_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines lots of preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+    - Step2: Download the dataset and use it
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+      - `transforms`: data preprocessing methods.
+      - `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      - Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+      - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint. If it is None, the default pre-trained parameters are loaded.
+
+    - Step4: Optimization strategy and training configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+    - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+    - After the arguments are configured correctly, run the script with `python predict.py`.
+
+    - **Args**
+      * `images`: image path or image data in BGR format;
+      * `visualization`: whether to visualize the result, default is True;
+      * `save_path`: save path of the result, default is 'seg_result'.
+
+    **NOTE:** The module, checkpoint_dir and dataset used for prediction must be the same as those used for Fine-tune.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m pspnet_resnet50_voc
+ ```
+
+  - The image segmentation service API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
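+
+  - For example, the serving process can be pinned to a single card before startup (assuming GPU 0 is available on the machine):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    ```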
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+    def cv2_to_base64(image):
+        # 将 BGR 图像编码为 JPEG,再转为 base64 字符串
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # 将 base64 字符串解码回 BGR 图像
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
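+
+  - 拿到的 `mask` 是 BGR 格式的 ndarray,可直接用 OpenCV 保存或做后处理。下面是一个最小示意(文件名 `mask.png` 为示例假设):
+
+    ```python
+    import cv2
+
+    # 将服务端返回的分割结果保存为图片(示意代码,沿用上方代码中的 mask 变量)
+    cv2.imwrite('mask.png', mask)
+    ```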
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..27b1489c951746861fdd7e123c77cb8c276bedee
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md
@@ -0,0 +1,181 @@
+# pspnet_resnet50_voc
+
+|Module Name|pspnet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|370MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install pspnet_resnet50_voc
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After installing PaddlePaddle and PaddleHub, you can fine-tune the pspnet_resnet50_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+      - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m pspnet_resnet50_voc
+ ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable (e.g. `export CUDA_VISIBLE_DEVICES=0`) before starting the service; otherwise it need not be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+    def cv2_to_base64(image):
+        # Encode a BGR image as JPEG, then as a base64 string
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # Decode a base64 string back into a BGR image
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
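+
+  - The returned `mask` is a plain BGR ndarray, so it can be saved or post-processed with OpenCV. A minimal sketch (the file name `mask.png` is illustrative):
+
+    ```python
+    import cv2
+
+    # Save the segmentation result returned by the service (illustrative sketch,
+    # reusing the `mask` variable from the snippet above)
+    cv2.imwrite('mask.png', mask)
+    ```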
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..a40f65856efae6f29664a0ddc57f0f5b852139f8
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py
@@ -0,0 +1,353 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
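+
+# Illustrative note: SyncBatchNorm(64) builds nn.SyncBatchNorm(64) on GPU and
+# falls back to nn.BatchNorm2D(64) on CPU; the two expose the same learnable
+# parameters, so checkpoints generally stay loadable on either device.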
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+            'hsigmoid']. Default: None, means identity transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
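+
+# Shape note (derived from the forward pass above): every branch is resized back
+# to the input's spatial size before concatenation, so an [N, in_channels, H, W]
+# input yields an [N, out_channels, H, W] output.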
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+    def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
+        return paddle.add(x, y, name)
+
+
+class PPModule(nn.Layer):
+ """
+ Pyramid pooling module originally in PSPNet.
+
+ Args:
+        in_channels (int): The number of input channels to the pyramid pooling module.
+ out_channels (int): The number of output channels after pyramid pooling module.
+ bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
+ dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+ def __init__(self, in_channels: int, out_channels: int, bin_sizes: tuple, dim_reduction: bool,
+ align_corners: bool):
+ super().__init__()
+
+ self.bin_sizes = bin_sizes
+
+ inter_channels = in_channels
+ if dim_reduction:
+ inter_channels = in_channels // len(bin_sizes)
+
+ # we use dimension reduction after pooling mentioned in original implementation.
+ self.stages = nn.LayerList([
+ self._make_stage(in_channels, inter_channels, size)
+ for size in bin_sizes
+ ])
+
+ self.conv_bn_relu2 = ConvBNReLU(
+ in_channels=in_channels + inter_channels * len(bin_sizes),
+ out_channels=out_channels,
+ kernel_size=3,
+ padding=1)
+
+ self.align_corners = align_corners
+
+ def _make_stage(self, in_channels: int, out_channels: int, size: int):
+ """
+ Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which might be
+        slightly different from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations
+        keep the channels the same.
+
+        Args:
+            in_channels (int): The number of input channels to the pyramid pooling module.
+            out_channels (int): The number of output channels of the pooling branch.
+            size (int): The output size of the pooled layer.
+
+        Returns:
+            nn.Sequential: A pooling branch of AdaptiveAvgPool2D followed by ConvBNReLU.
+        """
+
+ prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+ conv = ConvBNReLU(
+ in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+ return nn.Sequential(prior, conv)
+
+ def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+ cat_layers = []
+ for stage in self.stages:
+ x = stage(input)
+ x = F.interpolate(
+ x,
+ paddle.shape(input)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ cat_layers.append(x)
+ cat_layers = [input] + cat_layers[::-1]
+ cat = paddle.concat(cat_layers, axis=1)
+ out = self.conv_bn_relu2(cat)
+
+ return out
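+
+# Channel bookkeeping (derived from the code above): with dim_reduction=True the
+# concatenation feeds in_channels + (in_channels // len(bin_sizes)) * len(bin_sizes)
+# channels into conv_bn_relu2, which projects them down to out_channels.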
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..417b0d3385b45312a9b9358c5200d8bcb424e2df
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py
@@ -0,0 +1,165 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from pspnet_resnet50_voc.resnet import ResNet50_vd
+import pspnet_resnet50_voc.layers as layers
+
+@moduleinfo(
+ name="pspnet_resnet50_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="PSPNetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class PSPNet(nn.Layer):
+ """
+ The PSPNet implementation based on PaddlePaddle.
+
+ The original article refers to
+ Zhao, Hengshuang, et al. "Pyramid scene parsing network"
+ (https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf).
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone.
+ pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024.
+ bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6).
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int] = (2, 3),
+ pp_out_channels: int = 1024,
+ bin_sizes: Tuple[int] = (1, 2, 3, 6),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(PSPNet, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ backbone_channels = [
+ self.backbone.feat_channels[i] for i in backbone_indices
+ ]
+
+ self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels,
+ pp_out_channels, bin_sizes,
+ enable_auxiliary_loss, align_corners)
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ feat_list = self.backbone(x)
+ logit_list = self.head(feat_list)
+ return [
+ F.interpolate(
+ logit,
+ paddle.shape(x)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
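+
+# Minimal local sketch (illustrative; assumes the module has been installed via
+# `hub install pspnet_resnet50_voc` so the default weights can be resolved):
+#
+#   net = PSPNet(num_classes=21)
+#   x = paddle.randn([1, 3, 512, 512])
+#   logits = net(x)  # list of tensors, each [1, 21, 512, 512]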
+
+
+class PSPNetHead(nn.Layer):
+ """
+ The PSPNetHead implementation.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
+ The first index will be taken as a deep-supervision feature in auxiliary layer;
+ the second one will be taken as input of Pyramid Pooling Module (PPModule).
+ Usually backbone consists of four downsampling stage, and return an output of
+ each stage. If we set it as (2, 3) in ResNet, that means taking feature map of the third
+ stage (res4b22) in backbone, and feature map of the fourth stage (res5c) as input of PPModule.
+ backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
+ pp_out_channels (int): The output channels after Pyramid Pooling Module.
+ bin_sizes (tuple): The out size of pooled feature maps.
+ enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+ def __init__(self, num_classes, backbone_indices, backbone_channels,
+ pp_out_channels, bin_sizes, enable_auxiliary_loss,
+ align_corners):
+
+ super().__init__()
+
+ self.backbone_indices = backbone_indices
+
+ self.psp_module = layers.PPModule(
+ in_channels=backbone_channels[1],
+ out_channels=pp_out_channels,
+ bin_sizes=bin_sizes,
+ dim_reduction=True,
+ align_corners=align_corners)
+
+ self.dropout = nn.Dropout(p=0.1) # dropout_prob
+
+ self.conv = nn.Conv2D(
+ in_channels=pp_out_channels,
+ out_channels=num_classes,
+ kernel_size=1)
+
+ if enable_auxiliary_loss:
+ self.auxlayer = layers.AuxLayer(
+ in_channels=backbone_channels[0],
+ inter_channels=backbone_channels[0] // 4,
+ out_channels=num_classes)
+
+ self.enable_auxiliary_loss = enable_auxiliary_loss
+
+ def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+ logit_list = []
+ x = feat_list[self.backbone_indices[1]]
+ x = self.psp_module(x)
+ x = self.dropout(x)
+ logit = self.conv(x)
+ logit_list.append(logit)
+
+ if self.enable_auxiliary_loss:
+ auxiliary_feat = feat_list[self.backbone_indices[0]]
+ auxiliary_logit = self.auxlayer(auxiliary_feat)
+ logit_list.append(auxiliary_logit)
+
+ return logit_list
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..71af8839000b188bcf2f2add75c31fb4e5eb45d0
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py
@@ -0,0 +1,357 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+import paddle.nn as nn
+
+import pspnet_resnet50_voc.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ data_format: str = 'NCHW'):
+ super(ConvBNLayer, self).__init__()
+        if dilation != 1 and kernel_size != 3:
+            raise RuntimeError(
+                "When the dilation isn't 1, the kernel_size should be 3.")
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2,
+ stride=2,
+ padding=0,
+ ceil_mode=True,
+ data_format=data_format)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 \
+ if dilation == 1 else dilation,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False,
+ data_format=data_format)
+
+ self._batch_norm = layers.SyncBatchNorm(
+ out_channels, data_format=data_format)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ data_format: str = 'NCHW'):
+ super(BottleneckBlock, self).__init__()
+
+ self.data_format = data_format
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ data_format=data_format)
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ # NOTE: Use the wrap layer for quantization training
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = self.add(short, conv2)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ dilation: int = 1,
+ shortcut: bool = True,
+ if_first: bool = False,
+ data_format: str = 'NCHW'):
+ super(BasicBlock, self).__init__()
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ act='relu',
+ data_format=data_format)
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ dilation=dilation,
+ act=None,
+ data_format=data_format)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ data_format=data_format)
+
+ self.shortcut = shortcut
+ self.dilation = dilation
+ self.data_format = data_format
+ self.add = layers.Add()
+ self.relu = layers.Activation(act="relu")
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = self.add(short, conv1)
+ y = self.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+    (https://arxiv.org/pdf/1812.01187.pdf).
+
+ Args:
+ layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+ output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+ pretrained (str, optional): The path of pretrained model.
+
+ """
+
+ def __init__(self,
+ layers: int = 50,
+ output_stride: int = 8,
+ multi_grid: Tuple[int] = (1, 1, 1),
+ pretrained: str = None,
+ data_format: str = 'NCHW'):
+ super(ResNet_vd, self).__init__()
+
+ self.data_format = data_format
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ data_format=data_format)
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ data_format=data_format)
+ self.pool2d_max = nn.MaxPool2D(
+ kernel_size=3, stride=2, padding=1, data_format=data_format)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate,
+ data_format=data_format))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ dilation_rate = dilation_dict[block] \
+ if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 \
+ and dilation_rate == 1 else 1,
+ dilation=dilation_rate,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ data_format=data_format))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ self.conv1_logit = y.clone()
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
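+
+# Note (derived from the code above): feat_list holds the outputs of the four
+# stages; with layers=50 their channel counts are self.feat_channels = [256, 512, 1024, 2048].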
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c2d0cbb3eff49a6ee2379355396af5aaf121b967
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md
@@ -0,0 +1,182 @@
+# stdc1_seg_cityscapes
+
+|模型名称|stdc1_seg_cityscapes|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|stdc1_seg|
+|数据集|Cityscapes|
+|是否支持Fine-tuning|是|
+|模型大小|67MB|
+|指标|-|
+|最新更新日期|2022-03-21|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[stdc](https://arxiv.org/abs/2104.13188)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install stdc1_seg_cityscapes
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用stdc1_seg_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: 数据预处理方式。
+      - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`,默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+      - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m stdc1_seg_cityscapes
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量(如 `export CUDA_VISIBLE_DEVICES=0`);否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,执行以下几行代码即可发送预测请求并获取预测结果:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+    def cv2_to_base64(image):
+        # 将 BGR 图像编码为 JPEG,再转为 base64 字符串
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # 将 base64 字符串解码回 BGR 图像
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
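+
+  - 拿到 `mask` 后,也可以将其叠加回原图直观查看分割效果。下面是一个最小示意(假设返回结果与原图尺寸一致,0.6/0.4 的叠加权重为示例值):
+
+    ```python
+    import cv2
+
+    # 将分割结果按权重叠加到原图上(示意代码,假设 mask 与 org_im 尺寸一致)
+    overlay = cv2.addWeighted(org_im, 0.6, mask, 0.4, 0)
+    cv2.imwrite('overlay.png', overlay)
+    ```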
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..3198989fc877a9943c235408e4beb4bc3a6d22a9
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md
@@ -0,0 +1,181 @@
+# stdc1_seg_cityscapes
+
+|Module Name|stdc1_seg_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|stdc1_seg|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|67MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install stdc1_seg_cityscapes
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After installing PaddlePaddle and PaddleHub, you can fine-tune the stdc1_seg_cityscapes model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace them according to their needs.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+      - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m stdc1_seg_cityscapes
+ ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable (e.g. `export CUDA_VISIBLE_DEVICES=0`) before starting the service; otherwise it need not be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+    def cv2_to_base64(image):
+        # Encode a BGR image as JPEG, then as a base64 string
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # Decode a base64 string back into a BGR image
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
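+
+  - Since `mask` is decoded back into a BGR ndarray, standard OpenCV calls apply to it. A minimal sketch (the file name is illustrative):
+
+    ```python
+    import cv2
+
+    # Persist the mask returned by the serving API (illustrative sketch,
+    # reusing the `mask` variable from the snippet above)
+    cv2.imwrite('stdc_mask.png', mask)
+    ```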
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..c65193f55ee6fee66ea2294328ff1c6f63cdcf11
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py
@@ -0,0 +1,357 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+            'hsigmoid']. Default: None, means identity transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+ #
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+ #
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+ self.act_func = eval("activation.{}()".format(act_name))
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+    def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor:
+        return paddle.add(x, y, name)
+
+
+class PPModule(nn.Layer):
+ """
+ Pyramid pooling module originally in PSPNet.
+
+ Args:
+        in_channels (int): The number of input channels to the pyramid pooling module.
+ out_channels (int): The number of output channels after pyramid pooling module.
+ bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
+ dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ bin_sizes: tuple,
+ dim_reduction: bool,
+ align_corners: bool):
+ super().__init__()
+
+ self.bin_sizes = bin_sizes
+
+ inter_channels = in_channels
+ if dim_reduction:
+ inter_channels = in_channels // len(bin_sizes)
+
+ # we use dimension reduction after pooling mentioned in original implementation.
+ self.stages = nn.LayerList([
+ self._make_stage(in_channels, inter_channels, size)
+ for size in bin_sizes
+ ])
+
+ self.conv_bn_relu2 = ConvBNReLU(
+ in_channels=in_channels + inter_channels * len(bin_sizes),
+ out_channels=out_channels,
+ kernel_size=3,
+ padding=1)
+
+ self.align_corners = align_corners
+
+ def _make_stage(self, in_channels: int, out_channels: int, size: int):
+ """
+ Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which might be
+        slightly different from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations
+        keep the channels the same.
+
+        Args:
+            in_channels (int): The number of input channels to the pyramid pooling module.
+            out_channels (int): The number of output channels of the pooling branch.
+            size (int): The output size of the pooled layer.
+
+        Returns:
+            nn.Sequential: A pooling branch of AdaptiveAvgPool2D followed by ConvBNReLU.
+        """
+
+ prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+ conv = ConvBNReLU(
+ in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+ return nn.Sequential(prior, conv)
+
+ def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+ cat_layers = []
+ for stage in self.stages:
+ x = stage(input)
+ x = F.interpolate(
+ x,
+ paddle.shape(input)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ cat_layers.append(x)
+ cat_layers = [input] + cat_layers[::-1]
+ cat = paddle.concat(cat_layers, axis=1)
+ out = self.conv_bn_relu2(cat)
+
+ return out
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f942f225a48b2f01b089f2bc71a536a636a0a2bf
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py
@@ -0,0 +1,235 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from stdc1_seg_cityscapes.stdcnet import STDC1
+import stdc1_seg_cityscapes.layers as layers
+
+
+@moduleinfo(
+ name="stdc1_seg_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="STDCSeg is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class STDCSeg(nn.Layer):
+ """
+ The STDCSeg implementation based on PaddlePaddle.
+
+    The original article refers to
+    Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
+    (https://arxiv.org/abs/2104.13188)
+
+ Args:
+        num_classes (int, optional): The unique number of target classes. Default: 19.
+        use_boundary_8 (bool, optional): Whether to use the detail loss. According to the paper, it should be True for the best metric. Default: True.
+            If you want to use use_boundary_2/use_boundary_4/use_boundary_16 as well, append the corresponding loss terms of DetailAggregateLoss; it should work properly.
+        use_conv_last (bool, optional): Determine ContextPath's inplanes variable according to whether the backbone's last conv is used. Default: False.
+        pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ use_boundary_2: bool = False,
+ use_boundary_4: bool = False,
+ use_boundary_8: bool = True,
+ use_boundary_16: bool = False,
+ use_conv_last: bool = False,
+ pretrained: str = None):
+ super(STDCSeg, self).__init__()
+
+ self.use_boundary_2 = use_boundary_2
+ self.use_boundary_4 = use_boundary_4
+ self.use_boundary_8 = use_boundary_8
+ self.use_boundary_16 = use_boundary_16
+ self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last)
+ self.ffm = FeatureFusionModule(384, 256)
+ self.conv_out = SegHead(256, 256, num_classes)
+ self.conv_out8 = SegHead(128, 64, num_classes)
+ self.conv_out16 = SegHead(128, 64, num_classes)
+ self.conv_out_sp16 = SegHead(512, 64, 1)
+ self.conv_out_sp8 = SegHead(256, 64, 1)
+ self.conv_out_sp4 = SegHead(64, 64, 1)
+ self.conv_out_sp2 = SegHead(32, 64, 1)
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+            print("Loaded custom parameters successfully.")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+            print("Loaded pretrained parameters successfully.")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ x_hw = paddle.shape(x)[2:]
+ feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x)
+
+ logit_list = []
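+        # During training, return the fused logits plus auxiliary logits from
+        # the 1/8 and 1/16 context features and optional boundary logits; at
+        # inference, return only the fused logits.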
+ if self.training:
+ feat_fuse = self.ffm(feat_res8, feat_cp8)
+ feat_out = self.conv_out(feat_fuse)
+ feat_out8 = self.conv_out8(feat_cp8)
+ feat_out16 = self.conv_out16(feat_cp16)
+
+ logit_list = [feat_out, feat_out8, feat_out16]
+ logit_list = [
+ F.interpolate(x, x_hw, mode='bilinear', align_corners=True)
+ for x in logit_list
+ ]
+
+ if self.use_boundary_2:
+ feat_out_sp2 = self.conv_out_sp2(feat_res2)
+ logit_list.append(feat_out_sp2)
+ if self.use_boundary_4:
+ feat_out_sp4 = self.conv_out_sp4(feat_res4)
+ logit_list.append(feat_out_sp4)
+ if self.use_boundary_8:
+ feat_out_sp8 = self.conv_out_sp8(feat_res8)
+ logit_list.append(feat_out_sp8)
+ else:
+ feat_fuse = self.ffm(feat_res8, feat_cp8)
+ feat_out = self.conv_out(feat_fuse)
+ feat_out = F.interpolate(
+ feat_out, x_hw, mode='bilinear', align_corners=True)
+ logit_list = [feat_out]
+
+ return logit_list
+
+
+class SegHead(nn.Layer):
+    def __init__(self, in_chan: int, mid_chan: int, n_classes: int):
+ super(SegHead, self).__init__()
+ self.conv = layers.ConvBNReLU(
+ in_chan, mid_chan, kernel_size=3, stride=1, padding=1)
+ self.conv_out = nn.Conv2D(
+ mid_chan, n_classes, kernel_size=1, bias_attr=None)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv(x)
+ x = self.conv_out(x)
+ return x
+
+
+class AttentionRefinementModule(nn.Layer):
+ def __init__(self, in_chan: int, out_chan: int):
+ super(AttentionRefinementModule, self).__init__()
+ self.conv = layers.ConvBNReLU(
+ in_chan, out_chan, kernel_size=3, stride=1, padding=1)
+ self.conv_atten = nn.Conv2D(
+ out_chan, out_chan, kernel_size=1, bias_attr=None)
+ self.bn_atten = nn.BatchNorm2D(out_chan)
+ self.sigmoid_atten = nn.Sigmoid()
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ feat = self.conv(x)
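+        # Global-context channel attention: global average pooling, a 1x1
+        # conv, BN and a sigmoid produce per-channel re-weighting factors.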
+ atten = F.adaptive_avg_pool2d(feat, 1)
+ atten = self.conv_atten(atten)
+ atten = self.bn_atten(atten)
+ atten = self.sigmoid_atten(atten)
+ out = paddle.multiply(feat, atten)
+ return out
+
+
+class ContextPath(nn.Layer):
+ def __init__(self, backbone, use_conv_last: bool = False):
+ super(ContextPath, self).__init__()
+ self.backbone = backbone
+ self.arm16 = AttentionRefinementModule(512, 128)
+        inplanes = 1024  # STDC1's stage-5 output has 1024 channels; with use_conv_last, conv_last also outputs 1024
+ self.arm32 = AttentionRefinementModule(inplanes, 128)
+ self.conv_head32 = layers.ConvBNReLU(
+ 128, 128, kernel_size=3, stride=1, padding=1)
+ self.conv_head16 = layers.ConvBNReLU(
+ 128, 128, kernel_size=3, stride=1, padding=1)
+ self.conv_avg = layers.ConvBNReLU(
+ inplanes, 128, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
+ feat2, feat4, feat8, feat16, feat32 = self.backbone(x)
+
+ feat8_hw = paddle.shape(feat8)[2:]
+ feat16_hw = paddle.shape(feat16)[2:]
+ feat32_hw = paddle.shape(feat32)[2:]
+
+ avg = F.adaptive_avg_pool2d(feat32, 1)
+ avg = self.conv_avg(avg)
+ avg_up = F.interpolate(avg, feat32_hw, mode='nearest')
+
+ feat32_arm = self.arm32(feat32)
+ feat32_sum = feat32_arm + avg_up
+ feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest')
+ feat32_up = self.conv_head32(feat32_up)
+
+ feat16_arm = self.arm16(feat16)
+ feat16_sum = feat16_arm + feat32_up
+ feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest')
+ feat16_up = self.conv_head16(feat16_up)
+
+        return feat2, feat4, feat8, feat16, feat16_up, feat32_up  # feat16_up at 1/8 scale, feat32_up at 1/16 scale
+
+
+class FeatureFusionModule(nn.Layer):
+    def __init__(self, in_chan: int, out_chan: int):
+ super(FeatureFusionModule, self).__init__()
+ self.convblk = layers.ConvBNReLU(
+ in_chan, out_chan, kernel_size=1, stride=1, padding=0)
+ self.conv1 = nn.Conv2D(
+ out_chan,
+ out_chan // 4,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=None)
+ self.conv2 = nn.Conv2D(
+ out_chan // 4,
+ out_chan,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=None)
+ self.relu = nn.ReLU()
+ self.sigmoid = nn.Sigmoid()
+
+ def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor:
+ fcat = paddle.concat([fsp, fcp], axis=1)
+ feat = self.convblk(fcat)
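+        # Squeeze-and-excitation style channel attention: global pooling and
+        # two 1x1 convs compute per-channel weights for the fused features.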
+ atten = F.adaptive_avg_pool2d(feat, 1)
+ atten = self.conv1(atten)
+ atten = self.relu(atten)
+ atten = self.conv2(atten)
+ atten = self.sigmoid(atten)
+ feat_atten = paddle.multiply(feat, atten)
+ feat_out = feat_atten + feat
+ return feat_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..ddf0f043128d49f4df3b6e70b6a6b4d92bbd2590
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py
@@ -0,0 +1,263 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+import math
+
+import paddle
+import paddle.nn as nn
+
+import stdc1_seg_cityscapes.layers as L
+
+__all__ = ["STDC1", "STDC2"]
+
+
+class STDCNet(nn.Layer):
+ """
+ The STDCNet implementation based on PaddlePaddle.
+
+ The original article refers to Meituan
+ Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
+ (https://arxiv.org/abs/2104.13188)
+
+    Args:
+        base (int, optional): Base channels. Default: 64.
+        layers (list, optional): Numbers of STDC blocks in stages 3, 4 and 5 of STDCNet. Default: [4, 5, 3].
+        block_num (int, optional): Number of conv units in each STDC block. Default: 4.
+        type (str, optional): Feature fusion method, "cat" or "add". Default: "cat".
+        num_classes (int, optional): Class number for image classification. Default: 1000.
+        dropout (float, optional): Dropout ratio; applied if greater than 0. Default: 0.20.
+        use_conv_last (bool, optional): Whether to use the last ConvBNReLU layer. Default: False.
+    """
+
+ def __init__(self,
+ base: int = 64,
+ layers: List[int] = [4, 5, 3],
+ block_num: int = 4,
+ type: str = "cat",
+ num_classes: int = 1000,
+ dropout: float = 0.20,
+ use_conv_last: bool = False):
+ super(STDCNet, self).__init__()
+ if type == "cat":
+ block = CatBottleneck
+ elif type == "add":
+ block = AddBottleneck
+ self.use_conv_last = use_conv_last
+ self.features = self._make_layers(base, layers, block_num, block)
+ self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1)
+
+        if layers == [4, 5, 3]:  # STDC1446 (STDC2)
+ self.x2 = nn.Sequential(self.features[:1])
+ self.x4 = nn.Sequential(self.features[1:2])
+ self.x8 = nn.Sequential(self.features[2:6])
+ self.x16 = nn.Sequential(self.features[6:11])
+ self.x32 = nn.Sequential(self.features[11:])
+        elif layers == [2, 2, 2]:  # STDC813 (STDC1)
+ self.x2 = nn.Sequential(self.features[:1])
+ self.x4 = nn.Sequential(self.features[1:2])
+ self.x8 = nn.Sequential(self.features[2:4])
+ self.x16 = nn.Sequential(self.features[4:6])
+ self.x32 = nn.Sequential(self.features[6:])
+ else:
+ raise NotImplementedError(
+ "model with layers:{} is not implemented!".format(layers))
+
+    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
+        """
+        Forward function for feature extraction; returns features at strides 2, 4, 8, 16 and 32.
+        """
+ feat2 = self.x2(x)
+ feat4 = self.x4(feat2)
+ feat8 = self.x8(feat4)
+ feat16 = self.x16(feat8)
+ feat32 = self.x32(feat16)
+ if self.use_conv_last:
+ feat32 = self.conv_last(feat32)
+ return feat2, feat4, feat8, feat16, feat32
+
+ def _make_layers(self, base, layers, block_num, block):
+ features = []
+ features += [ConvBNRelu(3, base // 2, 3, 2)]
+ features += [ConvBNRelu(base // 2, base, 3, 2)]
+
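+        # The two stem convs above downsample the input to 1/4 resolution. In
+        # each of stages 3-5, the first block downsamples again (stride 2) and
+        # expands the channels, while the remaining blocks keep both unchanged.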
+ for i, layer in enumerate(layers):
+ for j in range(layer):
+ if i == 0 and j == 0:
+ features.append(block(base, base * 4, block_num, 2))
+ elif j == 0:
+ features.append(
+ block(base * int(math.pow(2, i + 1)),
+ base * int(math.pow(2, i + 2)), block_num, 2))
+ else:
+ features.append(
+ block(base * int(math.pow(2, i + 2)),
+ base * int(math.pow(2, i + 2)), block_num, 1))
+
+ return nn.Sequential(*features)
+
+
+class ConvBNRelu(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1):
+ super(ConvBNRelu, self).__init__()
+ self.conv = nn.Conv2D(
+ in_planes,
+ out_planes,
+ kernel_size=kernel,
+ stride=stride,
+ padding=kernel // 2,
+ bias_attr=False)
+ self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW')
+ self.relu = nn.ReLU()
+
+ def forward(self, x):
+ out = self.relu(self.bn(self.conv(x)))
+ return out
+
+
+class AddBottleneck(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
+ super(AddBottleneck, self).__init__()
+ assert block_num > 1, "block number should be larger than 1."
+ self.conv_list = nn.LayerList()
+ self.stride = stride
+ if stride == 2:
+ self.avd_layer = nn.Sequential(
+ nn.Conv2D(
+ out_planes // 2,
+ out_planes // 2,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=out_planes // 2,
+ bias_attr=False),
+ nn.BatchNorm2D(out_planes // 2),
+ )
+ self.skip = nn.Sequential(
+ nn.Conv2D(
+ in_planes,
+ in_planes,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=in_planes,
+ bias_attr=False),
+ nn.BatchNorm2D(in_planes),
+ nn.Conv2D(
+ in_planes, out_planes, kernel_size=1, bias_attr=False),
+ nn.BatchNorm2D(out_planes),
+ )
+ stride = 1
+
+ for idx in range(block_num):
+ if idx == 0:
+ self.conv_list.append(
+ ConvBNRelu(in_planes, out_planes // 2, kernel=1))
+ elif idx == 1 and block_num == 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
+ elif idx == 1 and block_num > 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
+ elif idx < block_num - 1:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx + 1))))
+ else:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx))))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ out_list = []
+ out = x
+ for idx, conv in enumerate(self.conv_list):
+ if idx == 0 and self.stride == 2:
+ out = self.avd_layer(conv(out))
+ else:
+ out = conv(out)
+ out_list.append(out)
+ if self.stride == 2:
+ x = self.skip(x)
+ return paddle.concat(out_list, axis=1) + x
+
+
+class CatBottleneck(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
+ super(CatBottleneck, self).__init__()
+ assert block_num > 1, "block number should be larger than 1."
+ self.conv_list = nn.LayerList()
+ self.stride = stride
+ if stride == 2:
+ self.avd_layer = nn.Sequential(
+ nn.Conv2D(
+ out_planes // 2,
+ out_planes // 2,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=out_planes // 2,
+ bias_attr=False),
+ nn.BatchNorm2D(out_planes // 2),
+ )
+ self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1)
+ stride = 1
+
+ for idx in range(block_num):
+ if idx == 0:
+ self.conv_list.append(
+ ConvBNRelu(in_planes, out_planes // 2, kernel=1))
+ elif idx == 1 and block_num == 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
+ elif idx == 1 and block_num > 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
+ elif idx < block_num - 1:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx + 1))))
+ else:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx))))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
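+        # Short-term dense concatenation: each conv roughly halves the channel
+        # width (the last keeps it), and every intermediate output is
+        # concatenated, so the result has out_planes channels in total.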
+ out_list = []
+ out1 = self.conv_list[0](x)
+ for idx, conv in enumerate(self.conv_list[1:]):
+ if idx == 0:
+ if self.stride == 2:
+ out = conv(self.avd_layer(out1))
+ else:
+ out = conv(out1)
+ else:
+ out = conv(out)
+ out_list.append(out)
+
+ if self.stride == 2:
+ out1 = self.skip(out1)
+ out_list.insert(0, out1)
+ out = paddle.concat(out_list, axis=1)
+ return out
+
+
+def STDC2(**kwargs):
+ model = STDCNet(base=64, layers=[4, 5, 3], **kwargs)
+ return model
+
+def STDC1(**kwargs):
+ model = STDCNet(base=64, layers=[2, 2, 2], **kwargs)
+ return model
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/README.md b/modules/image/semantic_segmentation/stdc1_seg_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f24a2a813c9bbf184d4714b1d96c23f8082b6b25
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/README.md
@@ -0,0 +1,182 @@
+# stdc1_seg_voc
+
+|模型名称|stdc1_seg_voc|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|stdc1_seg|
+|数据集|PascalVOC2012|
+|是否支持Fine-tuning|是|
+|模型大小|67MB|
+|指标|-|
+|最新更新日期|2022-03-21|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[stdc](https://arxiv.org/abs/2104.13188)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install stdc1_seg_voc
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用stdc1_seg_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按需替换或组合所需的预处理方式,示例见下。
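+      - 例如,若想在训练时加入随机水平翻转,可参考如下写法(此处假设 `segmentation_transforms` 中提供 `RandomHorizontalFlip`,仅作示意):
+
+      - ```python
+        # 示例:加入随机水平翻转(假设 RandomHorizontalFlip 在 segmentation_transforms 中可用)
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```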
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 自己训练模型的保存路径;若为None,则加载提供的默认模型参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m stdc1_seg_voc
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,以下数行代码即可发送预测请求并获取预测结果:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
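+
+  - 上面得到的 `mask` 即为解码后的分割结果图像,可按需保存到本地(示例写法,文件名可自行指定):
+
+    ```python
+    # 示例:将分割结果保存到本地
+    cv2.imwrite('mask.png', mask)
+    ```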
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md b/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..fd11504b94825347bcdf9028486820afe9c5c8d1
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md
@@ -0,0 +1,181 @@
+# stdc1_seg_voc
+
+|Module Name|stdc1_seg_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|stdc1_seg|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|67MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install stdc1_seg_voc
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the stdc1_seg_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: This data augmentation module defines a rich set of preprocessing methods for image segmentation data. Users can replace or combine them as needed; a short example follows.
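+      - For example, to add a random horizontal flip during training, something like the following could be used (this assumes `RandomHorizontalFlip` is available in `segmentation_transforms`; shown for illustration only):
+
+      - ```python
+        # illustrative: random horizontal flip (assumed available in segmentation_transforms)
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```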
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+     * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: The path of your self-trained model; if None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+ - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+      * `visualization`: Whether to save the segmentation results as image files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m stdc1_seg_voc
+ ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
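+
+  - The decoded `mask` above is the segmentation result image and can be saved locally if needed (an illustrative snippet; the file name is up to you):
+
+    ```python
+    # illustrative: save the segmentation result locally
+    cv2.imwrite('mask.png', mask)
+    ```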
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py b/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..2304610304e0149a1aceb4c1fbf4897edf5220af
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py
@@ -0,0 +1,357 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In a CPU environment nn.SyncBatchNorm has no kernel, so nn.BatchNorm2D is used instead."""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+ Returns:
+ A callable object of Activation.
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+ Examples:
+ from paddleseg.models.common.activation import Activation
+ relu = Activation("relu")
+ print(relu)
+            # <class 'paddle.nn.layer.activation.ReLU'>
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+            # <class 'paddle.nn.layer.activation.Sigmoid'>
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the
+            feature is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
+        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+ use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
+
+
+class AuxLayer(nn.Layer):
+ """
+ The auxiliary layer implementation for auxiliary loss.
+
+ Args:
+ in_channels (int): The number of input channels.
+ inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes.
+ dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ inter_channels: int,
+ out_channels: int,
+ dropout_prob: float = 0.1,
+ **kwargs):
+ super().__init__()
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=in_channels,
+ out_channels=inter_channels,
+ kernel_size=3,
+ padding=1,
+ **kwargs)
+
+ self.dropout = nn.Dropout(p=dropout_prob)
+
+ self.conv = nn.Conv2D(
+ in_channels=inter_channels,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+ x = self.conv(x)
+ return x
+
+
+class Add(nn.Layer):
+ def __init__(self):
+ super().__init__()
+
+ def forward(self, x: paddle.Tensor, y: paddle.Tensor, name=None) -> paddle.Tensor:
+ return paddle.add(x, y, name)
+
+class PPModule(nn.Layer):
+ """
+ Pyramid pooling module originally in PSPNet.
+
+ Args:
+        in_channels (int): The number of input channels to the pyramid pooling module.
+        out_channels (int): The number of output channels after the pyramid pooling module.
+        bin_sizes (tuple, optional): The output sizes of the pooled feature maps. Default: (1, 2, 3, 6).
+        dim_reduction (bool, optional): Whether to reduce the channel dimension after pooling. Default: True.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ bin_sizes: tuple,
+ dim_reduction: bool,
+ align_corners: bool):
+ super().__init__()
+
+ self.bin_sizes = bin_sizes
+
+ inter_channels = in_channels
+ if dim_reduction:
+ inter_channels = in_channels // len(bin_sizes)
+
+        # We use the dimension reduction after pooling, as mentioned in the original implementation.
+ self.stages = nn.LayerList([
+ self._make_stage(in_channels, inter_channels, size)
+ for size in bin_sizes
+ ])
+
+ self.conv_bn_relu2 = ConvBNReLU(
+ in_channels=in_channels + inter_channels * len(bin_sizes),
+ out_channels=out_channels,
+ kernel_size=3,
+ padding=1)
+
+ self.align_corners = align_corners
+
+ def _make_stage(self, in_channels: int, out_channels: int, size: int):
+ """
+ Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which may
+        differ slightly from other implementations.
+
+        After pooling, the number of channels is immediately reduced to 1/len(bin_sizes) of the input,
+        while some other implementations keep the number of channels unchanged.
+
+        Args:
+            in_channels (int): The number of input channels to the pyramid pooling module.
+            out_channels (int): The number of output channels of the pooled stage.
+            size (int): The output size of the pooled feature map.
+
+        Returns:
+            nn.Sequential: A pooling stage composed of an adaptive average pooling layer and a 1x1 ConvBNReLU.
+        """
+
+ prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+ conv = ConvBNReLU(
+ in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+ return nn.Sequential(prior, conv)
+
+ def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+ cat_layers = []
+ for stage in self.stages:
+ x = stage(input)
+ x = F.interpolate(
+ x,
+ paddle.shape(input)[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ cat_layers.append(x)
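+        # Prepend the raw input and reverse the stage outputs before
+        # concatenating along the channel axis.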
+ cat_layers = [input] + cat_layers[::-1]
+ cat = paddle.concat(cat_layers, axis=1)
+ out = self.conv_bn_relu2(cat)
+
+ return out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/module.py b/modules/image/semantic_segmentation/stdc1_seg_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..642628dc9d647c64605bd9016ddee7ab4de547d5
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/module.py
@@ -0,0 +1,235 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from stdc1_seg_voc.stdcnet import STDC1
+import stdc1_seg_voc.layers as layers
+
+
+@moduleinfo(
+ name="stdc1_seg_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="STDCSeg is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class STDCSeg(nn.Layer):
+ """
+ The STDCSeg implementation based on PaddlePaddle.
+
+ The original article refers to Meituan
+ Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
+ (https://arxiv.org/abs/2104.13188)
+
+    Args:
+        num_classes (int, optional): The unique number of target classes. Default: 19.
+        use_boundary_2 (bool, optional): Whether to supervise the 1/2-scale boundary (detail) head. Default: False.
+        use_boundary_4 (bool, optional): Whether to supervise the 1/4-scale boundary (detail) head. Default: False.
+        use_boundary_8 (bool, optional): Whether to supervise the 1/8-scale boundary (detail) head; according to the
+            paper it should be True for the best metric. To use the other boundary heads as well, append the
+            corresponding loss terms of DetailAggregateLoss. Default: True.
+        use_boundary_16 (bool, optional): Reserved flag for the 1/16-scale boundary (detail) head. Default: False.
+        use_conv_last (bool, optional): Whether to use the backbone's last conv, which determines ContextPath's
+            inplanes variable. Default: False.
+        pretrained (str, optional): The path or url of pretrained model. Default: None.
+    """
+
+ def __init__(self,
+ num_classes: int = 19,
+ use_boundary_2: bool = False,
+ use_boundary_4: bool = False,
+ use_boundary_8: bool = True,
+ use_boundary_16: bool = False,
+ use_conv_last: bool = False,
+ pretrained: str = None):
+ super(STDCSeg, self).__init__()
+
+ self.use_boundary_2 = use_boundary_2
+ self.use_boundary_4 = use_boundary_4
+ self.use_boundary_8 = use_boundary_8
+ self.use_boundary_16 = use_boundary_16
+ self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last)
+ self.ffm = FeatureFusionModule(384, 256)
+ self.conv_out = SegHead(256, 256, num_classes)
+ self.conv_out8 = SegHead(128, 64, num_classes)
+ self.conv_out16 = SegHead(128, 64, num_classes)
+ self.conv_out_sp16 = SegHead(512, 64, 1)
+ self.conv_out_sp8 = SegHead(256, 64, 1)
+ self.conv_out_sp4 = SegHead(64, 64, 1)
+ self.conv_out_sp2 = SegHead(32, 64, 1)
+ self.transforms = T.Compose([T.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+            print("Loaded custom parameters successfully.")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+            print("Loaded pretrained parameters successfully.")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ x_hw = paddle.shape(x)[2:]
+ feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x)
+
+ logit_list = []
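+        # During training, return the fused logits plus auxiliary logits from
+        # the 1/8 and 1/16 context features and optional boundary logits; at
+        # inference, return only the fused logits.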
+ if self.training:
+ feat_fuse = self.ffm(feat_res8, feat_cp8)
+ feat_out = self.conv_out(feat_fuse)
+ feat_out8 = self.conv_out8(feat_cp8)
+ feat_out16 = self.conv_out16(feat_cp16)
+
+ logit_list = [feat_out, feat_out8, feat_out16]
+ logit_list = [
+ F.interpolate(x, x_hw, mode='bilinear', align_corners=True)
+ for x in logit_list
+ ]
+
+ if self.use_boundary_2:
+ feat_out_sp2 = self.conv_out_sp2(feat_res2)
+ logit_list.append(feat_out_sp2)
+ if self.use_boundary_4:
+ feat_out_sp4 = self.conv_out_sp4(feat_res4)
+ logit_list.append(feat_out_sp4)
+ if self.use_boundary_8:
+ feat_out_sp8 = self.conv_out_sp8(feat_res8)
+ logit_list.append(feat_out_sp8)
+ else:
+ feat_fuse = self.ffm(feat_res8, feat_cp8)
+ feat_out = self.conv_out(feat_fuse)
+ feat_out = F.interpolate(
+ feat_out, x_hw, mode='bilinear', align_corners=True)
+ logit_list = [feat_out]
+
+ return logit_list
+
+
+class SegHead(nn.Layer):
+    def __init__(self, in_chan: int, mid_chan: int, n_classes: int):
+ super(SegHead, self).__init__()
+ self.conv = layers.ConvBNReLU(
+ in_chan, mid_chan, kernel_size=3, stride=1, padding=1)
+ self.conv_out = nn.Conv2D(
+ mid_chan, n_classes, kernel_size=1, bias_attr=None)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv(x)
+ x = self.conv_out(x)
+ return x
+
+
+class AttentionRefinementModule(nn.Layer):
+ def __init__(self, in_chan: int, out_chan: int):
+ super(AttentionRefinementModule, self).__init__()
+ self.conv = layers.ConvBNReLU(
+ in_chan, out_chan, kernel_size=3, stride=1, padding=1)
+ self.conv_atten = nn.Conv2D(
+ out_chan, out_chan, kernel_size=1, bias_attr=None)
+ self.bn_atten = nn.BatchNorm2D(out_chan)
+ self.sigmoid_atten = nn.Sigmoid()
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ feat = self.conv(x)
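+        # Global-context channel attention: global average pooling, a 1x1
+        # conv, BN and a sigmoid produce per-channel re-weighting factors.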
+ atten = F.adaptive_avg_pool2d(feat, 1)
+ atten = self.conv_atten(atten)
+ atten = self.bn_atten(atten)
+ atten = self.sigmoid_atten(atten)
+ out = paddle.multiply(feat, atten)
+ return out
+
+
+class ContextPath(nn.Layer):
+ def __init__(self, backbone, use_conv_last: bool = False):
+ super(ContextPath, self).__init__()
+ self.backbone = backbone
+ self.arm16 = AttentionRefinementModule(512, 128)
+        inplanes = 1024  # STDC1's stage-5 output has 1024 channels; with use_conv_last, conv_last also outputs 1024
+ self.arm32 = AttentionRefinementModule(inplanes, 128)
+ self.conv_head32 = layers.ConvBNReLU(
+ 128, 128, kernel_size=3, stride=1, padding=1)
+ self.conv_head16 = layers.ConvBNReLU(
+ 128, 128, kernel_size=3, stride=1, padding=1)
+ self.conv_avg = layers.ConvBNReLU(
+ inplanes, 128, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
+ feat2, feat4, feat8, feat16, feat32 = self.backbone(x)
+
+ feat8_hw = paddle.shape(feat8)[2:]
+ feat16_hw = paddle.shape(feat16)[2:]
+ feat32_hw = paddle.shape(feat32)[2:]
+
+ avg = F.adaptive_avg_pool2d(feat32, 1)
+ avg = self.conv_avg(avg)
+ avg_up = F.interpolate(avg, feat32_hw, mode='nearest')
+
+ feat32_arm = self.arm32(feat32)
+ feat32_sum = feat32_arm + avg_up
+ feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest')
+ feat32_up = self.conv_head32(feat32_up)
+
+ feat16_arm = self.arm16(feat16)
+ feat16_sum = feat16_arm + feat32_up
+ feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest')
+ feat16_up = self.conv_head16(feat16_up)
+
+        return feat2, feat4, feat8, feat16, feat16_up, feat32_up  # feat16_up at 1/8 scale, feat32_up at 1/16 scale
+
+
+class FeatureFusionModule(nn.Layer):
+    def __init__(self, in_chan: int, out_chan: int):
+ super(FeatureFusionModule, self).__init__()
+ self.convblk = layers.ConvBNReLU(
+ in_chan, out_chan, kernel_size=1, stride=1, padding=0)
+ self.conv1 = nn.Conv2D(
+ out_chan,
+ out_chan // 4,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=None)
+ self.conv2 = nn.Conv2D(
+ out_chan // 4,
+ out_chan,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=None)
+ self.relu = nn.ReLU()
+ self.sigmoid = nn.Sigmoid()
+
+ def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor:
+ fcat = paddle.concat([fsp, fcp], axis=1)
+ feat = self.convblk(fcat)
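+        # Squeeze-and-excitation style channel attention: global pooling and
+        # two 1x1 convs compute per-channel weights for the fused features.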
+ atten = F.adaptive_avg_pool2d(feat, 1)
+ atten = self.conv1(atten)
+ atten = self.relu(atten)
+ atten = self.conv2(atten)
+ atten = self.sigmoid(atten)
+ feat_atten = paddle.multiply(feat, atten)
+ feat_out = feat_atten + feat
+ return feat_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py b/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d2716a83b5fcf975c4d7a5e4291199d6b09689f9
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py
@@ -0,0 +1,262 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+import math
+
+import paddle
+import paddle.nn as nn
+
+import stdc1_seg_voc.layers as L
+
+__all__ = ["STDC1", "STDC2"]
+
+
+class STDCNet(nn.Layer):
+ """
+ The STDCNet implementation based on PaddlePaddle.
+
+ The original article refers to Meituan
+ Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation."
+ (https://arxiv.org/abs/2104.13188)
+
+    Args:
+        base (int, optional): Base channels. Default: 64.
+        layers (list, optional): Numbers of STDC blocks in stages 3, 4 and 5 of STDCNet. Default: [4, 5, 3].
+        block_num (int, optional): Number of conv units in each STDC block. Default: 4.
+        type (str, optional): Feature fusion method, "cat" or "add". Default: "cat".
+        num_classes (int, optional): Class number for image classification. Default: 1000.
+        dropout (float, optional): Dropout ratio; applied if greater than 0. Default: 0.20.
+        use_conv_last (bool, optional): Whether to use the last ConvBNReLU layer. Default: False.
+    """
+
+ def __init__(self,
+ base: int = 64,
+ layers: List[int] = [4, 5, 3],
+ block_num: int = 4,
+ type: str = "cat",
+ num_classes: int = 1000,
+ dropout: float = 0.20,
+ use_conv_last: bool = False):
+ super(STDCNet, self).__init__()
+ if type == "cat":
+ block = CatBottleneck
+ elif type == "add":
+ block = AddBottleneck
+ self.use_conv_last = use_conv_last
+ self.features = self._make_layers(base, layers, block_num, block)
+ self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1)
+
+        if layers == [4, 5, 3]:  # STDC1446 (STDC2)
+ self.x2 = nn.Sequential(self.features[:1])
+ self.x4 = nn.Sequential(self.features[1:2])
+ self.x8 = nn.Sequential(self.features[2:6])
+ self.x16 = nn.Sequential(self.features[6:11])
+ self.x32 = nn.Sequential(self.features[11:])
+        elif layers == [2, 2, 2]:  # STDC813 (STDC1)
+ self.x2 = nn.Sequential(self.features[:1])
+ self.x4 = nn.Sequential(self.features[1:2])
+ self.x8 = nn.Sequential(self.features[2:4])
+ self.x16 = nn.Sequential(self.features[4:6])
+ self.x32 = nn.Sequential(self.features[6:])
+ else:
+ raise NotImplementedError(
+ "model with layers:{} is not implemented!".format(layers))
+
+    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
+        """
+        Forward function for feature extraction; returns features at strides 2, 4, 8, 16 and 32.
+        """
+ feat2 = self.x2(x)
+ feat4 = self.x4(feat2)
+ feat8 = self.x8(feat4)
+ feat16 = self.x16(feat8)
+ feat32 = self.x32(feat16)
+ if self.use_conv_last:
+ feat32 = self.conv_last(feat32)
+ return feat2, feat4, feat8, feat16, feat32
+
+ def _make_layers(self, base, layers, block_num, block):
+ features = []
+ features += [ConvBNRelu(3, base // 2, 3, 2)]
+ features += [ConvBNRelu(base // 2, base, 3, 2)]
+
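+        # The two stem convs above downsample the input to 1/4 resolution. In
+        # each of stages 3-5, the first block downsamples again (stride 2) and
+        # expands the channels, while the remaining blocks keep both unchanged.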
+ for i, layer in enumerate(layers):
+ for j in range(layer):
+ if i == 0 and j == 0:
+ features.append(block(base, base * 4, block_num, 2))
+ elif j == 0:
+ features.append(
+ block(base * int(math.pow(2, i + 1)),
+ base * int(math.pow(2, i + 2)), block_num, 2))
+ else:
+ features.append(
+ block(base * int(math.pow(2, i + 2)),
+ base * int(math.pow(2, i + 2)), block_num, 1))
+
+ return nn.Sequential(*features)
+
+
+class ConvBNRelu(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1):
+ super(ConvBNRelu, self).__init__()
+ self.conv = nn.Conv2D(
+ in_planes,
+ out_planes,
+ kernel_size=kernel,
+ stride=stride,
+ padding=kernel // 2,
+ bias_attr=False)
+ self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW')
+ self.relu = nn.ReLU()
+
+ def forward(self, x):
+ out = self.relu(self.bn(self.conv(x)))
+ return out
+
+
+class AddBottleneck(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
+ super(AddBottleneck, self).__init__()
+ assert block_num > 1, "block number should be larger than 1."
+ self.conv_list = nn.LayerList()
+ self.stride = stride
+ if stride == 2:
+ self.avd_layer = nn.Sequential(
+ nn.Conv2D(
+ out_planes // 2,
+ out_planes // 2,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=out_planes // 2,
+ bias_attr=False),
+ nn.BatchNorm2D(out_planes // 2),
+ )
+ self.skip = nn.Sequential(
+ nn.Conv2D(
+ in_planes,
+ in_planes,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=in_planes,
+ bias_attr=False),
+ nn.BatchNorm2D(in_planes),
+ nn.Conv2D(
+ in_planes, out_planes, kernel_size=1, bias_attr=False),
+ nn.BatchNorm2D(out_planes),
+ )
+ stride = 1
+
+ for idx in range(block_num):
+ if idx == 0:
+ self.conv_list.append(
+ ConvBNRelu(in_planes, out_planes // 2, kernel=1))
+ elif idx == 1 and block_num == 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
+ elif idx == 1 and block_num > 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
+ elif idx < block_num - 1:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx + 1))))
+ else:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx))))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ out_list = []
+ out = x
+ for idx, conv in enumerate(self.conv_list):
+ if idx == 0 and self.stride == 2:
+ out = self.avd_layer(conv(out))
+ else:
+ out = conv(out)
+ out_list.append(out)
+ if self.stride == 2:
+ x = self.skip(x)
+ return paddle.concat(out_list, axis=1) + x
+
+
+class CatBottleneck(nn.Layer):
+ def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1):
+ super(CatBottleneck, self).__init__()
+ assert block_num > 1, "block number should be larger than 1."
+ self.conv_list = nn.LayerList()
+ self.stride = stride
+ if stride == 2:
+ self.avd_layer = nn.Sequential(
+ nn.Conv2D(
+ out_planes // 2,
+ out_planes // 2,
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ groups=out_planes // 2,
+ bias_attr=False),
+ nn.BatchNorm2D(out_planes // 2),
+ )
+ self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1)
+ stride = 1
+
+ for idx in range(block_num):
+ if idx == 0:
+ self.conv_list.append(
+ ConvBNRelu(in_planes, out_planes // 2, kernel=1))
+ elif idx == 1 and block_num == 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride))
+ elif idx == 1 and block_num > 2:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride))
+ elif idx < block_num - 1:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx + 1))))
+ else:
+ self.conv_list.append(
+ ConvBNRelu(out_planes // int(math.pow(2, idx)),
+ out_planes // int(math.pow(2, idx))))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
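+        # Short-term dense concatenation: each conv roughly halves the channel
+        # width (the last keeps it), and every intermediate output is
+        # concatenated, so the result has out_planes channels in total.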
+ out_list = []
+ out1 = self.conv_list[0](x)
+ for idx, conv in enumerate(self.conv_list[1:]):
+ if idx == 0:
+ if self.stride == 2:
+ out = conv(self.avd_layer(out1))
+ else:
+ out = conv(out1)
+ else:
+ out = conv(out)
+ out_list.append(out)
+
+ if self.stride == 2:
+ out1 = self.skip(out1)
+ out_list.insert(0, out1)
+ out = paddle.concat(out_list, axis=1)
+ return out
+
+
+def STDC2(**kwargs):
+ model = STDCNet(base=64, layers=[4, 5, 3], **kwargs)
+ return model
+
+def STDC1(**kwargs):
+ model = STDCNet(base=64, layers=[2, 2, 2], **kwargs)
+ return model
\ No newline at end of file