diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ddbc7cd6ace39e073884fe220e1a26cf74e196d6 --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README.md @@ -0,0 +1,182 @@ +# ann_resnet50_cityscapes + +|模型名称|ann_resnet50_cityscapes| +| :--- | :---: | +|类别|图像-图像分割| +|网络|ann_resnet50vd| +|数据集|Cityscapes| +|是否支持Fine-tuning|是| +|模型大小|228MB| +|指标|-| +|最新更新日期|2022-03-22| + +## 一、模型基本信息 + + - 样例结果示例: +

+ (分割效果示例图)

+ +- ### 模型介绍 + + - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。 + - 更多详情请参考:[ann](https://arxiv.org/pdf/1908.07678.pdf) + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、安装 + + - ```shell + $ hub install ann_resnet50_cityscapes + ``` + + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + +## 三、模型API预测 + +- ### 1.预测代码示例 + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.如何开始Fine-tune + + - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ann_resnet50_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。`train.py`内容如下: + + - 代码步骤 + + - Step1: 定义数据预处理方式 + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + + - Step2: 下载数据集并使用 + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + - `transforms`: 数据预处理方式。 + - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + + - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + + - Step3: 加载预训练模型 + + - ```python + import paddlehub as hub + + model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None) + ``` + - `name`: 选择预训练模型的名字。 + - `pretrained`: 加载自己训练的模型参数的路径,若为None,则加载提供的模型默认参数。 + + - Step4: 选择优化策略和运行配置 + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - 模型预测 + + - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。`predict.py`脚本如下: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - 参数配置正确后,请执行脚本`python predict.py`。 + + - **Args** + * `images`:原始图像路径或BGR格式图片; + * `visualization`: 是否可视化,默认为True; + * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + + **NOTE:** 进行预测时,所选择的module、checkpoint_dir、dataset必须和Fine-tune所用的一样。 + +## 四、服务部署 + +- PaddleHub Serving可以部署一个在线图像分割服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + + - ```shell + $ hub serving start -m ann_resnet50_cityscapes + ``` + + - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则无需设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端后,以下几行代码即可实现发送预测请求,获取预测结果 + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + # 发送HTTP请求 + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/ann_resnet50_cityscapes" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## 五、更新历史 + +* 1.0.0 + + 初始发布 diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..43c29951a10009f8fc6dbca9cc39a92ead11c262 --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/README_en.md @@ -0,0 +1,184 @@ +# ann_resnet50_cityscapes + +|Module Name|ann_resnet50_cityscapes| +| :--- | :---: | +|Category|Image Segmentation| +|Network|ann_resnet50vd| +|Dataset|Cityscapes| +|Fine-tuning supported or not|Yes| +|Module Size|228MB| +|Data indicators|-| +|Latest update date|2022-03-22| + +## I. Basic Information + +- ### Application Effect Display + - Sample results: +

+ (sample segmentation result image)

+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install ann_resnet50_cityscapes + ``` + + - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the ann_resnet50_cityscapes model to fine-tune on datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data augmentation module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset from the network and decompress it to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='ann_resnet50_cityscapes', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `pretrained`: path of the self-trained model parameters; if it is None, the provided default parameters are loaded. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m ann_resnet50_cityscapes + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/ann_resnet50_cityscapes" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..083c8d2fa09fea0eb51af3d3c89b9aba84ae94db --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/layers.py @@ -0,0 +1,275 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
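+ +# NOTE: This file collects the building blocks shared with module.py: Conv+BN(+ReLU) wrappers, an activation-name lookup, an ASPP module and an auxiliary head.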
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # <class 'paddle.nn.layer.activation.ReLU'> + sigmoid = Activation("sigmoid") + print(sigmoid) + # <class 'paddle.nn.layer.activation.Sigmoid'> + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rates used in the ASPP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False. + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None): + return paddle.add(x, y, name) diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..d892c47c7dff84a269a0d0a52cb3c31da30e6cc9 --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/module.py @@ -0,0 +1,452 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from ann_resnet50_cityscapes.resnet import ResNet50_vd +import ann_resnet50_cityscapes.layers as layers + +@moduleinfo( + name="ann_resnet50_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="ANNResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class ANN(nn.Layer): + """ + The ANN implementation based on PaddlePaddle. + + The original article refers to + Zhen, Zhu, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation" + (https://arxiv.org/pdf/1908.07678.pdf). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone. + key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules. + Default: 256. + inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512. + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 
769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (2, 3), + key_value_channels: int = 256, + inter_channels: int = 512, + psp_size: Tuple[int] = (1, 3, 6, 8), + align_corners: bool = False, + pretrained: str = None): + super(ANN, self).__init__() + + self.backbone = ResNet50_vd() + backbone_channels = [ + self.backbone.feat_channels[i] for i in backbone_indices + ] + + self.head = ANNHead(num_classes, backbone_indices, backbone_channels, + key_value_channels, inter_channels, psp_size) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners) for logit in logit_list + ] + + + +class ANNHead(nn.Layer): + """ + The ANNHead implementation. + + It mainly consists of AFNB and APNB modules. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + The first index will be taken as low-level features; the second one will be + taken as high-level features in AFNB module. Usually backbone consists of four + downsampling stage, such as ResNet, and return an output of each stage. If it is (2, 3), + it means taking feature map of the third stage and the fourth stage in backbone. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules. + inter_channels (int): Both input and output channels of APNB modules. + psp_size (tuple): The out size of pooled feature maps. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. 
Default: False + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int], + backbone_channels: Tuple[int], + key_value_channels: int, + inter_channels: int, + psp_size: Tuple[int], + enable_auxiliary_loss: bool = False): + super().__init__() + + low_in_channels = backbone_channels[0] + high_in_channels = backbone_channels[1] + + self.fusion = AFNB( + low_in_channels=low_in_channels, + high_in_channels=high_in_channels, + out_channels=high_in_channels, + key_channels=key_value_channels, + value_channels=key_value_channels, + dropout_prob=0.05, + repeat_sizes=([1]), + psp_size=psp_size) + + self.context = nn.Sequential( + layers.ConvBNReLU( + in_channels=high_in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1), + APNB( + in_channels=inter_channels, + out_channels=inter_channels, + key_channels=key_value_channels, + value_channels=key_value_channels, + dropout_prob=0.05, + repeat_sizes=([1]), + psp_size=psp_size)) + + self.cls = nn.Conv2D( + in_channels=inter_channels, out_channels=num_classes, kernel_size=1) + self.auxlayer = layers.AuxLayer( + in_channels=low_in_channels, + inter_channels=low_in_channels // 2, + out_channels=num_classes, + dropout_prob=0.05) + + self.backbone_indices = backbone_indices + self.enable_auxiliary_loss = enable_auxiliary_loss + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + logit_list = [] + low_level_x = feat_list[self.backbone_indices[0]] + high_level_x = feat_list[self.backbone_indices[1]] + x = self.fusion(low_level_x, high_level_x) + x = self.context(x) + logit = self.cls(x) + logit_list.append(logit) + + if self.enable_auxiliary_loss: + auxiliary_logit = self.auxlayer(low_level_x) + logit_list.append(auxiliary_logit) + + return logit_list + + +class AFNB(nn.Layer): + """ + Asymmetric Fusion Non-local Block. + + Args: + low_in_channels (int): Low-level-feature channels. + high_in_channels (int): High-level-feature channels. + out_channels (int): Out channels of AFNB module. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + dropout_prob (float): The dropout rate of output. + repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]). + psp_size (tuple. optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + """ + + def __init__(self, + low_in_channels: int, + high_in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + dropout_prob: float, + repeat_sizes: Tuple[int] = ([1]), + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.psp_size = psp_size + self.stages = nn.LayerList([ + SelfAttentionBlock_AFNB(low_in_channels, high_in_channels, + key_channels, value_channels, out_channels, + size) for size in repeat_sizes + ]) + self.conv_bn = layers.ConvBN( + in_channels=out_channels + high_in_channels, + out_channels=out_channels, + kernel_size=1) + self.dropout = nn.Dropout(p=dropout_prob) + + def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor: + priors = [stage(low_feats, high_feats) for stage in self.stages] + context = priors[0] + for i in range(1, len(priors)): + context += priors[i] + + output = self.conv_bn(paddle.concat([context, high_feats], axis=1)) + output = self.dropout(output) + + return output + + +class APNB(nn.Layer): + """ + Asymmetric Pyramid Non-local Block. + + Args: + in_channels (int): The input channels of APNB module. 
+ out_channels (int): Out channels of APNB module. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + dropout_prob (float): The dropout rate of output. + repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]). + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + """ + + def __init__(self, + in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + dropout_prob: float, + repeat_sizes: Tuple[int] = ([1]), + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.psp_size = psp_size + self.stages = nn.LayerList([ + SelfAttentionBlock_APNB(in_channels, out_channels, key_channels, + value_channels, size) + for size in repeat_sizes + ]) + self.conv_bn = layers.ConvBNReLU( + in_channels=in_channels * 2, + out_channels=out_channels, + kernel_size=1) + self.dropout = nn.Dropout(p=dropout_prob) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + priors = [stage(x) for stage in self.stages] + context = priors[0] + for i in range(1, len(priors)): + context += priors[i] + + output = self.conv_bn(paddle.concat([context, x], axis=1)) + output = self.dropout(output) + + return output + + +def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor: + n, c, h, w = x.shape + priors = [] + for size in psp_size: + feat = F.adaptive_avg_pool2d(x, size) + feat = paddle.reshape(feat, shape=(0, c, -1)) + priors.append(feat) + center = paddle.concat(priors, axis=-1) + return center + + +class SelfAttentionBlock_AFNB(nn.Layer): + """ + Self-Attention Block for AFNB module. + + Args: + low_in_channels (int): Low-level-feature channels. + high_in_channels (int): High-level-feature channels. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + out_channels (int, optional): Out channels of AFNB module. Default: None. + scale (int, optional): Pooling size. Default: 1. + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). 
+ """ + + def __init__(self, + low_in_channels: int, + high_in_channels: int, + key_channels: int, + value_channels: int, + out_channels: int = None, + scale: int = 1, + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.scale = scale + self.in_channels = low_in_channels + self.out_channels = out_channels + self.key_channels = key_channels + self.value_channels = value_channels + if out_channels == None: + self.out_channels = high_in_channels + self.pool = nn.MaxPool2D(scale) + self.f_key = layers.ConvBNReLU( + in_channels=low_in_channels, + out_channels=key_channels, + kernel_size=1) + self.f_query = layers.ConvBNReLU( + in_channels=high_in_channels, + out_channels=key_channels, + kernel_size=1) + self.f_value = nn.Conv2D( + in_channels=low_in_channels, + out_channels=value_channels, + kernel_size=1) + + self.W = nn.Conv2D( + in_channels=value_channels, + out_channels=out_channels, + kernel_size=1) + + self.psp_size = psp_size + + def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor: + batch_size, _, h, w = high_feats.shape + + value = self.f_value(low_feats) + value = _pp_module(value, self.psp_size) + value = paddle.transpose(value, (0, 2, 1)) + + query = self.f_query(high_feats) + query = paddle.reshape(query, shape=(0, self.key_channels, -1)) + query = paddle.transpose(query, perm=(0, 2, 1)) + + key = self.f_key(low_feats) + key = _pp_module(key, self.psp_size) + + sim_map = paddle.matmul(query, key) + sim_map = (self.key_channels**-.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, perm=(0, 2, 1)) + hf_shape = paddle.shape(high_feats) + context = paddle.reshape( + context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]]) + + context = self.W(context) + + return context + + +class SelfAttentionBlock_APNB(nn.Layer): + """ + Self-Attention Block for APNB module. + + Args: + in_channels (int): The input channels of APNB module. + out_channels (int): The out channels of APNB module. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + scale (int, optional): Pooling size. Default: 1. + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). 
+ """ + + def __init__(self, + in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + scale: int = 1, + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.scale = scale + self.in_channels = in_channels + self.out_channels = out_channels + self.key_channels = key_channels + self.value_channels = value_channels + self.pool = nn.MaxPool2D(scale) + self.f_key = layers.ConvBNReLU( + in_channels=self.in_channels, + out_channels=self.key_channels, + kernel_size=1) + self.f_query = self.f_key + self.f_value = nn.Conv2D( + in_channels=self.in_channels, + out_channels=self.value_channels, + kernel_size=1) + self.W = nn.Conv2D( + in_channels=self.value_channels, + out_channels=self.out_channels, + kernel_size=1) + + self.psp_size = psp_size + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + batch_size, _, h, w = x.shape + if self.scale > 1: + x = self.pool(x) + + value = self.f_value(x) + value = _pp_module(value, self.psp_size) + value = paddle.transpose(value, perm=(0, 2, 1)) + + query = self.f_query(x) + query = paddle.reshape(query, shape=(0, self.key_channels, -1)) + query = paddle.transpose(query, perm=(0, 2, 1)) + + key = self.f_key(x) + key = _pp_module(key, self.psp_size) + + sim_map = paddle.matmul(query, key) + sim_map = (self.key_channels**-.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, perm=(0, 2, 1)) + + x_shape = paddle.shape(x) + context = paddle.reshape( + context, shape=[0, self.value_channels, x_shape[2], x_shape[3]]) + context = self.W(context) + + return context diff --git a/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..efa7ba57045337ef5c6ee2f84dc9de0e72e73e32 --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_cityscapes/resnet.py @@ -0,0 +1,361 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
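+ +# NOTE: ResNet-vd backbone. forward() returns the output feature map of every stage; the ANN head in module.py selects two of them via backbone_indices.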
+ +from typing import Union, List, Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import ann_resnet50_cityscapes.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first 
or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. + + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
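+ # For example: with output_stride=8 the dict built above is {2: 2, 3: 4}, + # so stages 3 and 4 keep stride 1 (see the stride argument below) and use + # dilations 2 and 4, leaving the features at 1/8 of the input resolution.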
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/README.md b/modules/image/semantic_segmentation/ann_resnet50_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bd91399116cd00d8c55177270ce06fe758ad7b5c --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_voc/README.md @@ -0,0 +1,182 @@ +# ann_resnet50_voc + +|模型名称|ann_resnet50_voc| +| :--- | :---: | +|类别|图像-图像分割| +|网络|ann_resnet50vd| +|数据集|PascalVOC2012| +|是否支持Fine-tuning|是| +|模型大小|228MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +

+ (分割效果示例图)

+ +- ### 模型介绍 + + - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。 + - 更多详情请参考:[ann](https://arxiv.org/pdf/1908.07678.pdf) + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、安装 + + - ```shell + $ hub install ann_resnet50_voc + ``` + + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + +## 三、模型API预测 + +- ### 1.预测代码示例 + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.如何开始Fine-tune + + - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ann_resnet50_voc模型对OpticDiscSeg数据集进行Fine-tune。`train.py`内容如下: + + - 代码步骤 + + - Step1: 定义数据预处理方式 + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + + - Step2: 下载数据集并使用 + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + - `transforms`: 数据预处理方式。 + - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + + - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + + - Step3: 加载预训练模型 + + - ```python + import paddlehub as hub + + model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: 选择预训练模型的名字。 + - `pretrained`: 加载自己训练的模型参数的路径,若为None,则加载提供的模型默认参数。 + + - Step4: 选择优化策略和运行配置 + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - 模型预测 + + - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。`predict.py`脚本如下: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - 参数配置正确后,请执行脚本`python predict.py`。 + + - **Args** + * `images`:原始图像路径或BGR格式图片; + * `visualization`: 是否可视化,默认为True; + * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + + **NOTE:** 进行预测时,所选择的module、checkpoint_dir、dataset必须和Fine-tune所用的一样。 + +## 四、服务部署 + +- PaddleHub Serving可以部署一个在线图像分割服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + + - ```shell + $ hub serving start -m ann_resnet50_voc + ``` + + - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则无需设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端后,以下几行代码即可实现发送预测请求,获取预测结果 + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + # 发送HTTP请求 + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/ann_resnet50_voc" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## 五、更新历史 + +* 1.0.0 + + 初始发布 diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..2226a22d6039c5480ec097aa0dc491545bc2a8ec --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_voc/README_en.md @@ -0,0 +1,182 @@ +# ann_resnet50_voc + +|Module Name|ann_resnet50_voc| +| :--- | :---: | +|Category|Image Segmentation| +|Network|ann_resnet50vd| +|Dataset|PascalVOC2012| +|Fine-tuning supported or not|Yes| +|Module Size|228MB| +|Data indicators|-| +|Latest update date|2022-03-22| + +## I. Basic Information + +- ### Application Effect Display + - Sample results: +

+ (sample segmentation result image)

+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [ann](https://arxiv.org/pdf/1908.07678.pdf) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install ann_resnet50_voc + ``` + + - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the ann_resnet50_voc model to fine-tune on datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data augmentation module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset from the network and decompress it to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='ann_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `pretrained`: path of the self-trained model parameters; if it is None, the provided default parameters are loaded. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='ann_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m ann_resnet50_voc + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/ann_resnet50_voc" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py b/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..8060d63d280962a3e99f2dd7b910d0a8bf8445eb --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_voc/layers.py @@ -0,0 +1,276 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
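+# NOTE: Same building blocks as in ann_resnet50_cityscapes/layers.py: Conv+BN(+ReLU) wrappers, an activation-name lookup, an ASPP module and an auxiliary head.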
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+    if paddle.get_device() == 'cpu':
+        return nn.BatchNorm2D(*args, **kwargs)
+    else:
+        return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+    """Depthwise Separable Convolution."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(SeparableConvBNReLU, self).__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+    """Basic conv bn layer"""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBN, self).__init__()
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        return x
+
+
+class ConvBNReLU(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBNReLU, self).__init__()
+
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        x = F.relu(x)
+        return x
+
+
+class Activation(nn.Layer):
+    """
+    The wrapper of activations.
+    Args:
+        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+            'hsigmoid']. Default: None, meaning identity transformation.
+    Returns:
+        A callable object of Activation.
+    Raises:
+        KeyError: When parameter `act` is not in the optional range.
+    Examples:
+        from paddleseg.models.common.activation import Activation
+        relu = Activation("relu")
+        print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+        sigmoid = Activation("sigmoid")
+        print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+        not_exit_one = Activation("not_exit_one")
+        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+        # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+        # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+    """
+
+    def __init__(self, act: str = None):
+        super(Activation, self).__init__()
+
+        self._act = act
+        upper_act_names = activation.__dict__.keys()
+        lower_act_names = [act.lower() for act in upper_act_names]
+        act_dict = dict(zip(lower_act_names, upper_act_names))
+
+        if act is not None:
+            if act in act_dict.keys():
+                act_name = act_dict[act]
+                # getattr is the idiomatic, behavior-identical replacement for
+                # the original eval("activation.{}()".format(act_name))
+                self.act_func = getattr(activation, act_name)()
+            else:
+                raise KeyError("{} does not exist in the current {}".format(
+                    act, act_dict.keys()))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        if self._act is not None:
+            return self.act_func(x)
+        else:
+            return x
+
+
+class ASPPModule(nn.Layer):
+    """
+    Atrous Spatial Pyramid Pooling.
+    Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+        image_pooling (bool, optional): If augmented with image-level features. Default: False
+    """
+
+    def __init__(self,
+                 aspp_ratios: tuple,
+                 in_channels: int,
+                 out_channels: int,
+                 align_corners: bool,
+                 use_sep_conv: bool = False,
+                 image_pooling: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+        self.aspp_blocks = nn.LayerList()
+
+        for ratio in aspp_ratios:
+            if use_sep_conv and ratio > 1:
+                conv_func = SeparableConvBNReLU
+            else:
+                conv_func = ConvBNReLU
+
+            block = conv_func(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1 if ratio == 1 else 3,
+                dilation=ratio,
+                padding=0 if ratio == 1 else ratio)
+            self.aspp_blocks.append(block)
+
+        out_size = len(self.aspp_blocks)
+
+        if image_pooling:
+            self.global_avg_pool = nn.Sequential(
+                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+            out_size += 1
+        self.image_pooling = image_pooling
+
+        self.conv_bn_relu = ConvBNReLU(
+            in_channels=out_channels * out_size,
+            out_channels=out_channels,
+            kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # drop rate
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        outputs = []
+        for block in self.aspp_blocks:
+            y = block(x)
+            y = F.interpolate(
+                y,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(y)
+
+        if self.image_pooling:
+            img_avg = self.global_avg_pool(x)
+            img_avg = F.interpolate(
+                img_avg,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(img_avg)
+
+        x = paddle.concat(outputs, axis=1)
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+
+        return x
+
+
+class AuxLayer(nn.Layer):
+    """
+    The auxiliary layer implementation for auxiliary loss.
+
+    Args:
+        in_channels (int): The number of input channels.
+        inter_channels (int): The intermediate channels.
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor: + return paddle.add(x, y, name) diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/module.py b/modules/image/semantic_segmentation/ann_resnet50_voc/module.py new file mode 100644 index 0000000000000000000000000000000000000000..f0218dde73f6f09ef61040b55b02699829bdb7fb --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_voc/module.py @@ -0,0 +1,452 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from ann_resnet50_voc.resnet import ResNet50_vd +import ann_resnet50_voc.layers as layers + +@moduleinfo( + name="ann_resnet50_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="ANNResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class ANN(nn.Layer): + """ + The ANN implementation based on PaddlePaddle. + + The original article refers to + Zhen, Zhu, et al. "Asymmetric Non-local Neural Networks for Semantic Segmentation" + (https://arxiv.org/pdf/1908.07678.pdf). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone. + key_value_channels (int, optional): The key and value channels of self-attention map in both AFNB and APNB modules. + Default: 256. + inter_channels (int, optional): Both input and output channels of APNB modules. Default: 512. + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. 
+ pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (2, 3), + key_value_channels: int = 256, + inter_channels: int = 512, + psp_size: Tuple[int] = (1, 3, 6, 8), + align_corners: bool = False, + pretrained: str = None): + super(ANN, self).__init__() + + self.backbone = ResNet50_vd() + backbone_channels = [ + self.backbone.feat_channels[i] for i in backbone_indices + ] + + self.head = ANNHead(num_classes, backbone_indices, backbone_channels, + key_value_channels, inter_channels, psp_size) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners) for logit in logit_list + ] + + + +class ANNHead(nn.Layer): + """ + The ANNHead implementation. + + It mainly consists of AFNB and APNB modules. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + The first index will be taken as low-level features; the second one will be + taken as high-level features in AFNB module. Usually backbone consists of four + downsampling stage, such as ResNet, and return an output of each stage. If it is (2, 3), + it means taking feature map of the third stage and the fourth stage in backbone. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + key_value_channels (int): The key and value channels of self-attention map in both AFNB and APNB modules. + inter_channels (int): Both input and output channels of APNB modules. + psp_size (tuple): The out size of pooled feature maps. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. 
Default: False + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int], + backbone_channels: Tuple[int], + key_value_channels: int, + inter_channels: int, + psp_size: Tuple[int], + enable_auxiliary_loss: bool = False): + super().__init__() + + low_in_channels = backbone_channels[0] + high_in_channels = backbone_channels[1] + + self.fusion = AFNB( + low_in_channels=low_in_channels, + high_in_channels=high_in_channels, + out_channels=high_in_channels, + key_channels=key_value_channels, + value_channels=key_value_channels, + dropout_prob=0.05, + repeat_sizes=([1]), + psp_size=psp_size) + + self.context = nn.Sequential( + layers.ConvBNReLU( + in_channels=high_in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1), + APNB( + in_channels=inter_channels, + out_channels=inter_channels, + key_channels=key_value_channels, + value_channels=key_value_channels, + dropout_prob=0.05, + repeat_sizes=([1]), + psp_size=psp_size)) + + self.cls = nn.Conv2D( + in_channels=inter_channels, out_channels=num_classes, kernel_size=1) + self.auxlayer = layers.AuxLayer( + in_channels=low_in_channels, + inter_channels=low_in_channels // 2, + out_channels=num_classes, + dropout_prob=0.05) + + self.backbone_indices = backbone_indices + self.enable_auxiliary_loss = enable_auxiliary_loss + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + logit_list = [] + low_level_x = feat_list[self.backbone_indices[0]] + high_level_x = feat_list[self.backbone_indices[1]] + x = self.fusion(low_level_x, high_level_x) + x = self.context(x) + logit = self.cls(x) + logit_list.append(logit) + + if self.enable_auxiliary_loss: + auxiliary_logit = self.auxlayer(low_level_x) + logit_list.append(auxiliary_logit) + + return logit_list + + +class AFNB(nn.Layer): + """ + Asymmetric Fusion Non-local Block. + + Args: + low_in_channels (int): Low-level-feature channels. + high_in_channels (int): High-level-feature channels. + out_channels (int): Out channels of AFNB module. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + dropout_prob (float): The dropout rate of output. + repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]). + psp_size (tuple. optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + """ + + def __init__(self, + low_in_channels: int, + high_in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + dropout_prob: float, + repeat_sizes: Tuple[int] = ([1]), + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.psp_size = psp_size + self.stages = nn.LayerList([ + SelfAttentionBlock_AFNB(low_in_channels, high_in_channels, + key_channels, value_channels, out_channels, + size) for size in repeat_sizes + ]) + self.conv_bn = layers.ConvBN( + in_channels=out_channels + high_in_channels, + out_channels=out_channels, + kernel_size=1) + self.dropout = nn.Dropout(p=dropout_prob) + + def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor: + priors = [stage(low_feats, high_feats) for stage in self.stages] + context = priors[0] + for i in range(1, len(priors)): + context += priors[i] + + output = self.conv_bn(paddle.concat([context, high_feats], axis=1)) + output = self.dropout(output) + + return output + + +class APNB(nn.Layer): + """ + Asymmetric Pyramid Non-local Block. + + Args: + in_channels (int): The input channels of APNB module. 
+ out_channels (int): Out channels of APNB module. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + dropout_prob (float): The dropout rate of output. + repeat_sizes (tuple, optional): The number of AFNB modules. Default: ([1]). + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). + """ + + def __init__(self, + in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + dropout_prob: float, + repeat_sizes: Tuple[int] = ([1]), + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.psp_size = psp_size + self.stages = nn.LayerList([ + SelfAttentionBlock_APNB(in_channels, out_channels, key_channels, + value_channels, size) + for size in repeat_sizes + ]) + self.conv_bn = layers.ConvBNReLU( + in_channels=in_channels * 2, + out_channels=out_channels, + kernel_size=1) + self.dropout = nn.Dropout(p=dropout_prob) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + priors = [stage(x) for stage in self.stages] + context = priors[0] + for i in range(1, len(priors)): + context += priors[i] + + output = self.conv_bn(paddle.concat([context, x], axis=1)) + output = self.dropout(output) + + return output + + +def _pp_module(x: paddle.Tensor, psp_size: List[int]) -> paddle.Tensor: + n, c, h, w = x.shape + priors = [] + for size in psp_size: + feat = F.adaptive_avg_pool2d(x, size) + feat = paddle.reshape(feat, shape=(0, c, -1)) + priors.append(feat) + center = paddle.concat(priors, axis=-1) + return center + + +class SelfAttentionBlock_AFNB(nn.Layer): + """ + Self-Attention Block for AFNB module. + + Args: + low_in_channels (int): Low-level-feature channels. + high_in_channels (int): High-level-feature channels. + key_channels (int): The key channels in self-attention block. + value_channels (int): The value channels in self-attention block. + out_channels (int, optional): Out channels of AFNB module. Default: None. + scale (int, optional): Pooling size. Default: 1. + psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8). 
+    """
+
+    def __init__(self,
+                 low_in_channels: int,
+                 high_in_channels: int,
+                 key_channels: int,
+                 value_channels: int,
+                 out_channels: int = None,
+                 scale: int = 1,
+                 psp_size: Tuple[int] = (1, 3, 6, 8)):
+        super().__init__()
+
+        self.scale = scale
+        self.in_channels = low_in_channels
+        self.out_channels = out_channels
+        self.key_channels = key_channels
+        self.value_channels = value_channels
+        if out_channels is None:
+            self.out_channels = high_in_channels
+        self.pool = nn.MaxPool2D(scale)
+        self.f_key = layers.ConvBNReLU(
+            in_channels=low_in_channels,
+            out_channels=key_channels,
+            kernel_size=1)
+        self.f_query = layers.ConvBNReLU(
+            in_channels=high_in_channels,
+            out_channels=key_channels,
+            kernel_size=1)
+        self.f_value = nn.Conv2D(
+            in_channels=low_in_channels,
+            out_channels=value_channels,
+            kernel_size=1)
+
+        # use self.out_channels here: the out_channels argument may be None
+        self.W = nn.Conv2D(
+            in_channels=value_channels,
+            out_channels=self.out_channels,
+            kernel_size=1)
+
+        self.psp_size = psp_size
+
+    def forward(self, low_feats: List[paddle.Tensor], high_feats: List[paddle.Tensor]) -> paddle.Tensor:
+        batch_size, _, h, w = high_feats.shape
+
+        value = self.f_value(low_feats)
+        value = _pp_module(value, self.psp_size)
+        value = paddle.transpose(value, (0, 2, 1))
+
+        query = self.f_query(high_feats)
+        query = paddle.reshape(query, shape=(0, self.key_channels, -1))
+        query = paddle.transpose(query, perm=(0, 2, 1))
+
+        key = self.f_key(low_feats)
+        key = _pp_module(key, self.psp_size)
+
+        sim_map = paddle.matmul(query, key)
+        sim_map = (self.key_channels**-.5) * sim_map
+        sim_map = F.softmax(sim_map, axis=-1)
+
+        context = paddle.matmul(sim_map, value)
+        context = paddle.transpose(context, perm=(0, 2, 1))
+        hf_shape = paddle.shape(high_feats)
+        context = paddle.reshape(
+            context, shape=[0, self.value_channels, hf_shape[2], hf_shape[3]])
+
+        context = self.W(context)
+
+        return context
+
+
+class SelfAttentionBlock_APNB(nn.Layer):
+    """
+    Self-Attention Block for APNB module.
+
+    Args:
+        in_channels (int): The input channels of APNB module.
+        out_channels (int): The out channels of APNB module.
+        key_channels (int): The key channels in self-attention block.
+        value_channels (int): The value channels in self-attention block.
+        scale (int, optional): Pooling size. Default: 1.
+        psp_size (tuple, optional): The out size of pooled feature maps. Default: (1, 3, 6, 8).
+ """ + + def __init__(self, + in_channels: int, + out_channels: int, + key_channels: int, + value_channels: int, + scale: int = 1, + psp_size: Tuple[int] = (1, 3, 6, 8)): + super().__init__() + + self.scale = scale + self.in_channels = in_channels + self.out_channels = out_channels + self.key_channels = key_channels + self.value_channels = value_channels + self.pool = nn.MaxPool2D(scale) + self.f_key = layers.ConvBNReLU( + in_channels=self.in_channels, + out_channels=self.key_channels, + kernel_size=1) + self.f_query = self.f_key + self.f_value = nn.Conv2D( + in_channels=self.in_channels, + out_channels=self.value_channels, + kernel_size=1) + self.W = nn.Conv2D( + in_channels=self.value_channels, + out_channels=self.out_channels, + kernel_size=1) + + self.psp_size = psp_size + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + batch_size, _, h, w = x.shape + if self.scale > 1: + x = self.pool(x) + + value = self.f_value(x) + value = _pp_module(value, self.psp_size) + value = paddle.transpose(value, perm=(0, 2, 1)) + + query = self.f_query(x) + query = paddle.reshape(query, shape=(0, self.key_channels, -1)) + query = paddle.transpose(query, perm=(0, 2, 1)) + + key = self.f_key(x) + key = _pp_module(key, self.psp_size) + + sim_map = paddle.matmul(query, key) + sim_map = (self.key_channels**-.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, perm=(0, 2, 1)) + + x_shape = paddle.shape(x) + context = paddle.reshape( + context, shape=[0, self.value_channels, x_shape[2], x_shape[3]]) + context = self.W(context) + + return context diff --git a/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..949f180ce7b9a408f4583df714d0fc93271b8f99 --- /dev/null +++ b/modules/image/semantic_segmentation/ann_resnet50_voc/resnet.py @@ -0,0 +1,361 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
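+
+# A hedged usage sketch of the backbone defined below (input shape assumed).
+# ResNet50_vd returns one feature map per stage; with output_stride=8 the
+# last two stages keep 1/8 of the input resolution via dilated convolutions:
+#
+#   import paddle
+#   backbone = ResNet50_vd(output_stride=8)
+#   feats = backbone(paddle.rand([1, 3, 512, 512]))
+#   # len(feats) == 4; channels are backbone.feat_channels == [256, 512, 1024, 2048]
+#   # feats[3].shape == [1, 2048, 64, 64]  (512 / 8 == 64)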
+ +from typing import Union, List, Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import ann_resnet50_voc.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or 
stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. + + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int]=(1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
+                    dilation_rate = dilation_dict[
+                        block] if dilation_dict and block in dilation_dict else 1
+
+                    # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At the stage 4, expand the dilation_rate if given multi_grid
+                    if block == 3:
+                        dilation_rate = dilation_rate * multi_grid[i]
+                    ###############################################################################
+
+                    bottleneck_block = self.add_sublayer(
+                        'bb_%d_%d' % (block, i),
+                        BottleneckBlock(
+                            in_channels=num_channels[block]
+                            if i == 0 else num_filters[block] * 4,
+                            out_channels=num_filters[block],
+                            stride=2 if i == 0 and block != 0
+                            and dilation_rate == 1 else 1,
+                            shortcut=shortcut,
+                            if_first=block == i == 0,
+                            dilation=dilation_rate,
+                            data_format=data_format))
+
+                    block_list.append(bottleneck_block)
+                    shortcut = True
+                self.stage_list.append(block_list)
+        else:
+            for block in range(len(depth)):
+                shortcut = False
+                block_list = []
+                for i in range(depth[block]):
+                    dilation_rate = dilation_dict[block] \
+                        if dilation_dict and block in dilation_dict else 1
+                    if block == 3:
+                        dilation_rate = dilation_rate * multi_grid[i]
+
+                    basic_block = self.add_sublayer(
+                        'bb_%d_%d' % (block, i),
+                        BasicBlock(
+                            in_channels=num_channels[block]
+                            if i == 0 else num_filters[block],
+                            out_channels=num_filters[block],
+                            stride=2 if i == 0 and block != 0 \
+                                and dilation_rate == 1 else 1,
+                            dilation=dilation_rate,
+                            shortcut=shortcut,
+                            if_first=block == i == 0,
+                            data_format=data_format))
+                    block_list.append(basic_block)
+                    shortcut = True
+                self.stage_list.append(block_list)
+
+        self.pretrained = pretrained
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+        y = self.conv1_1(inputs)
+        y = self.conv1_2(y)
+        y = self.conv1_3(y)
+        self.conv1_logit = y.clone()
+        y = self.pool2d_max(y)
+
+        # A feature list saves the output feature map of each stage.
+        feat_list = []
+        for stage in self.stage_list:
+            for block in stage:
+                y = block(y)
+            feat_list.append(y)
+
+        return feat_list
+
+
+def ResNet50_vd(**args):
+    model = ResNet_vd(layers=50, **args)
+    return model
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a88ce5a828e997bd5954d7d254a49b134b9859c3
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README.md
@@ -0,0 +1,182 @@
+# danet_resnet50_cityscapes
+
+|Module Name|danet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|danet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|272MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+  - Sample results:
+

+ +

+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install danet_resnet50_cityscapes
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='danet_resnet50_cityscapes')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the danet_resnet50_cityscapes model on the OpticDiscSeg dataset by running `python train.py`. The content of `train.py` is as follows:
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms`: The data enhancement module defines lots of preprocessing methods for image segmentation data. Users can replace the data preprocessing methods according to their needs.
+
+    - Step2: Download the dataset
+
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        * `transforms`: data preprocessing methods.
+        * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      - Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
+        ```
+        - `name`: model name.
+        - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters.
+
+    - Step4: Optimization strategy
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+  - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen at Fine-tune time. We use this model to make predictions. The `predict.py` script is as follows:
+
+      ```python
+      import paddle
+      import cv2
+      import paddlehub as hub
+
+      if __name__ == '__main__':
+          model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+          img = cv2.imread("/PATH/TO/IMAGE")
+          model.predict(images=[img], visualization=True)
+      ```
+
+    - After the parameters are configured correctly, run the script `python predict.py`.
+
+    - **Args**
+      * `images`: image path or BGR image data;
+      * `visualization`: whether to visualize the results, default is True;
+      * `save_path`: save path of the result, default is 'seg_result'.
+
+    **NOTE:** When making predictions, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tune.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m danet_resnet50_cityscapes
+    ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+  ```python
+  import requests
+  import json
+  import cv2
+  import base64
+
+  import numpy as np
+
+
+  def cv2_to_base64(image):
+      data = cv2.imencode('.jpg', image)[1]
+      # tobytes() replaces the deprecated ndarray.tostring()
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+  def base64_to_cv2(b64str):
+      data = base64.b64decode(b64str.encode('utf8'))
+      # np.frombuffer replaces the deprecated np.fromstring
+      data = np.frombuffer(data, np.uint8)
+      data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+      return data
+
+  # Send an HTTP request (the URL must use this module's name, not the
+  # ginet_resnet50vd_voc endpoint copied in by mistake)
+  org_im = cv2.imread('/PATH/TO/IMAGE')
+  data = {'images':[cv2_to_base64(org_im)]}
+  headers = {"Content-type": "application/json"}
+  url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
+  r = requests.post(url=url, headers=headers, data=json.dumps(data))
+  mask = base64_to_cv2(r.json()["results"][0])
+  ```
+
+## V. Release Note
+
+- 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..9794b0f3aca16cda1c22a838caebb8b8a010b69b
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/README_en.md
@@ -0,0 +1,182 @@
+# danet_resnet50_cityscapes
+
+|Module Name|danet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|danet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|272MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:
+

+ +

+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install danet_resnet50_cityscapes
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='danet_resnet50_cityscapes')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the danet_resnet50_cityscapes model to fine-tune datasets such as OpticDiscSeg.
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs.
+
+    - Step2: Download the dataset
+
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        * `transforms`: data preprocessing methods.
+
+        * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+        * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset and decompress it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='danet_resnet50_cityscapes', num_classes=2, pretrained=None)
+        ```
+        - `name`: model name.
+        - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters.
+
+    - Step4: Optimization strategy
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+  - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions.
The `predict.py` script is as follows:
+
+      ```python
+      import paddle
+      import cv2
+      import paddlehub as hub
+
+      if __name__ == '__main__':
+          model = hub.Module(name='danet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+          img = cv2.imread("/PATH/TO/IMAGE")
+          model.predict(images=[img], visualization=True)
+      ```
+
+    - **Args**
+      * `images`: Image path or ndarray data with format [H, W, C], BGR.
+      * `visualization`: Whether to save the recognition results as picture files.
+      * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m danet_resnet50_cityscapes
+    ```
+
+  - The image segmentation service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+  ```python
+  import requests
+  import json
+  import cv2
+  import base64
+
+  import numpy as np
+
+
+  def cv2_to_base64(image):
+      data = cv2.imencode('.jpg', image)[1]
+      # tobytes() replaces the deprecated ndarray.tostring()
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+  def base64_to_cv2(b64str):
+      data = base64.b64decode(b64str.encode('utf8'))
+      # np.frombuffer replaces the deprecated np.fromstring
+      data = np.frombuffer(data, np.uint8)
+      data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+      return data
+
+  org_im = cv2.imread('/PATH/TO/IMAGE')
+  data = {'images':[cv2_to_base64(org_im)]}
+  headers = {"Content-type": "application/json"}
+  url = "http://127.0.0.1:8866/predict/danet_resnet50_cityscapes"
+  r = requests.post(url=url, headers=headers, data=json.dumps(data))
+  mask = base64_to_cv2(r.json()["results"][0])
+  ```
+
+## V. Release Note
+
+- 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..b6d7c005ef6d498c70536c7e8db049d64ea3223f
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/layers.py
@@ -0,0 +1,349 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
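+
+# Hedged note on the helper defined just below: nn.SyncBatchNorm has no CPU
+# kernel, so SyncBatchNorm(...) falls back to nn.BatchNorm2D when paddle
+# reports a CPU device. Illustrative (assumed) usage:
+#
+#   import paddle
+#   norm = SyncBatchNorm(16)              # BatchNorm2D on CPU, SyncBatchNorm on GPU
+#   y = norm(paddle.rand([4, 16, 8, 8]))  # shape preserved: [4, 16, 8, 8]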
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+    if paddle.get_device() == 'cpu':
+        return nn.BatchNorm2D(*args, **kwargs)
+    else:
+        return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(
+            self,
+            in_channels: int,
+            out_channels: int,
+            kernel_size: int,
+            stride: int = 1,
+            dilation: int = 1,
+            groups: int = 1,
+            is_vd_mode: bool = False,
+            act: str = None,
+            name: str = None):
+        super(ConvBNLayer, self).__init__()
+
+        self.is_vd_mode = is_vd_mode
+        self._pool2d_avg = AvgPool2D(
+            kernel_size=2, stride=2, padding=0, ceil_mode=True)
+        self._conv = Conv2D(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=kernel_size,
+            stride=stride,
+            padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+            dilation=dilation,
+            groups=groups,
+            bias_attr=False)
+
+        self._batch_norm = SyncBatchNorm(out_channels)
+        self._act_op = Activation(act=act)
+
+    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+        if self.is_vd_mode:
+            inputs = self._pool2d_avg(inputs)
+        y = self._conv(inputs)
+        y = self._batch_norm(y)
+        y = self._act_op(y)
+
+        return y
+
+
+class BottleneckBlock(nn.Layer):
+    """Residual bottleneck block"""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 stride: int,
+                 shortcut: bool = True,
+                 if_first: bool = False,
+                 dilation: int = 1,
+                 name: str = None):
+        super(BottleneckBlock, self).__init__()
+
+        self.conv0 = ConvBNLayer(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=1,
+            act='relu',
+            name=name + "_branch2a")
+
+        self.dilation = dilation
+
+        self.conv1 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            stride=stride,
+            act='relu',
+            dilation=dilation,
+            name=name + "_branch2b")
+        self.conv2 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels * 4,
+            kernel_size=1,
+            act=None,
+            name=name + "_branch2c")
+
+        if not shortcut:
+            self.short = ConvBNLayer(
+                in_channels=in_channels,
+                out_channels=out_channels * 4,
+                kernel_size=1,
+                stride=1,
+                is_vd_mode=False if if_first or stride == 1 else True,
+                name=name + "_branch1")
+
+        self.shortcut = shortcut
+
+    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+        y = self.conv0(inputs)
+        if self.dilation > 1:
+            padding = self.dilation
+            y = F.pad(y, [padding, padding, padding, padding])
+
+        conv1 = self.conv1(y)
+        conv2 = self.conv2(conv1)
+
+        if self.shortcut:
+            short = inputs
+        else:
+            short = self.short(inputs)
+
+        y = paddle.add(x=short, y=conv2)
+        y = F.relu(y)
+        return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+    """Depthwise Separable Convolution."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(SeparableConvBNReLU, self).__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+    """Basic conv bn layer"""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBN, self).__init__()
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        return x
+
+
+class ConvBNReLU(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBNReLU, self).__init__()
+
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        x = F.relu(x)
+        return x
+
+
+class Activation(nn.Layer):
+    """
+    The wrapper of activations.
+
+    Args:
+        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+            'hsigmoid']. Default: None, meaning identity transformation.
+
+    Returns:
+        A callable object of Activation.
+
+    Raises:
+        KeyError: When parameter `act` is not in the optional range.
+
+    Examples:
+
+        from paddleseg.models.common.activation import Activation
+
+        relu = Activation("relu")
+        print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+        sigmoid = Activation("sigmoid")
+        print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+        not_exit_one = Activation("not_exit_one")
+        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+        # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+        # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+    """
+
+    def __init__(self, act: str = None):
+        super(Activation, self).__init__()
+
+        self._act = act
+        upper_act_names = activation.__dict__.keys()
+        lower_act_names = [act.lower() for act in upper_act_names]
+        act_dict = dict(zip(lower_act_names, upper_act_names))
+
+        if act is not None:
+            if act in act_dict.keys():
+                act_name = act_dict[act]
+                # getattr is the idiomatic, behavior-identical replacement for
+                # the original eval("activation.{}()".format(act_name))
+                self.act_func = getattr(activation, act_name)()
+            else:
+                raise KeyError("{} does not exist in the current {}".format(
+                    act, act_dict.keys()))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        if self._act is not None:
+            return self.act_func(x)
+        else:
+            return x
+
+
+class ASPPModule(nn.Layer):
+    """
+    Atrous Spatial Pyramid Pooling.
+
+    Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+        image_pooling (bool, optional): If augmented with image-level features. Default: False
+    """
+
+    def __init__(self,
+                 aspp_ratios: tuple,
+                 in_channels: int,
+                 out_channels: int,
+                 align_corners: bool,
+                 use_sep_conv: bool = False,
+                 image_pooling: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+        self.aspp_blocks = nn.LayerList()
+
+        for ratio in aspp_ratios:
+            if use_sep_conv and ratio > 1:
+                conv_func = SeparableConvBNReLU
+            else:
+                conv_func = ConvBNReLU
+
+            block = conv_func(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1 if ratio == 1 else 3,
+                dilation=ratio,
+                padding=0 if ratio == 1 else ratio)
+            self.aspp_blocks.append(block)
+
+        out_size = len(self.aspp_blocks)
+
+        if image_pooling:
+            self.global_avg_pool = nn.Sequential(
+                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+            out_size += 1
+        self.image_pooling = image_pooling
+
+        self.conv_bn_relu = ConvBNReLU(
+            in_channels=out_channels * out_size,
+            out_channels=out_channels,
+            kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # drop rate
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        outputs = []
+        for block in self.aspp_blocks:
+            y = block(x)
+            y = F.interpolate(
+                y,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(y)
+
+        if self.image_pooling:
+            img_avg = self.global_avg_pool(x)
+            img_avg = F.interpolate(
+                img_avg,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(img_avg)
+
+        x = paddle.concat(outputs, axis=1)
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+
+        return x
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bb6e562631292793de2ab1a5a933d207f9f95cf
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/module.py
@@ -0,0 +1,239 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+# Import from this module's own package; the original mistakenly imported
+# from the danet_resnet50_voc module.
+from danet_resnet50_cityscapes.resnet import ResNet50_vd
+import danet_resnet50_cityscapes.layers as L
+
+
+@moduleinfo(
+    name="danet_resnet50_cityscapes",
+    type="CV/semantic_segmentation",
+    author="paddlepaddle",
+    author_email="",
+    summary="DANetResnet50 is a segmentation model.",
+    version="1.0.0",
+    meta=ImageSegmentationModule)
+class DANet(nn.Layer):
+    """
+    The DANet implementation based on PaddlePaddle.
+
+    The original article refers to
+    Fu, Jun, et al. "Dual Attention Network for Scene Segmentation"
+    (https://arxiv.org/pdf/1809.02983.pdf)
+
+    Args:
+        num_classes (int): The unique number of target classes.
+ backbone (Paddle.nn.Layer): A backbone network. + backbone_indices (tuple): The values in the tuple indicate the indices of + output of backbone. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (2, 3), + align_corners: bool = False, + pretrained: str = None): + super(DANet, self).__init__() + + self.backbone = ResNet50_vd() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = DAHead(num_classes=num_classes, in_channels=in_channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + if not self.training: + logit_list = [logit_list[0]] + + logit_list = [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list + ] + return logit_list + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + +class DAHead(nn.Layer): + """ + The Dual attention head. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. 
+    """
+
+    def __init__(self, num_classes: int, in_channels: int):
+        super().__init__()
+        in_channels = in_channels[-1]
+        inter_channels = in_channels // 4
+
+        self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+        self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3)
+        self.pam = PAM(inter_channels)
+        self.cam = CAM(inter_channels)
+        self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+        self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3)
+
+        self.aux_head = nn.Sequential(
+            nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1))
+
+        self.aux_head_pam = nn.Sequential(
+            nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+        self.aux_head_cam = nn.Sequential(
+            nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+        self.cls_head = nn.Sequential(
+            nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1))
+
+    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+        feats = feat_list[-1]
+        channel_feats = self.channel_conv(feats)
+        channel_feats = self.cam(channel_feats)
+        channel_feats = self.conv1(channel_feats)
+
+        position_feats = self.position_conv(feats)
+        position_feats = self.pam(position_feats)
+        position_feats = self.conv2(position_feats)
+
+        feats_sum = position_feats + channel_feats
+        logit = self.cls_head(feats_sum)
+
+        if not self.training:
+            return [logit]
+
+        cam_logit = self.aux_head_cam(channel_feats)
+        # use the PAM auxiliary head here; the original mistakenly reused
+        # self.aux_head_cam for the position branch
+        pam_logit = self.aux_head_pam(position_feats)
+        aux_logit = self.aux_head(feats)
+        return [logit, cam_logit, pam_logit, aux_logit]
+
+
+class PAM(nn.Layer):
+    """Position attention module."""
+
+    def __init__(self, in_channels: int):
+        super().__init__()
+        mid_channels = in_channels // 8
+        self.mid_channels = mid_channels
+        self.in_channels = in_channels
+
+        self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+        self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1)
+        self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1)
+
+        self.gamma = self.create_parameter(
+            shape=[1],
+            dtype='float32',
+            default_initializer=nn.initializer.Constant(0))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x_shape = paddle.shape(x)
+
+        # query: n, h * w, c1
+        query = self.query_conv(x)
+        query = paddle.reshape(query, (0, self.mid_channels, -1))
+        query = paddle.transpose(query, (0, 2, 1))
+
+        # key: n, c1, h * w
+        key = self.key_conv(x)
+        key = paddle.reshape(key, (0, self.mid_channels, -1))
+
+        # sim: n, h * w, h * w
+        sim = paddle.bmm(query, key)
+        sim = F.softmax(sim, axis=-1)
+
+        value = self.value_conv(x)
+        value = paddle.reshape(value, (0, self.in_channels, -1))
+        sim = paddle.transpose(sim, (0, 2, 1))
+
+        # feat: from (n, c2, h * w) -> (n, c2, h, w)
+        feat = paddle.bmm(value, sim)
+        feat = paddle.reshape(feat,
+                              (0, self.in_channels, x_shape[2], x_shape[3]))
+
+        out = self.gamma * feat + x
+        return out
+
+
+class CAM(nn.Layer):
+    """Channel attention module."""
+
+    def __init__(self, channels: int):
+        super().__init__()
+
+        self.channels = channels
+        self.gamma = self.create_parameter(
+            shape=[1],
+            dtype='float32',
+            default_initializer=nn.initializer.Constant(0))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x_shape = paddle.shape(x)
+        # query: n, c, h * w
+        query = paddle.reshape(x, (0, self.channels, -1))
+        # key: n, h * w, c
+        key = paddle.reshape(x, (0, self.channels, -1))
+        key = paddle.transpose(key, (0, 2, 1))
+
+        # sim: n, c, c
+        sim = paddle.bmm(query, key)
+        # The danet author claims that this can avoid gradient divergence
+        sim = paddle.max(
+            sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim
+        sim = F.softmax(sim, axis=-1)
+
+        # feat: from (n, c, h * w) to (n, c, h, w)
+        value = paddle.reshape(x, (0, self.channels, -1))
+        feat = paddle.bmm(sim, value)
+        feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3]))
+
+        out = self.gamma * feat + x
+        return out
diff --git a/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..12102a3fed8e810046e2d40a8796f93175d459fb
--- /dev/null
+++ b/modules/image/semantic_segmentation/danet_resnet50_cityscapes/resnet.py
@@ -0,0 +1,359 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+# import this module's own layers package; the original mistakenly imported
+# ann_resnet50_voc.layers and omitted `import paddle`, which the
+# paddle.Tensor annotations below require
+import danet_resnet50_cityscapes.layers as layers
+
+
+class ConvBNLayer(nn.Layer):
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 stride: int = 1,
+                 dilation: int = 1,
+                 groups: int = 1,
+                 is_vd_mode: bool = False,
+                 act: str = None,
+                 data_format: str = 'NCHW'):
+        super(ConvBNLayer, self).__init__()
+        if dilation != 1 and kernel_size != 3:
+            raise RuntimeError("When the dilation isn't 1, "
+                               "the kernel_size should be 3.")
+
+        self.is_vd_mode = is_vd_mode
+        self._pool2d_avg = nn.AvgPool2D(
+            kernel_size=2,
+            stride=2,
+            padding=0,
+            ceil_mode=True,
+            data_format=data_format)
+        self._conv = nn.Conv2D(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=kernel_size,
+            stride=stride,
+            padding=(kernel_size - 1) // 2 \
+                if dilation == 1 else dilation,
+            dilation=dilation,
+            groups=groups,
+            bias_attr=False,
+            data_format=data_format)
+
+        self._batch_norm = layers.SyncBatchNorm(
+            out_channels, data_format=data_format)
+        self._act_op = layers.Activation(act=act)
+
+    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+        if self.is_vd_mode:
+            inputs = self._pool2d_avg(inputs)
+        y = self._conv(inputs)
+        y = self._batch_norm(y)
+        y = self._act_op(y)
+
+        return y
+
+
+class BottleneckBlock(nn.Layer):
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 stride: int,
+                 shortcut: bool = True,
+                 if_first: bool = False,
+                 dilation: int = 1,
+                 data_format: str = 'NCHW'):
+        super(BottleneckBlock, self).__init__()
+
+        self.data_format = data_format
+        self.conv0 = ConvBNLayer(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=1,
+            act='relu',
+            data_format=data_format)
+
+        self.dilation = dilation
+
+        self.conv1 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            stride=stride,
+            act='relu',
+            dilation=dilation,
+            data_format=data_format)
+        self.conv2 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels * 4,
+            kernel_size=1,
+            act=None,
+            data_format=data_format)
+
+        if not shortcut:
+            self.short = ConvBNLayer(
+                in_channels=in_channels,
+                out_channels=out_channels * 4,
+
kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. 
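+ data_format (str, optional): The data format of input and output tensors, 'NCHW' or 'NHWC'. Default: 'NCHW'.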
+ + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/README.md b/modules/image/semantic_segmentation/danet_resnet50_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8ee72c8c86867420a0b739779f66eb12251131ef --- /dev/null +++ b/modules/image/semantic_segmentation/danet_resnet50_voc/README.md @@ -0,0 +1,182 @@ +# danet_resnet50_voc + +|模型名称|danet_resnet50_voc| +| :--- | :---: | +|类别|图像-图像分割| +|网络|danet_resnet50vd| +|数据集|PascalVOC2012| +|是否支持Fine-tuning|是| +|模型大小|273MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +

+ <p align="center"> + </p>

+ +- ### 模型介绍 + + - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。 + - 更多详情请参考:[danet](https://arxiv.org/pdf/1809.02983.pdf) + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、安装 + + - ```shell + $ hub install danet_resnet50_voc + ``` + + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + +## 三、模型API预测 + +- ### 1.预测代码示例 + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='danet_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.如何开始Fine-tune + + - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用danet_resnet50_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下: + + - 代码步骤 + + - Step1: 定义数据预处理方式 + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + + - Step2: 下载数据集并使用 + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + - `transforms`: 数据预处理方式。 + - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + + - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + + - Step3: 加载预训练模型 + + - ```python + import paddlehub as hub + + model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: 选择预训练模型的名字。 + - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + + - Step4: 选择优化策略和运行配置 + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - 模型预测 + + - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - 参数配置正确后,请执行脚本`python predict.py`。 + + - **Args** + * `images`:原始图像路径或BGR格式图片; + * `visualization`: 是否可视化,默认为True; + * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + + **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 四、服务部署 + +- PaddleHub Serving可以部署一个在线图像分割服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + + - ```shell + $ hub serving start -m danet_resnet50_voc + ``` + + - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端后,以下数行代码即可实现发送预测请求,获取预测结果 + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] +
return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + # 发送HTTP请求 + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/danet_resnet50_voc" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## 五、更新历史 + +* 1.0.0 + + 初始发布 diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..6fecdfc23c39fff1dd1d312ccd3b9fe6315f9618 --- /dev/null +++ b/modules/image/semantic_segmentation/danet_resnet50_voc/README_en.md @@ -0,0 +1,181 @@ +# danet_resnet50_voc + +|Module Name|danet_resnet50_voc| +| :--- | :---: | +|Category|Image Segmentation| +|Network|danet_resnet50vd| +|Dataset|PascalVOC2012| +|Fine-tuning supported or not|Yes| +|Module Size|273MB| +|Data indicators|-| +|Latest update date|2022-03-22| + +## I. Basic Information + +- ### Application Effect Display + - Sample results: +

+ <p align="center"> + </p>

+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [danet](https://arxiv.org/pdf/1809.02983.pdf) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install danet_resnet50_voc + ``` + + - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='danet_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the danet_resnet50_voc model to fine-tune datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='danet_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='danet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m danet_resnet50_voc + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/danet_resnet50_voc" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..96b307dc8750c5422837ebfd0382c198d7d49ff1 --- /dev/null +++ b/modules/image/semantic_segmentation/danet_resnet50_voc/layers.py @@ -0,0 +1,349 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
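+ # Building blocks shared across this module: a CPU-safe SyncBatchNorm fallback, conv + batch-norm + # wrappers, a residual bottleneck block, a depthwise-separable conv, an activation factory and an ASPP module.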
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__( + self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D( + kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + 
out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + + Returns: + A callable object of Activation. + + Raises: + KeyError: When parameter `act` is not in the optional range. + + Examples: + + from paddleseg.models.common.activation import Activation + + relu = Activation("relu") + print(relu) + # + + sigmoid = Activation("sigmoid") + print(sigmoid) + # + + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. 
Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + + + diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/module.py b/modules/image/semantic_segmentation/danet_resnet50_voc/module.py new file mode 100644 index 0000000000000000000000000000000000000000..2dd4c60b9f787d0462d1e7c53b0c50e425494872 --- /dev/null +++ b/modules/image/semantic_segmentation/danet_resnet50_voc/module.py @@ -0,0 +1,245 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np
from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from danet_resnet50_voc.resnet import ResNet50_vd +import danet_resnet50_voc.layers as L + + +@moduleinfo( + name="danet_resnet50_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="DANet is a semantic segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class DANet(nn.Layer): + """ + The DANet implementation based on PaddlePaddle. + + The original article refers to + Fu, Jun, et al. "Dual Attention Network for Scene Segmentation" + (https://arxiv.org/pdf/1809.02983.pdf) + + Args: + num_classes (int): The unique number of target classes.
+ backbone (Paddle.nn.Layer): A backbone network. + backbone_indices (tuple): The values in the tuple indicate the indices of + output of backbone. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (2, 3), + align_corners: bool = False, + pretrained: str = None): + super(DANet, self).__init__() + + self.backbone = ResNet50_vd() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = DAHead(num_classes=num_classes, in_channels=in_channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + if not self.training: + logit_list = [logit_list[0]] + + logit_list = [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list + ] + return logit_list + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + + +class DAHead(nn.Layer): + """ + The Dual attention head. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. 
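+ + Note: + In training mode forward() returns [logit, cam_logit, pam_logit, aux_logit]; in eval mode only the fused logit is returned.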
+ """ + + def __init__(self, num_classes: int, in_channels: int): + super().__init__() + in_channels = in_channels[-1] + inter_channels = in_channels // 4 + + self.channel_conv = L.ConvBNReLU(in_channels, inter_channels, 3) + self.position_conv = L.ConvBNReLU(in_channels, inter_channels, 3) + self.pam = PAM(inter_channels) + self.cam = CAM(inter_channels) + self.conv1 = L.ConvBNReLU(inter_channels, inter_channels, 3) + self.conv2 = L.ConvBNReLU(inter_channels, inter_channels, 3) + + self.aux_head = nn.Sequential( + nn.Dropout2D(0.1), nn.Conv2D(in_channels, num_classes, 1)) + + self.aux_head_pam = nn.Sequential( + nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1)) + + self.aux_head_cam = nn.Sequential( + nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1)) + + self.cls_head = nn.Sequential( + nn.Dropout2D(0.1), nn.Conv2D(inter_channels, num_classes, 1)) + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + feats = feat_list[-1] + channel_feats = self.channel_conv(feats) + channel_feats = self.cam(channel_feats) + channel_feats = self.conv1(channel_feats) + + position_feats = self.position_conv(feats) + position_feats = self.pam(position_feats) + position_feats = self.conv2(position_feats) + + feats_sum = position_feats + channel_feats + logit = self.cls_head(feats_sum) + + if not self.training: + return [logit] + + cam_logit = self.aux_head_cam(channel_feats) + pam_logit = self.aux_head_cam(position_feats) + aux_logit = self.aux_head(feats) + return [logit, cam_logit, pam_logit, aux_logit] + + +class PAM(nn.Layer): + """Position attention module.""" + + def __init__(self, in_channels: int): + super().__init__() + mid_channels = in_channels // 8 + self.mid_channels = mid_channels + self.in_channels = in_channels + + self.query_conv = nn.Conv2D(in_channels, mid_channels, 1, 1) + self.key_conv = nn.Conv2D(in_channels, mid_channels, 1, 1) + self.value_conv = nn.Conv2D(in_channels, in_channels, 1, 1) + + self.gamma = self.create_parameter( + shape=[1], + dtype='float32', + default_initializer=nn.initializer.Constant(0)) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x_shape = paddle.shape(x) + + # query: n, h * w, c1 + query = self.query_conv(x) + query = paddle.reshape(query, (0, self.mid_channels, -1)) + query = paddle.transpose(query, (0, 2, 1)) + + # key: n, c1, h * w + key = self.key_conv(x) + key = paddle.reshape(key, (0, self.mid_channels, -1)) + + # sim: n, h * w, h * w + sim = paddle.bmm(query, key) + sim = F.softmax(sim, axis=-1) + + value = self.value_conv(x) + value = paddle.reshape(value, (0, self.in_channels, -1)) + sim = paddle.transpose(sim, (0, 2, 1)) + + # feat: from (n, c2, h * w) -> (n, c2, h, w) + feat = paddle.bmm(value, sim) + feat = paddle.reshape(feat, + (0, self.in_channels, x_shape[2], x_shape[3])) + + out = self.gamma * feat + x + return out + + +class CAM(nn.Layer): + """Channel attention module.""" + + def __init__(self, channels: int): + super().__init__() + + self.channels = channels + self.gamma = self.create_parameter( + shape=[1], + dtype='float32', + default_initializer=nn.initializer.Constant(0)) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x_shape = paddle.shape(x) + # query: n, c, h * w + query = paddle.reshape(x, (0, self.channels, -1)) + # key: n, h * w, c + key = paddle.reshape(x, (0, self.channels, -1)) + key = paddle.transpose(key, (0, 2, 1)) + + # sim: n, c, c + sim = paddle.bmm(query, key) + # The danet author claims that this can avoid gradient divergence + sim = paddle.max( + 
sim, axis=-1, keepdim=True).tile([1, 1, self.channels]) - sim + sim = F.softmax(sim, axis=-1) + + # feat: from (n, c, h * w) to (n, c, h, w) + value = paddle.reshape(x, (0, self.channels, -1)) + feat = paddle.bmm(sim, value) + feat = paddle.reshape(feat, (0, self.channels, x_shape[2], x_shape[3])) + + out = self.gamma * feat + x + return out + + + + + diff --git a/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..12102a3fed8e810046e2d40a8796f93175d459fb --- /dev/null +++ b/modules/image/semantic_segmentation/danet_resnet50_voc/resnet.py @@ -0,0 +1,359 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Union, List, Tuple + +import paddle +import paddle.nn as nn + +import danet_resnet50_voc.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1, " \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, +
stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. 
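+ data_format (str, optional): The data format of input and output tensors, 'NCHW' or 'NHWC'. Default: 'NCHW'.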
+ + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f4a52885d8e60a9c3569cf4515a20cf0a722d8c9 --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README.md @@ -0,0 +1,182 @@ +# isanet_resnet50_cityscapes + +|模型名称|isanet_resnet50_cityscapes| +| :--- | :---: | +|类别|图像-图像分割| +|网络|isanet_resnet50vd| +|数据集|Cityscapes| +|是否支持Fine-tuning|是| +|模型大小|217MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +

+ <p align="center"> + </p>

+ +- ### 模型介绍 + + - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。 + - 更多详情请参考:[isanet](https://arxiv.org/abs/1907.12273) + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、安装 + + - ```shell + $ hub install isanet_resnet50_cityscapes + ``` + + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + +## 三、模型API预测 + +- ### 1.预测代码示例 + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.如何开始Fine-tune + + - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用isanet_resnet50_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下: + + - 代码步骤 + + - Step1: 定义数据预处理方式 + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + + - Step2: 下载数据集并使用 + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + - `transforms`: 数据预处理方式。 + - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + + - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + + - Step3: 加载预训练模型 + + - ```python + import paddlehub as hub + + model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None) + ``` + - `name`: 选择预训练模型的名字。 + - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + + - Step4: 选择优化策略和运行配置 + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - 模型预测 + + - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - 参数配置正确后,请执行脚本`python predict.py`。 + + - **Args** + * `images`:原始图像路径或BGR格式图片; + * `visualization`: 是否可视化,默认为True; + * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + + **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 四、服务部署 + +- PaddleHub Serving可以部署一个在线图像分割服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + + - ```shell + $ hub serving start -m isanet_resnet50_cityscapes + ``` + + - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端后,以下数行代码即可实现发送预测请求,获取预测结果 + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def
cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + # 发送HTTP请求 + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/isanet_resnet50_cityscapes" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## 五、更新历史 + +* 1.0.0 + + 初始发布 diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..ec784ba9f014e44bcef1a441a9a49ebdbcf2918b --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/README_en.md @@ -0,0 +1,181 @@ +# isanet_resnet50_cityscapes + +|Module Name|isanet_resnet50_cityscapes| +| :--- | :---: | +|Category|Image Segmentation| +|Network|isanet_resnet50vd| +|Dataset|Cityscapes| +|Fine-tuning supported or not|Yes| +|Module Size|217MB| +|Data indicators|-| +|Latest update date|2022-03-21| + +## I. Basic Information + +- ### Application Effect Display + - Sample results: +

+ <p align="center"> + </p>

+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install isanet_resnet50_cityscapes + ``` + + - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the isanet_resnet50_cityscapes model to fine-tune datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='isanet_resnet50_cityscapes', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m isanet_resnet50_cityscapes + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/isanet_resnet50_cityscapes" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..3e42fb7f2ec66e0daf9123e59464258f33cafc57 --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/layers.py @@ -0,0 +1,401 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
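+ # Helper layers for this module: a CPU-safe SyncBatchNorm fallback, conv + batch-norm wrappers, + # an activation factory, ASPP, an auxiliary head and a generic self-attention block.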
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None): + return paddle.add(x, y, name) + +class AttentionBlock(nn.Layer): + """General self-attention block/non-local block. + + The original article refers to refer to https://arxiv.org/abs/1706.03762. + Args: + key_in_channels (int): Input channels of key feature. + query_in_channels (int): Input channels of query feature. + channels (int): Output channels of key/query transform. + out_channels (int): Output channels. + share_key_query (bool): Whether share projection weight between key + and query projection. + query_downsample (nn.Module): Query downsample module. + key_downsample (nn.Module): Key downsample module. + key_query_num_convs (int): Number of convs for key/query projection. + value_out_num_convs (int): Number of convs for value projection. + key_query_norm (bool): Whether to use BN for key/query projection. + value_out_norm (bool): Whether to use BN for value projection. + matmul_norm (bool): Whether normalize attention map with sqrt of + channels + with_out (bool): Whether use out projection. 
+ """ + + def __init__(self, key_in_channels, query_in_channels, channels, + out_channels, share_key_query, query_downsample, + key_downsample, key_query_num_convs, value_out_num_convs, + key_query_norm, value_out_norm, matmul_norm, with_out): + super(AttentionBlock, self).__init__() + if share_key_query: + assert key_in_channels == query_in_channels + self.with_out = with_out + self.key_in_channels = key_in_channels + self.query_in_channels = query_in_channels + self.out_channels = out_channels + self.channels = channels + self.share_key_query = share_key_query + self.key_project = self.build_project( + key_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + if share_key_query: + self.query_project = self.key_project + else: + self.query_project = self.build_project( + query_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + + self.value_project = self.build_project( + key_in_channels, + channels if self.with_out else out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + + if self.with_out: + self.out_project = self.build_project( + channels, + out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + else: + self.out_project = None + + self.query_downsample = query_downsample + self.key_downsample = key_downsample + self.matmul_norm = matmul_norm + + def build_project(self, in_channels: int , channels: int, num_convs: int, use_conv_module: bool): + if use_conv_module: + convs = [ + ConvBNReLU( + in_channels=in_channels, + out_channels=channels, + kernel_size=1, + bias_attr=False) + ] + for _ in range(num_convs - 1): + convs.append( + ConvBNReLU( + in_channels=channels, + out_channels=channels, + kernel_size=1, + bias_attr=False)) + else: + convs = [nn.Conv2D(in_channels, channels, 1)] + for _ in range(num_convs - 1): + convs.append(nn.Conv2D(channels, channels, 1)) + + if len(convs) > 1: + convs = nn.Sequential(*convs) + else: + convs = convs[0] + return convs + + def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor: + query_shape = paddle.shape(query_feats) + query = self.query_project(query_feats) + if self.query_downsample is not None: + query = self.query_downsample(query) + query = query.flatten(2).transpose([0, 2, 1]) + + key = self.key_project(key_feats) + value = self.value_project(key_feats) + + if self.key_downsample is not None: + key = self.key_downsample(key) + value = self.key_downsample(value) + + key = key.flatten(2) + value = value.flatten(2).transpose([0, 2, 1]) + sim_map = paddle.matmul(query, key) + if self.matmul_norm: + sim_map = (self.channels**-0.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, [0, 2, 1]) + + context = paddle.reshape( + context, [0, self.out_channels, query_shape[2], query_shape[3]]) + + if self.out_project is not None: + context = self.out_project(context) + return context diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..6b20ac094dad2e29de83a0f5ba374564509aad5c --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/module.py @@ -0,0 +1,221 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from isanet_resnet50_cityscapes.resnet import ResNet50_vd +import isanet_resnet50_cityscapes.layers as layers + + +@moduleinfo( + name="isanet_resnet50_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="ISANetResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class ISANet(nn.Layer): + """Interlaced Sparse Self-Attention for Semantic Segmentation. + + The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation" + (https://arxiv.org/abs/1907.12273). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (2, 3), + isa_channels: int = 256, + down_factor: Tuple[int] = (8, 8), + enable_auxiliary_loss: bool = True, + align_corners: bool = False, + pretrained: str = None): + super(ISANet, self).__init__() + + self.backbone = ResNet50_vd() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor, + enable_auxiliary_loss) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list + ] + + return logit_list + + +class ISAHead(nn.Layer): + """ + The ISAHead. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. 
+ """ + + def __init__(self, + num_classes: int, + in_channels: int, + isa_channels: int, + down_factor: Tuple[int], + enable_auxiliary_loss: bool): + super(ISAHead, self).__init__() + self.in_channels = in_channels[-1] + inter_channels = self.in_channels // 4 + self.inter_channels = inter_channels + self.down_factor = down_factor + self.enable_auxiliary_loss = enable_auxiliary_loss + self.in_conv = layers.ConvBNReLU( + self.in_channels, inter_channels, 3, bias_attr=False) + self.global_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.local_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.out_conv = layers.ConvBNReLU( + inter_channels * 2, inter_channels, 1, bias_attr=False) + self.cls = nn.Sequential( + nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1)) + self.aux = nn.Sequential( + layers.ConvBNReLU( + in_channels=1024, + out_channels=256, + kernel_size=3, + bias_attr=False), nn.Dropout2D(p=0.1), + nn.Conv2D(256, num_classes, 1)) + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + C3, C4 = feat_list + x = self.in_conv(C4) + x_shape = paddle.shape(x) + P_h, P_w = self.down_factor + Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil( + x_shape[3] / P_w).astype('int32') + pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), ( + Q_w * P_w - x_shape[3]).astype('int32') + if pad_h > 0 or pad_w > 0: + padding = paddle.concat([ + pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2 + ], + axis=0) + feat = F.pad(x, padding) + else: + feat = x + + feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w]) + feat = feat.transpose([0, 3, 5, 1, 2, + 4]).reshape([-1, self.inter_channels, Q_h, Q_w]) + feat = self.global_relation(feat) + + feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w]) + feat = feat.transpose([0, 4, 5, 3, 1, + 2]).reshape([-1, self.inter_channels, P_h, P_w]) + feat = self.local_relation(feat) + + feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w]) + feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape( + [0, self.inter_channels, P_h * Q_h, P_w * Q_w]) + if pad_h > 0 or pad_w > 0: + feat = paddle.slice( + feat, + axes=[2, 3], + starts=[pad_h // 2, pad_w // 2], + ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]]) + + feat = self.out_conv(paddle.concat([feat, x], axis=1)) + output = self.cls(feat) + + if self.enable_auxiliary_loss: + auxout = self.aux(C3) + return [output, auxout] + else: + return [output] + + +class SelfAttentionBlock(layers.AttentionBlock): + """General self-attention block/non-local block. + + Args: + in_channels (int): Input channels of key/query feature. + channels (int): Output channels of key/query transform. 
+ """ + + def __init__(self, in_channels: int, channels: int): + super(SelfAttentionBlock, self).__init__( + key_in_channels=in_channels, + query_in_channels=in_channels, + channels=channels, + out_channels=in_channels, + share_key_query=False, + query_downsample=None, + key_downsample=None, + key_query_num_convs=2, + key_query_norm=True, + value_out_num_convs=1, + value_out_norm=False, + matmul_norm=True, + with_out=False) + + self.output_project = self.build_project( + in_channels, in_channels, num_convs=1, use_conv_module=True) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + context = super(SelfAttentionBlock, self).forward(x, x) + return self.output_project(context) diff --git a/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..f7de1ee294b6def1fea6dbaf6ef38e915f50a21e --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_cityscapes/resnet.py @@ -0,0 +1,359 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import isanet_resnet50_cityscapes.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + 
out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. 
+ + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md b/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e7e56aa3ef443e71c7da42e076e9c20244cfebec --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/README.md @@ -0,0 +1,182 @@ +# isanet_resnet50_voc + +|模型名称|isanet_resnet50_voc| +| :--- | :---: | +|类别|图像-图像分割| +|网络|isanet_resnet50vd| +|数据集|PascalVOC2012| +|是否支持Fine-tuning|是| +|模型大小|217MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +

+<!-- 样例结果示例图(图片占位) -->
+ +- ### 模型介绍 + + - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。 + - 更多详情请参考:[isanet](https://arxiv.org/abs/1907.12273) + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、安装 + + - ```shell + $ hub install isanet_resnet50_voc + ``` + + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + +## 三、模型API预测 + +- ### 1.预测代码示例 + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.如何开始Fine-tune + + - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用isanet_resnet50_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下: + + - 代码步骤 + + - Step1: 定义数据预处理方式 + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + + - Step2: 下载数据集并使用 + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + - `transforms`: 数据预处理方式。 + - `mode`: `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + + - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + + - Step3: 加载预训练模型 + + - ```python + import paddlehub as hub + + model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: 选择预训练模型的名字。 + - `load_checkpoint`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + + - Step4: 选择优化策略和运行配置 + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - 模型预测 + + - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - 参数配置正确后,请执行脚本`python predict.py`。 + + - **Args** + * `images`:原始图像路径或BGR格式图片; + * `visualization`: 是否可视化,默认为True; + * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + + **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 四、服务部署 + +- PaddleHub Serving可以部署一个在线图像分割服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + + - ```shell + $ hub serving start -m isanet_resnet50_voc + ``` + + - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', 
image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # 发送HTTP请求
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images': [cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/isanet_resnet50_voc"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b80886911fcd931e6898983d85aa6213f62e1b34
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/README_en.md
@@ -0,0 +1,181 @@
+# isanet_resnet50_voc
+
+|Module Name|isanet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|isanet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|217MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:

+<!-- sample results image (placeholder) -->
+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [isanet](https://arxiv.org/abs/1907.12273) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install isanet_resnet50_voc + ``` + + - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='isanet_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the isanet_resnet50_voc model to fine-tune datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='isanet_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows:
+
+    ```python
+    import paddle
+    import cv2
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='isanet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        model.predict(images=[img], visualization=True)
+    ```
+
+  - **Args**
+    * `images`: Image path or ndarray data in the format [H, W, C], BGR.
+    * `visualization`: Whether to save the recognition results as picture files.
+    * `save_path`: Save path of the result; the default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m isanet_resnet50_voc
+    ```
+
+  - This deploys the image segmentation API service; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, use the following lines of code to send a prediction request and obtain the result:
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
+
+    def cv2_to_base64(image):
+        # encode a BGR image as JPEG and wrap it in base64 for the JSON payload
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        # decode the base64 string returned by the server back into an image
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images': [cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/isanet_resnet50_voc"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## V. Release Note
+
+- 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7f6ee5748d15f632c5b35ac84b6262d1f00a7c72
--- /dev/null
+++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/layers.py
@@ -0,0 +1,401 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
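+
+# Shape walkthrough for the AttentionBlock defined in this file (illustrative
+# only): given query_feats/key_feats of shape [N, C, H, W],
+#   query: project -> flatten(2) -> transpose     -> [N, H*W, C']
+#   key:   project -> flatten(2)                  -> [N, C', H*W]
+#   sim_map = softmax(query @ key * C' ** -0.5)   -> [N, H*W, H*W]
+#   context = (sim_map @ value), reshaped back    -> [N, C_out, H, W]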
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None): + return paddle.add(x, y, name) + +class AttentionBlock(nn.Layer): + """General self-attention block/non-local block. + + The original article refers to refer to https://arxiv.org/abs/1706.03762. + Args: + key_in_channels (int): Input channels of key feature. + query_in_channels (int): Input channels of query feature. + channels (int): Output channels of key/query transform. + out_channels (int): Output channels. + share_key_query (bool): Whether share projection weight between key + and query projection. + query_downsample (nn.Module): Query downsample module. + key_downsample (nn.Module): Key downsample module. + key_query_num_convs (int): Number of convs for key/query projection. + value_out_num_convs (int): Number of convs for value projection. + key_query_norm (bool): Whether to use BN for key/query projection. + value_out_norm (bool): Whether to use BN for value projection. + matmul_norm (bool): Whether normalize attention map with sqrt of + channels + with_out (bool): Whether use out projection. 
+ """ + + def __init__(self, key_in_channels, query_in_channels, channels, + out_channels, share_key_query, query_downsample, + key_downsample, key_query_num_convs, value_out_num_convs, + key_query_norm, value_out_norm, matmul_norm, with_out): + super(AttentionBlock, self).__init__() + if share_key_query: + assert key_in_channels == query_in_channels + self.with_out = with_out + self.key_in_channels = key_in_channels + self.query_in_channels = query_in_channels + self.out_channels = out_channels + self.channels = channels + self.share_key_query = share_key_query + self.key_project = self.build_project( + key_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + if share_key_query: + self.query_project = self.key_project + else: + self.query_project = self.build_project( + query_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + + self.value_project = self.build_project( + key_in_channels, + channels if self.with_out else out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + + if self.with_out: + self.out_project = self.build_project( + channels, + out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + else: + self.out_project = None + + self.query_downsample = query_downsample + self.key_downsample = key_downsample + self.matmul_norm = matmul_norm + + def build_project(self, in_channels: int, channels: int, num_convs: int, use_conv_module: bool): + if use_conv_module: + convs = [ + ConvBNReLU( + in_channels=in_channels, + out_channels=channels, + kernel_size=1, + bias_attr=False) + ] + for _ in range(num_convs - 1): + convs.append( + ConvBNReLU( + in_channels=channels, + out_channels=channels, + kernel_size=1, + bias_attr=False)) + else: + convs = [nn.Conv2D(in_channels, channels, 1)] + for _ in range(num_convs - 1): + convs.append(nn.Conv2D(channels, channels, 1)) + + if len(convs) > 1: + convs = nn.Sequential(*convs) + else: + convs = convs[0] + return convs + + def forward(self, query_feats: paddle.Tensor, key_feats: paddle.Tensor) -> paddle.Tensor: + query_shape = paddle.shape(query_feats) + query = self.query_project(query_feats) + if self.query_downsample is not None: + query = self.query_downsample(query) + query = query.flatten(2).transpose([0, 2, 1]) + + key = self.key_project(key_feats) + value = self.value_project(key_feats) + + if self.key_downsample is not None: + key = self.key_downsample(key) + value = self.key_downsample(value) + + key = key.flatten(2) + value = value.flatten(2).transpose([0, 2, 1]) + sim_map = paddle.matmul(query, key) + if self.matmul_norm: + sim_map = (self.channels**-0.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, [0, 2, 1]) + + context = paddle.reshape( + context, [0, self.out_channels, query_shape[2], query_shape[3]]) + + if self.out_project is not None: + context = self.out_project(context) + return context diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py new file mode 100644 index 0000000000000000000000000000000000000000..ed92c128629c74e23522c42b7efb0cc85425a05d --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/module.py @@ -0,0 +1,221 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from isanet_resnet50_voc.resnet import ResNet50_vd +import isanet_resnet50_voc.layers as layers + + +@moduleinfo( + name="isanet_resnet50_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="ISANetResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class ISANet(nn.Layer): + """Interlaced Sparse Self-Attention for Semantic Segmentation. + + The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation" + (https://arxiv.org/abs/1907.12273). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (2, 3), + isa_channels: int = 256, + down_factor: Tuple[int] = (8, 8), + enable_auxiliary_loss: bool = True, + align_corners: bool = False, + pretrained: str = None): + super(ISANet, self).__init__() + + self.backbone = ResNet50_vd() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor, + enable_auxiliary_loss) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list + ] + + return logit_list + + +class ISAHead(nn.Layer): + """ + The ISAHead. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. 
+ """ + + def __init__(self, + num_classes: int, + in_channels: Tuple[int], + isa_channels: int, + down_factor: Tuple[int], + enable_auxiliary_loss: bool): + super(ISAHead, self).__init__() + self.in_channels = in_channels[-1] + inter_channels = self.in_channels // 4 + self.inter_channels = inter_channels + self.down_factor = down_factor + self.enable_auxiliary_loss = enable_auxiliary_loss + self.in_conv = layers.ConvBNReLU( + self.in_channels, inter_channels, 3, bias_attr=False) + self.global_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.local_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.out_conv = layers.ConvBNReLU( + inter_channels * 2, inter_channels, 1, bias_attr=False) + self.cls = nn.Sequential( + nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1)) + self.aux = nn.Sequential( + layers.ConvBNReLU( + in_channels=1024, + out_channels=256, + kernel_size=3, + bias_attr=False), nn.Dropout2D(p=0.1), + nn.Conv2D(256, num_classes, 1)) + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + C3, C4 = feat_list + x = self.in_conv(C4) + x_shape = paddle.shape(x) + P_h, P_w = self.down_factor + Q_h, Q_w = paddle.ceil(x_shape[2] / P_h).astype('int32'), paddle.ceil( + x_shape[3] / P_w).astype('int32') + pad_h, pad_w = (Q_h * P_h - x_shape[2]).astype('int32'), ( + Q_w * P_w - x_shape[3]).astype('int32') + if pad_h > 0 or pad_w > 0: + padding = paddle.concat([ + pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2 + ], + axis=0) + feat = F.pad(x, padding) + else: + feat = x + + feat = feat.reshape([0, x_shape[1], Q_h, P_h, Q_w, P_w]) + feat = feat.transpose([0, 3, 5, 1, 2, + 4]).reshape([-1, self.inter_channels, Q_h, Q_w]) + feat = self.global_relation(feat) + + feat = feat.reshape([x_shape[0], P_h, P_w, x_shape[1], Q_h, Q_w]) + feat = feat.transpose([0, 4, 5, 3, 1, + 2]).reshape([-1, self.inter_channels, P_h, P_w]) + feat = self.local_relation(feat) + + feat = feat.reshape([x_shape[0], Q_h, Q_w, x_shape[1], P_h, P_w]) + feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape( + [0, self.inter_channels, P_h * Q_h, P_w * Q_w]) + if pad_h > 0 or pad_w > 0: + feat = paddle.slice( + feat, + axes=[2, 3], + starts=[pad_h // 2, pad_w // 2], + ends=[pad_h // 2 + x_shape[2], pad_w // 2 + x_shape[3]]) + + feat = self.out_conv(paddle.concat([feat, x], axis=1)) + output = self.cls(feat) + + if self.enable_auxiliary_loss: + auxout = self.aux(C3) + return [output, auxout] + else: + return [output] + + +class SelfAttentionBlock(layers.AttentionBlock): + """General self-attention block/non-local block. + + Args: + in_channels (int): Input channels of key/query feature. + channels (int): Output channels of key/query transform. 
+ """ + + def __init__(self, in_channels, channels): + super(SelfAttentionBlock, self).__init__( + key_in_channels=in_channels, + query_in_channels=in_channels, + channels=channels, + out_channels=in_channels, + share_key_query=False, + query_downsample=None, + key_downsample=None, + key_query_num_convs=2, + key_query_norm=True, + value_out_num_convs=1, + value_out_norm=False, + matmul_norm=True, + with_out=False) + + self.output_project = self.build_project( + in_channels, in_channels, num_convs=1, use_conv_module=True) + + def forward(self, x): + context = super(SelfAttentionBlock, self).forward(x, x) + return self.output_project(context) diff --git a/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..39327564de43c1999fe19e562a821a4f1eb8100b --- /dev/null +++ b/modules/image/semantic_segmentation/isanet_resnet50_voc/resnet.py @@ -0,0 +1,359 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import isanet_resnet50_voc.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + 
+            dilation=dilation,
+            data_format=data_format)
+        self.conv2 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels * 4,
+            kernel_size=1,
+            act=None,
+            data_format=data_format)
+
+        if not shortcut:
+            self.short = ConvBNLayer(
+                in_channels=in_channels,
+                out_channels=out_channels * 4,
+                kernel_size=1,
+                stride=1,
+                is_vd_mode=False if if_first or stride == 1 else True,
+                data_format=data_format)
+
+        self.shortcut = shortcut
+        # NOTE: Use the wrap layer for quantization training
+        self.add = layers.Add()
+        self.relu = layers.Activation(act="relu")
+
+    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+        y = self.conv0(inputs)
+        conv1 = self.conv1(y)
+        conv2 = self.conv2(conv1)
+
+        if self.shortcut:
+            short = inputs
+        else:
+            short = self.short(inputs)
+
+        y = self.add(short, conv2)
+        y = self.relu(y)
+        return y
+
+
+class BasicBlock(nn.Layer):
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 stride: int,
+                 dilation: int = 1,
+                 shortcut: bool = True,
+                 if_first: bool = False,
+                 data_format: str = 'NCHW'):
+        super(BasicBlock, self).__init__()
+        self.conv0 = ConvBNLayer(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            stride=stride,
+            dilation=dilation,
+            act='relu',
+            data_format=data_format)
+        self.conv1 = ConvBNLayer(
+            in_channels=out_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            dilation=dilation,
+            act=None,
+            data_format=data_format)
+
+        if not shortcut:
+            self.short = ConvBNLayer(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1,
+                stride=1,
+                is_vd_mode=False if if_first or stride == 1 else True,
+                data_format=data_format)
+
+        self.shortcut = shortcut
+        self.dilation = dilation
+        self.data_format = data_format
+        self.add = layers.Add()
+        self.relu = layers.Activation(act="relu")
+
+    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+        y = self.conv0(inputs)
+        conv1 = self.conv1(y)
+
+        if self.shortcut:
+            short = inputs
+        else:
+            short = self.short(inputs)
+        y = self.add(short, conv1)
+        y = self.relu(y)
+
+        return y
+
+
+class ResNet_vd(nn.Layer):
+    """
+    The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to
+    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+    (https://arxiv.org/pdf/1812.01187.pdf).
+
+    Args:
+        layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50.
+        output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8.
+        multi_grid (tuple|list, optional): The grid of stage4. Default: (1, 1, 1).
+        pretrained (str, optional): The path of pretrained model.
+ + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
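+                    # Note (added): with the default output_stride=8,
+                    # dilation_dict is {2: 2, 3: 4}, so the last two stages
+                    # keep stride 1 (see the stride argument below) and dilate
+                    # their 3x3 convs instead, keeping the feature maps at 1/8
+                    # of the input resolution.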
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4e9f9b5138fdd72c72542a2bdf1955f48266ac94 --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README.md @@ -0,0 +1,182 @@ +# pspnet_resnet50_cityscapes + +|模型名称|pspnet_resnet50_cityscapes| +| :--- | :---: | +|类别|图像-图像分割| +|网络|pspnet_resnet50vd| +|数据集|Cityscapes| +|是否支持Fine-tuning|是| +|模型大小|390MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +
+
+- ### 模型介绍
+
+  - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+  - 更多详情请参考:[pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install pspnet_resnet50_cityscapes
+    ```
+
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='pspnet_resnet50_cityscapes')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.如何开始Fine-tune
+
+  - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用pspnet_resnet50_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。`train.py`内容如下:
+
+    - 代码步骤
+
+    - Step1: 定义数据预处理方式
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+    - Step2: 下载数据集并使用
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        - `transforms`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+        - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+    - Step3: 加载预训练模型
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None)
+        ```
+        - `name`: 选择预训练模型的名字。
+        - `pretrained`: 是否加载自己训练的模型参数;若为None,则加载提供的模型默认参数。
+
+    - Step4: 选择优化策略和运行配置
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+    - 模型预测
+
+      - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+        ```python
+        import paddle
+        import cv2
+        import paddlehub as hub
+
+        if __name__ == '__main__':
+            model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+            img = cv2.imread("/PATH/TO/IMAGE")
+            model.predict(images=[img], visualization=True)
+        ```
+
+      - 参数配置正确后,请执行脚本`python predict.py`。
+
+      - **Args**
+        * `images`:原始图像路径或BGR格式图片;
+        * `visualization`: 是否可视化,默认为True;
+        * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+      **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+
+  - ```shell
+    $ hub serving start -m pspnet_resnet50_cityscapes
+    ```
+
+  - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则不用设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,以下数行代码即可实现发送预测请求,获取预测结果
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
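+
+    # 补充说明:客户端与服务端通过JSON传输base64编码的图像,下面的辅助函数
+    # 负责在OpenCV的BGR ndarray与base64字符串之间互相转换。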
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.fromstring(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # 发送HTTP请求
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..975a846291dad727ca73918aa98644f4c70fa7a0
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/README_en.md
@@ -0,0 +1,181 @@
+# pspnet_resnet50_cityscapes
+
+|Module Name|pspnet_resnet50_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|390MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:
+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install pspnet_resnet50_cityscapes + ``` + + - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='pspnet_resnet50_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the pspnet_resnet50_cityscapes model to fine-tune datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='pspnet_resnet50_cityscapes', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='pspnet_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m pspnet_resnet50_cityscapes + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pspnet_resnet50_cityscapes" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..af3c8765f8e26760cae5fad963d7949d3d1fdc3d --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/layers.py @@ -0,0 +1,356 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+from typing import Tuple
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None) -> paddle.Tensor: + return paddle.add(x, y, name) + +class PPModule(nn.Layer): + """ + Pyramid pooling module originally in PSPNet. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + out_channels (int): The number of output channels after pyramid pooling module. + bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6). + dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + """ + + def __init__(self, + in_channels: int, + out_channels: int, + bin_sizes: Tuple[int], + dim_reduction: bool, + align_corners: bool): + super().__init__() + + self.bin_sizes = bin_sizes + + inter_channels = in_channels + if dim_reduction: + inter_channels = in_channels // len(bin_sizes) + + # we use dimension reduction after pooling mentioned in original implementation. + self.stages = nn.LayerList([ + self._make_stage(in_channels, inter_channels, size) + for size in bin_sizes + ]) + + self.conv_bn_relu2 = ConvBNReLU( + in_channels=in_channels + inter_channels * len(bin_sizes), + out_channels=out_channels, + kernel_size=3, + padding=1) + + self.align_corners = align_corners + + def _make_stage(self, in_channels: int, out_channels: int, size: int): + """ + Create one pooling layer. + + In our implementation, we adopt the same dimension reduction as the original paper that might be + slightly different with other implementations. + + After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations + keep the channels to be same. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + size (int): The out size of the pooled layer. + + Returns: + conv (Tensor): A tensor after Pyramid Pooling Module. 
+ """ + + prior = nn.AdaptiveAvgPool2D(output_size=(size, size)) + conv = ConvBNReLU( + in_channels=in_channels, out_channels=out_channels, kernel_size=1) + + return nn.Sequential(prior, conv) + + def forward(self, input: paddle.Tensor) -> paddle.Tensor: + cat_layers = [] + for stage in self.stages: + x = stage(input) + x = F.interpolate( + x, + paddle.shape(input)[2:], + mode='bilinear', + align_corners=self.align_corners) + cat_layers.append(x) + cat_layers = [input] + cat_layers[::-1] + cat = paddle.concat(cat_layers, axis=1) + out = self.conv_bn_relu2(cat) + + return out diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..8657af0d849b6d6141a1f4a533313858753c28aa --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/module.py @@ -0,0 +1,165 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from pspnet_resnet50_cityscapes.resnet import ResNet50_vd +import pspnet_resnet50_cityscapes.layers as layers + +@moduleinfo( + name="pspnet_resnet50_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="PSPNetResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class PSPNet(nn.Layer): + """ + The PSPNet implementation based on PaddlePaddle. + + The original article refers to + Zhao, Hengshuang, et al. "Pyramid scene parsing network" + (https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone. + pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024. + bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6). + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (2, 3), + pp_out_channels: int = 1024, + bin_sizes: Tuple[int] = (1, 2, 3, 6), + enable_auxiliary_loss: bool = True, + align_corners: bool = False, + pretrained: str = None): + super(PSPNet, self).__init__() + + self.backbone = ResNet50_vd() + backbone_channels = [ + self.backbone.feat_channels[i] for i in backbone_indices + ] + + self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels, + pp_out_channels, bin_sizes, + enable_auxiliary_loss, align_corners) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners) for logit in logit_list + ] + + +class PSPNetHead(nn.Layer): + """ + The PSPNetHead implementation. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + The first index will be taken as a deep-supervision feature in auxiliary layer; + the second one will be taken as input of Pyramid Pooling Module (PPModule). + Usually backbone consists of four downsampling stage, and return an output of + each stage. If we set it as (2, 3) in ResNet, that means taking feature map of the third + stage (res4b22) in backbone, and feature map of the fourth stage (res5c) as input of PPModule. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + pp_out_channels (int): The output channels after Pyramid Pooling Module. + bin_sizes (tuple): The out size of pooled feature maps. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. 
+ """ + + def __init__(self, num_classes, backbone_indices, backbone_channels, + pp_out_channels, bin_sizes, enable_auxiliary_loss, + align_corners): + + super().__init__() + + self.backbone_indices = backbone_indices + + self.psp_module = layers.PPModule( + in_channels=backbone_channels[1], + out_channels=pp_out_channels, + bin_sizes=bin_sizes, + dim_reduction=True, + align_corners=align_corners) + + self.dropout = nn.Dropout(p=0.1) # dropout_prob + + self.conv = nn.Conv2D( + in_channels=pp_out_channels, + out_channels=num_classes, + kernel_size=1) + + if enable_auxiliary_loss: + self.auxlayer = layers.AuxLayer( + in_channels=backbone_channels[0], + inter_channels=backbone_channels[0] // 4, + out_channels=num_classes) + + self.enable_auxiliary_loss = enable_auxiliary_loss + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[1]] + x = self.psp_module(x) + x = self.dropout(x) + logit = self.conv(x) + logit_list.append(logit) + + if self.enable_auxiliary_loss: + auxiliary_feat = feat_list[self.backbone_indices[0]] + auxiliary_logit = self.auxlayer(auxiliary_feat) + logit_list.append(auxiliary_logit) + + return logit_list diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..f70720eeccde8e26e94ff0a3abb555c17d5cc7c7 --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_cityscapes/resnet.py @@ -0,0 +1,357 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+from typing import List, Tuple
+
+import paddle
+ +import paddle.nn as nn +import pspnet_resnet50_cityscapes.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + 
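+        # Note (added): is_vd_mode above enables the ResNet-D shortcut from
+        # the "Bag of Tricks" paper cited below: a 2x2 average pool followed
+        # by a 1x1 conv replaces the strided 1x1 conv on downsampling
+        # shortcuts, so no input activations are skipped on the identity path.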
self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. + + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..97e4c156d0075787e0f80ab68655853437bf4cbf --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README.md @@ -0,0 +1,182 @@ +# pspnet_resnet50_voc + +|模型名称|pspnet_resnet50_voc| +| :--- | :---: | +|类别|图像-图像分割| +|网络|pspnet_resnet50vd| +|数据集|PascalVOC2012| +|是否支持Fine-tuning|是| +|模型大小|390MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +
+
+- ### 模型介绍
+
+  - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+  - 更多详情请参考:[pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install pspnet_resnet50_voc
+    ```
+
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='pspnet_resnet50_voc')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.如何开始Fine-tune
+
+  - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用pspnet_resnet50_voc模型对OpticDiscSeg数据集进行Fine-tune。`train.py`内容如下:
+
+    - 代码步骤
+
+    - Step1: 定义数据预处理方式
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+    - Step2: 下载数据集并使用
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        - `transforms`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+        - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+    - Step3: 加载预训练模型
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None)
+        ```
+        - `name`: 选择预训练模型的名字。
+        - `pretrained`: 是否加载自己训练的模型参数;若为None,则加载提供的模型默认参数。
+
+    - Step4: 选择优化策略和运行配置
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+    - 模型预测
+
+      - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+        ```python
+        import paddle
+        import cv2
+        import paddlehub as hub
+
+        if __name__ == '__main__':
+            model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT')
+            img = cv2.imread("/PATH/TO/IMAGE")
+            model.predict(images=[img], visualization=True)
+        ```
+
+      - 参数配置正确后,请执行脚本`python predict.py`。
+
+      - **Args**
+        * `images`:原始图像路径或BGR格式图片;
+        * `visualization`: 是否可视化,默认为True;
+        * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+      **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+
+  - ```shell
+    $ hub serving start -m pspnet_resnet50_voc
+    ```
+
+  - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则不用设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,以下数行代码即可实现发送预测请求,获取预测结果
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
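+
+    # 补充说明:预测请求以JSON发送base64编码的图像;响应的results[0]同样是
+    # base64编码的分割结果,由下面的base64_to_cv2解码还原为图像。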
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.fromstring(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # 发送HTTP请求
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..27b1489c951746861fdd7e123c77cb8c276bedee
--- /dev/null
+++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/README_en.md
@@ -0,0 +1,181 @@
+# pspnet_resnet50_voc
+
+|Module Name|pspnet_resnet50_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|pspnet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|370MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:
+ +- ### Module Introduction + + - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction. + - For more information, please refer to: [pspnet](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf) + +## II. Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 2.0.0 + + - paddlehub >= 2.0.0 + +- ### 2、Installation + + - ```shell + $ hub install pspnet_resnet50_voc + ``` + + - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + + +## III. Module API Prediction + +- ### 1、Prediction Code Example + + - ```python + import cv2 + import paddle + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='pspnet_resnet50_voc') + img = cv2.imread("/PATH/TO/IMAGE") + result = model.predict(images=[img], visualization=True) + ``` + +- ### 2.Fine-tune and Encapsulation + + - After completing the installation of PaddlePaddle and PaddleHub, you can start using the pspnet_resnet50_voc model to fine-tune datasets such as OpticDiscSeg. + + - Steps: + + - Step1: Define the data preprocessing method + + - ```python + from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + + transform = Compose([Resize(target_size=(512, 512)), Normalize()]) + ``` + + - `segmentation_transforms`: The data enhancement module defines lots of data preprocessing methods. Users can replace the data preprocessing methods according to their needs. + + - Step2: Download the dataset + + - ```python + from paddlehub.datasets import OpticDiscSeg + + train_reader = OpticDiscSeg(transform, mode='train') + ``` + * `transforms`: data preprocessing methods. + + * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`. + + * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`will be automatically downloaded from the network and decompressed to the `$HOME/.paddlehub/dataset` directory under the user directory. + + - Step3: Load the pre-trained model + + - ```python + import paddlehub as hub + + model = hub.Module(name='pspnet_resnet50_voc', num_classes=2, pretrained=None) + ``` + - `name`: model name. + - `load_checkpoint`: Whether to load the self-trained model, if it is None, load the provided parameters. + + - Step4: Optimization strategy + + - ```python + import paddle + from paddlehub.finetune.trainer import Trainer + + scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) + optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) + trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True) + trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4) + ``` + + - Model prediction + + - When Fine-tune is completed, the model with the best performance on the verification set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. 
The `predict.py` script is as follows: + + ```python + import paddle + import cv2 + import paddlehub as hub + + if __name__ == '__main__': + model = hub.Module(name='pspnet_resnet50_voc', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) + ``` + + - **Args** + * `images`: Image path or ndarray data with format [H, W, C], BGR. + * `visualization`: Whether to save the recognition results as picture files. + * `save_path`: Save path of the result, default is 'seg_result'. + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online service of image segmentation. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + + - ```shell + $ hub serving start -m pspnet_resnet50_voc + ``` + + - The servitization API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set CUDA_VISIBLE_DEVICES environment variable before the service, otherwise it need not be set. + +- ### Step 2: Send a predictive request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result: + + ```python + import requests + import json + import cv2 + import base64 + + import numpy as np + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + org_im = cv2.imread('/PATH/TO/IMAGE') + data = {'images':[cv2_to_base64(org_im)]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pspnet_resnet50_voc" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + mask = base64_to_cv2(r.json()["results"][0]) + ``` + +## V. Release Note + +- 1.0.0 + + First release diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..a40f65856efae6f29664a0ddc57f0f5b852139f8 --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/layers.py @@ -0,0 +1,353 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
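+
+# Note (added): this file mirrors the layers.py shipped with the other PSPNet
+# modules; it bundles the ConvBN building blocks, the Activation wrapper,
+# ASPPModule, AuxLayer and the PSPNet pyramid pooling module (PPModule).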
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name: str = None): + return paddle.add(x, y, name) + +class PPModule(nn.Layer): + """ + Pyramid pooling module originally in PSPNet. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + out_channels (int): The number of output channels after pyramid pooling module. + bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6). + dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + """ + + def __init__(self, in_channels: int, out_channels: int, bin_sizes: tuple, dim_reduction: bool, + align_corners: bool): + super().__init__() + + self.bin_sizes = bin_sizes + + inter_channels = in_channels + if dim_reduction: + inter_channels = in_channels // len(bin_sizes) + + # we use dimension reduction after pooling mentioned in original implementation. + self.stages = nn.LayerList([ + self._make_stage(in_channels, inter_channels, size) + for size in bin_sizes + ]) + + self.conv_bn_relu2 = ConvBNReLU( + in_channels=in_channels + inter_channels * len(bin_sizes), + out_channels=out_channels, + kernel_size=3, + padding=1) + + self.align_corners = align_corners + + def _make_stage(self, in_channels: int, out_channels: int, size: int): + """ + Create one pooling layer. + + In our implementation, we adopt the same dimension reduction as the original paper that might be + slightly different with other implementations. + + After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations + keep the channels to be same. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + out_channels (int): The number of output channels to pyramid pooling module. + size (int): The out size of the pooled layer. + + Returns: + conv (Tensor): A tensor after Pyramid Pooling Module. 
+ """ + + prior = nn.AdaptiveAvgPool2D(output_size=(size, size)) + conv = ConvBNReLU( + in_channels=in_channels, out_channels=out_channels, kernel_size=1) + + return nn.Sequential(prior, conv) + + def forward(self, input: paddle.Tensor) -> paddle.Tensor: + cat_layers = [] + for stage in self.stages: + x = stage(input) + x = F.interpolate( + x, + paddle.shape(input)[2:], + mode='bilinear', + align_corners=self.align_corners) + cat_layers.append(x) + cat_layers = [input] + cat_layers[::-1] + cat = paddle.concat(cat_layers, axis=1) + out = self.conv_bn_relu2(cat) + + return out diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py new file mode 100644 index 0000000000000000000000000000000000000000..417b0d3385b45312a9b9358c5200d8bcb424e2df --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/module.py @@ -0,0 +1,165 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from pspnet_resnet50_voc.resnet import ResNet50_vd +import pspnet_resnet50_voc.layers as layers + +@moduleinfo( + name="pspnet_resnet50_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="PSPNetResnet50 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class PSPNet(nn.Layer): + """ + The PSPNet implementation based on PaddlePaddle. + + The original article refers to + Zhao, Hengshuang, et al. "Pyramid scene parsing network" + (https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): Two values in the tuple indicate the indices of output of backbone. + pp_out_channels (int, optional): The output channels after Pyramid Pooling Module. Default: 1024. + bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1,2,3,6). + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (2, 3), + pp_out_channels: int = 1024, + bin_sizes: Tuple[int] = (1, 2, 3, 6), + enable_auxiliary_loss: bool = True, + align_corners: bool = False, + pretrained: str = None): + super(PSPNet, self).__init__() + + self.backbone = ResNet50_vd() + backbone_channels = [ + self.backbone.feat_channels[i] for i in backbone_indices + ] + + self.head = PSPNetHead(num_classes, backbone_indices, backbone_channels, + pp_out_channels, bin_sizes, + enable_auxiliary_loss, align_corners) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate( + logit, + paddle.shape(x)[2:], + mode='bilinear', + align_corners=self.align_corners) for logit in logit_list + ] + + +class PSPNetHead(nn.Layer): + """ + The PSPNetHead implementation. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + The first index will be taken as a deep-supervision feature in auxiliary layer; + the second one will be taken as input of Pyramid Pooling Module (PPModule). + Usually backbone consists of four downsampling stage, and return an output of + each stage. If we set it as (2, 3) in ResNet, that means taking feature map of the third + stage (res4b22) in backbone, and feature map of the fourth stage (res5c) as input of PPModule. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + pp_out_channels (int): The output channels after Pyramid Pooling Module. + bin_sizes (tuple): The out size of pooled feature maps. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. 
+ """ + + def __init__(self, num_classes, backbone_indices, backbone_channels, + pp_out_channels, bin_sizes, enable_auxiliary_loss, + align_corners): + + super().__init__() + + self.backbone_indices = backbone_indices + + self.psp_module = layers.PPModule( + in_channels=backbone_channels[1], + out_channels=pp_out_channels, + bin_sizes=bin_sizes, + dim_reduction=True, + align_corners=align_corners) + + self.dropout = nn.Dropout(p=0.1) # dropout_prob + + self.conv = nn.Conv2D( + in_channels=pp_out_channels, + out_channels=num_classes, + kernel_size=1) + + if enable_auxiliary_loss: + self.auxlayer = layers.AuxLayer( + in_channels=backbone_channels[0], + inter_channels=backbone_channels[0] // 4, + out_channels=num_classes) + + self.enable_auxiliary_loss = enable_auxiliary_loss + + def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[1]] + x = self.psp_module(x) + x = self.dropout(x) + logit = self.conv(x) + logit_list.append(logit) + + if self.enable_auxiliary_loss: + auxiliary_feat = feat_list[self.backbone_indices[0]] + auxiliary_logit = self.auxlayer(auxiliary_feat) + logit_list.append(auxiliary_logit) + + return logit_list diff --git a/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py b/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..71af8839000b188bcf2f2add75c31fb4e5eb45d0 --- /dev/null +++ b/modules/image/semantic_segmentation/pspnet_resnet50_voc/resnet.py @@ -0,0 +1,357 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import paddle.nn as nn +import pspnet_resnet50_voc.layers as layers + + +class ConvBNLayer(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + data_format: str = 'NCHW'): + super(ConvBNLayer, self).__init__() + if dilation != 1 and kernel_size != 3: + raise RuntimeError("When the dilation isn't 1," \ + "the kernel_size should be 3.") + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = nn.AvgPool2D( + kernel_size=2, + stride=2, + padding=0, + ceil_mode=True, + data_format=data_format) + self._conv = nn.Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 \ + if dilation == 1 else dilation, + dilation=dilation, + groups=groups, + bias_attr=False, + data_format=data_format) + + self._batch_norm = layers.SyncBatchNorm( + out_channels, data_format=data_format) + self._act_op = layers.Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + data_format: str = 'NCHW'): + super(BottleneckBlock, self).__init__() + + self.data_format = data_format + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + act='relu', + data_format=data_format) + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + data_format=data_format) + self.conv2 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels * 4, + kernel_size=1, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + # NOTE: Use the wrap layer for quantization training + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = self.add(short, conv2) + y = self.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + dilation: int = 1, + shortcut: bool = True, + if_first: bool = False, + data_format: str = 'NCHW'): + super(BasicBlock, self).__init__() + self.conv0 = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + dilation=dilation, + act='relu', + data_format=data_format) + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + dilation=dilation, + act=None, + data_format=data_format) + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + data_format=data_format) + + self.shortcut = shortcut + 
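+        # When a block downsamples (stride == 2 and it is not the first block),
+        # the projection shortcut `self.short` built above average-pools 2x2
+        # before its 1x1 conv (is_vd_mode, the ResNet-D trick) instead of using
+        # a strided 1x1 conv.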
self.dilation = dilation + self.data_format = data_format + self.add = layers.Add() + self.relu = layers.Activation(act="relu") + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + conv1 = self.conv1(y) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + y = self.add(short, conv1) + y = self.relu(y) + + return y + + +class ResNet_vd(nn.Layer): + """ + The ResNet_vd implementation based on PaddlePaddle. + + The original article refers to Jingdong + Tong He, et, al. "Bag of Tricks for Image Classification with Convolutional Neural Networks" + (https://arxiv.org/pdf/1812.01187.pdf). + + Args: + layers (int, optional): The layers of ResNet_vd. The supported layers are (18, 34, 50, 101, 152, 200). Default: 50. + output_stride (int, optional): The stride of output features compared to input images. It is 8 or 16. Default: 8. + multi_grid (tuple|list, optional): The grid of stage4. Defult: (1, 1, 1). + pretrained (str, optional): The path of pretrained model. + + """ + + def __init__(self, + layers: int = 50, + output_stride: int = 8, + multi_grid: Tuple[int] = (1, 1, 1), + pretrained: str = None, + data_format: str = 'NCHW'): + super(ResNet_vd, self).__init__() + + self.data_format = data_format + self.conv1_logit = None # for gscnn shape stream + self.layers = layers + supported_layers = [18, 34, 50, 101, 152, 200] + assert layers in supported_layers, \ + "supported layers are {} but input layer is {}".format( + supported_layers, layers) + + if layers == 18: + depth = [2, 2, 2, 2] + elif layers == 34 or layers == 50: + depth = [3, 4, 6, 3] + elif layers == 101: + depth = [3, 4, 23, 3] + elif layers == 152: + depth = [3, 8, 36, 3] + elif layers == 200: + depth = [3, 12, 48, 3] + num_channels = [64, 256, 512, 1024 + ] if layers >= 50 else [64, 64, 128, 256] + num_filters = [64, 128, 256, 512] + + # for channels of four returned stages + self.feat_channels = [c * 4 for c in num_filters + ] if layers >= 50 else num_filters + + dilation_dict = None + if output_stride == 8: + dilation_dict = {2: 2, 3: 4} + elif output_stride == 16: + dilation_dict = {3: 2} + + self.conv1_1 = ConvBNLayer( + in_channels=3, + out_channels=32, + kernel_size=3, + stride=2, + act='relu', + data_format=data_format) + self.conv1_2 = ConvBNLayer( + in_channels=32, + out_channels=32, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.conv1_3 = ConvBNLayer( + in_channels=32, + out_channels=64, + kernel_size=3, + stride=1, + act='relu', + data_format=data_format) + self.pool2d_max = nn.MaxPool2D( + kernel_size=3, stride=2, padding=1, data_format=data_format) + + # self.block_list = [] + self.stage_list = [] + if layers >= 50: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + if layers in [101, 152] and block == 2: + if i == 0: + conv_name = "res" + str(block + 2) + "a" + else: + conv_name = "res" + str(block + 2) + "b" + str(i) + else: + conv_name = "res" + str(block + 2) + chr(97 + i) + + ############################################################################### + # Add dilation rate for some segmentation tasks, if dilation_dict is not None. 
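+                    # e.g. with output_stride=8, dilation_dict == {2: 2, 3: 4}:
+                    # stage 3 keeps stride 1 and dilates its 3x3 convs by 2, and
+                    # stage 4 by 4 (further scaled by multi_grid), so the feature
+                    # maps stay at 1/8 of the input resolution.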
+ dilation_rate = dilation_dict[ + block] if dilation_dict and block in dilation_dict else 1 + + # Actually block here is 'stage', and i is 'block' in 'stage' + # At the stage 4, expand the the dilation_rate if given multi_grid + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + ############################################################################### + + bottleneck_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BottleneckBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block] * 4, + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 + and dilation_rate == 1 else 1, + shortcut=shortcut, + if_first=block == i == 0, + dilation=dilation_rate, + data_format=data_format)) + + block_list.append(bottleneck_block) + shortcut = True + self.stage_list.append(block_list) + else: + for block in range(len(depth)): + shortcut = False + block_list = [] + for i in range(depth[block]): + dilation_rate = dilation_dict[block] \ + if dilation_dict and block in dilation_dict else 1 + if block == 3: + dilation_rate = dilation_rate * multi_grid[i] + + basic_block = self.add_sublayer( + 'bb_%d_%d' % (block, i), + BasicBlock( + in_channels=num_channels[block] + if i == 0 else num_filters[block], + out_channels=num_filters[block], + stride=2 if i == 0 and block != 0 \ + and dilation_rate == 1 else 1, + dilation=dilation_rate, + shortcut=shortcut, + if_first=block == i == 0, + data_format=data_format)) + block_list.append(basic_block) + shortcut = True + self.stage_list.append(block_list) + + self.pretrained = pretrained + + def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]: + y = self.conv1_1(inputs) + y = self.conv1_2(y) + y = self.conv1_3(y) + self.conv1_logit = y.clone() + y = self.pool2d_max(y) + + # A feature list saves the output feature map of each stage. + feat_list = [] + for stage in self.stage_list: + for block in stage: + y = block(y) + feat_list.append(y) + + return feat_list + + +def ResNet50_vd(**args): + model = ResNet_vd(layers=50, **args) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c2d0cbb3eff49a6ee2379355396af5aaf121b967 --- /dev/null +++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README.md @@ -0,0 +1,182 @@ +# stdc1_seg_cityscapes + +|模型名称|stdc1_seg_cityscapes| +| :--- | :---: | +|类别|图像-图像分割| +|网络|stdc1_seg| +|数据集|Cityscapes| +|是否支持Fine-tuning|是| +|模型大小|67MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +

+ +

+
+- ### 模型介绍
+
+  - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+  - 更多详情请参考:[stdc](https://arxiv.org/abs/2104.13188)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install stdc1_seg_cityscapes
+    ```
+
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_cityscapes')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.如何开始Fine-tune
+
+  - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用stdc1_seg_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。`train.py`内容如下:
+
+  - 代码步骤
+
+    - Step1: 定义数据预处理方式
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+    - Step2: 下载数据集并使用
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        - `transform`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`,默认为`train`。
+
+      - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+    - Step3: 加载预训练模型
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
+        ```
+        - `name`: 选择预训练模型的名字。
+        - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+    - Step4: 选择优化策略和运行配置
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+  - 模型预测
+
+    - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。`predict.py`脚本如下:
+
+      ```python
+      import paddle
+      import cv2
+      import paddlehub as hub
+
+      if __name__ == '__main__':
+          model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+          img = cv2.imread("/PATH/TO/IMAGE")
+          model.predict(images=[img], visualization=True)
+      ```
+
+    - 参数配置正确后,请执行脚本`python predict.py`。
+
+    - **Args**
+      * `images`:原始图像路径或BGR格式图片;
+      * `visualization`: 是否可视化,默认为True;
+      * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+      **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+
+  - ```shell
+    $ hub serving start -m stdc1_seg_cityscapes
+    ```
+
+  - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果:
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
+
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # 发送HTTP请求
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..3198989fc877a9943c235408e4beb4bc3a6d22a9
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/README_en.md
@@ -0,0 +1,181 @@
+# stdc1_seg_cityscapes
+
+|Module Name|stdc1_seg_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|stdc1_seg|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|67MB|
+|Data indicators|-|
+|Latest update date|2022-03-21|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:

+ +

+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install stdc1_seg_cityscapes
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_cityscapes')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2、Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the stdc1_seg_cityscapes model to fine-tune datasets such as OpticDiscSeg.
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms`: this data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these preprocessing methods according to their needs.
+
+    - Step2: Download the dataset
+
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        * `transform`: data preprocessing method.
+
+        * `mode`: Select the data mode; the options are `train`, `test` and `val`. Default is `train`.
+
+        * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='stdc1_seg_cityscapes', num_classes=2, pretrained=None)
+        ```
+        - `name`: model name.
+        - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
+
+    - Step4: Optimization strategy
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+  - Model prediction
+
+    - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions.
The `predict.py` script is as follows:

+    ```python
+    import paddle
+    import cv2
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        model.predict(images=[img], visualization=True)
+    ```
+
+    - **Args**
+      * `images`: Image path or ndarray data with format [H, W, C], BGR.
+      * `visualization`: Whether to save the recognition results as picture files.
+      * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m stdc1_seg_cityscapes
+    ```
+
+  - The image segmentation API is now deployed as a service; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_cityscapes"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## V. Release Note
+
+- 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..c65193f55ee6fee66ea2294328ff1c6f63cdcf11
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/layers.py
@@ -0,0 +1,357 @@
+# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU( + in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + padding: str = 'same', + **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D( + in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format( + act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: tuple, + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU( + in_channels=out_channels * out_size, + out_channels=out_channels, + kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate( + y, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate( + img_avg, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. 
+ out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. + """ + + def __init__(self, + in_channels: int, + inter_channels: int, + out_channels: int, + dropout_prob: float = 0.1, + **kwargs): + super().__init__() + + self.conv_bn_relu = ConvBNReLU( + in_channels=in_channels, + out_channels=inter_channels, + kernel_size=3, + padding=1, + **kwargs) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D( + in_channels=inter_channels, + out_channels=out_channels, + kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Add(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x: paddle.Tensor, y: paddle.Tensor, name=None) -> paddle.Tensor: + return paddle.add(x, y, name) + +class PPModule(nn.Layer): + """ + Pyramid pooling module originally in PSPNet. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + out_channels (int): The number of output channels after pyramid pooling module. + bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6). + dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + """ + + def __init__(self, + in_channels: int, + out_channels: int, + bin_sizes: tuple, + dim_reduction: bool, + align_corners: bool): + super().__init__() + + self.bin_sizes = bin_sizes + + inter_channels = in_channels + if dim_reduction: + inter_channels = in_channels // len(bin_sizes) + + # we use dimension reduction after pooling mentioned in original implementation. + self.stages = nn.LayerList([ + self._make_stage(in_channels, inter_channels, size) + for size in bin_sizes + ]) + + self.conv_bn_relu2 = ConvBNReLU( + in_channels=in_channels + inter_channels * len(bin_sizes), + out_channels=out_channels, + kernel_size=3, + padding=1) + + self.align_corners = align_corners + + def _make_stage(self, in_channels: int, out_channels: int, size: int): + """ + Create one pooling layer. + + In our implementation, we adopt the same dimension reduction as the original paper that might be + slightly different with other implementations. + + After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations + keep the channels to be same. + + Args: + in_channels (int): The number of intput channels to pyramid pooling module. + out_channels (int): The number of output channels to pyramid pooling module. + size (int): The out size of the pooled layer. + + Returns: + conv (Tensor): A tensor after Pyramid Pooling Module. 
+ """ + + prior = nn.AdaptiveAvgPool2D(output_size=(size, size)) + conv = ConvBNReLU( + in_channels=in_channels, out_channels=out_channels, kernel_size=1) + + return nn.Sequential(prior, conv) + + def forward(self, input: paddle.Tensor) -> paddle.Tensor: + cat_layers = [] + for stage in self.stages: + x = stage(input) + x = F.interpolate( + x, + paddle.shape(input)[2:], + mode='bilinear', + align_corners=self.align_corners) + cat_layers.append(x) + cat_layers = [input] + cat_layers[::-1] + cat = paddle.concat(cat_layers, axis=1) + out = self.conv_bn_relu2(cat) + + return out diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..f942f225a48b2f01b089f2bc71a536a636a0a2bf --- /dev/null +++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/module.py @@ -0,0 +1,235 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import numpy as np +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from stdc1_seg_cityscapes.stdcnet import STDC1 +import stdc1_seg_cityscapes.layers as layers + + +@moduleinfo( + name="stdc1_seg_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="STDCSeg is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class STDCSeg(nn.Layer): + """ + The STDCSeg implementation based on PaddlePaddle. + + The original article refers to Meituan + Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation." + (https://arxiv.org/abs/2104.13188) + + Args: + num_classes(int,optional): The unique number of target classes. + use_boundary_8(bool,non-optional): Whether to use detail loss. it should be True accroding to paper for best metric. Default: True. + Actually,if you want to use _boundary_2/_boundary_4/_boundary_16,you should append loss function number of DetailAggregateLoss.It should work properly. + use_conv_last(bool,optional): Determine ContextPath 's inplanes variable according to whether to use bockbone's last conv. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ """ + + def __init__(self, + num_classes: int = 19, + use_boundary_2: bool = False, + use_boundary_4: bool = False, + use_boundary_8: bool = True, + use_boundary_16: bool = False, + use_conv_last: bool = False, + pretrained: str = None): + super(STDCSeg, self).__init__() + + self.use_boundary_2 = use_boundary_2 + self.use_boundary_4 = use_boundary_4 + self.use_boundary_8 = use_boundary_8 + self.use_boundary_16 = use_boundary_16 + self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last) + self.ffm = FeatureFusionModule(384, 256) + self.conv_out = SegHead(256, 256, num_classes) + self.conv_out8 = SegHead(128, 64, num_classes) + self.conv_out16 = SegHead(128, 64, num_classes) + self.conv_out_sp16 = SegHead(512, 64, 1) + self.conv_out_sp8 = SegHead(256, 64, 1) + self.conv_out_sp4 = SegHead(64, 64, 1) + self.conv_out_sp2 = SegHead(32, 64, 1) + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + x_hw = paddle.shape(x)[2:] + feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x) + + logit_list = [] + if self.training: + feat_fuse = self.ffm(feat_res8, feat_cp8) + feat_out = self.conv_out(feat_fuse) + feat_out8 = self.conv_out8(feat_cp8) + feat_out16 = self.conv_out16(feat_cp16) + + logit_list = [feat_out, feat_out8, feat_out16] + logit_list = [ + F.interpolate(x, x_hw, mode='bilinear', align_corners=True) + for x in logit_list + ] + + if self.use_boundary_2: + feat_out_sp2 = self.conv_out_sp2(feat_res2) + logit_list.append(feat_out_sp2) + if self.use_boundary_4: + feat_out_sp4 = self.conv_out_sp4(feat_res4) + logit_list.append(feat_out_sp4) + if self.use_boundary_8: + feat_out_sp8 = self.conv_out_sp8(feat_res8) + logit_list.append(feat_out_sp8) + else: + feat_fuse = self.ffm(feat_res8, feat_cp8) + feat_out = self.conv_out(feat_fuse) + feat_out = F.interpolate( + feat_out, x_hw, mode='bilinear', align_corners=True) + logit_list = [feat_out] + + return logit_list + + +class SegHead(nn.Layer): + def __init__(self, in_chan: int, mid_chan: int, n_classes:int): + super(SegHead, self).__init__() + self.conv = layers.ConvBNReLU( + in_chan, mid_chan, kernel_size=3, stride=1, padding=1) + self.conv_out = nn.Conv2D( + mid_chan, n_classes, kernel_size=1, bias_attr=None) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv(x) + x = self.conv_out(x) + return x + + +class AttentionRefinementModule(nn.Layer): + def __init__(self, in_chan: int, out_chan: int): + super(AttentionRefinementModule, self).__init__() + self.conv = layers.ConvBNReLU( + in_chan, out_chan, kernel_size=3, stride=1, padding=1) + self.conv_atten = nn.Conv2D( + out_chan, out_chan, kernel_size=1, bias_attr=None) + self.bn_atten = nn.BatchNorm2D(out_chan) + self.sigmoid_atten = nn.Sigmoid() + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feat = self.conv(x) + atten = F.adaptive_avg_pool2d(feat, 1) + atten = self.conv_atten(atten) + atten = self.bn_atten(atten) + atten = self.sigmoid_atten(atten) + out = paddle.multiply(feat, atten) + return out + + +class ContextPath(nn.Layer): + def __init__(self, backbone, 
use_conv_last: bool = False): + super(ContextPath, self).__init__() + self.backbone = backbone + self.arm16 = AttentionRefinementModule(512, 128) + inplanes = 1024 + if use_conv_last: + inplanes = 1024 + self.arm32 = AttentionRefinementModule(inplanes, 128) + self.conv_head32 = layers.ConvBNReLU( + 128, 128, kernel_size=3, stride=1, padding=1) + self.conv_head16 = layers.ConvBNReLU( + 128, 128, kernel_size=3, stride=1, padding=1) + self.conv_avg = layers.ConvBNReLU( + inplanes, 128, kernel_size=1, stride=1, padding=0) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feat2, feat4, feat8, feat16, feat32 = self.backbone(x) + + feat8_hw = paddle.shape(feat8)[2:] + feat16_hw = paddle.shape(feat16)[2:] + feat32_hw = paddle.shape(feat32)[2:] + + avg = F.adaptive_avg_pool2d(feat32, 1) + avg = self.conv_avg(avg) + avg_up = F.interpolate(avg, feat32_hw, mode='nearest') + + feat32_arm = self.arm32(feat32) + feat32_sum = feat32_arm + avg_up + feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest') + feat32_up = self.conv_head32(feat32_up) + + feat16_arm = self.arm16(feat16) + feat16_sum = feat16_arm + feat32_up + feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest') + feat16_up = self.conv_head16(feat16_up) + + return feat2, feat4, feat8, feat16, feat16_up, feat32_up # x8, x16 + + +class FeatureFusionModule(nn.Layer): + def __init__(self, in_chan:int , out_chan: int): + super(FeatureFusionModule, self).__init__() + self.convblk = layers.ConvBNReLU( + in_chan, out_chan, kernel_size=1, stride=1, padding=0) + self.conv1 = nn.Conv2D( + out_chan, + out_chan // 4, + kernel_size=1, + stride=1, + padding=0, + bias_attr=None) + self.conv2 = nn.Conv2D( + out_chan // 4, + out_chan, + kernel_size=1, + stride=1, + padding=0, + bias_attr=None) + self.relu = nn.ReLU() + self.sigmoid = nn.Sigmoid() + + def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor: + fcat = paddle.concat([fsp, fcp], axis=1) + feat = self.convblk(fcat) + atten = F.adaptive_avg_pool2d(feat, 1) + atten = self.conv1(atten) + atten = self.relu(atten) + atten = self.conv2(atten) + atten = self.sigmoid(atten) + feat_atten = paddle.multiply(feat, atten) + feat_out = feat_atten + feat + return feat_out \ No newline at end of file diff --git a/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py new file mode 100644 index 0000000000000000000000000000000000000000..ddf0f043128d49f4df3b6e70b6a6b4d92bbd2590 --- /dev/null +++ b/modules/image/semantic_segmentation/stdc1_seg_cityscapes/stdcnet.py @@ -0,0 +1,263 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Union, List, Tuple +import math + +import paddle +import paddle.nn as nn + +import stdc1_seg_cityscapes.layers as L + +__all__ = ["STDC1", "STDC2"] + + +class STDCNet(nn.Layer): + """ + The STDCNet implementation based on PaddlePaddle. 
+ + The original article refers to Meituan + Fan, Mingyuan, et al. "Rethinking BiSeNet For Real-time Semantic Segmentation." + (https://arxiv.org/abs/2104.13188) + + Args: + base(int, optional): base channels. Default: 64. + layers(list, optional): layers numbers list. It determines STDC block numbers of STDCNet's stage3\4\5. Defualt: [4, 5, 3]. + block_num(int,optional): block_num of features block. Default: 4. + type(str,optional): feature fusion method "cat"/"add". Default: "cat". + num_classes(int, optional): class number for image classification. Default: 1000. + dropout(float,optional): dropout ratio. if >0,use dropout ratio. Default: 0.20. + use_conv_last(bool,optional): whether to use the last ConvBNReLU layer . Default: False. + pretrained(str, optional): the path of pretrained model. + """ + + def __init__(self, + base: int = 64, + layers: List[int] = [4, 5, 3], + block_num: int = 4, + type: str = "cat", + num_classes: int = 1000, + dropout: float = 0.20, + use_conv_last: bool = False): + super(STDCNet, self).__init__() + if type == "cat": + block = CatBottleneck + elif type == "add": + block = AddBottleneck + self.use_conv_last = use_conv_last + self.features = self._make_layers(base, layers, block_num, block) + self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1) + + if (layers == [4, 5, 3]): #stdc1446 + self.x2 = nn.Sequential(self.features[:1]) + self.x4 = nn.Sequential(self.features[1:2]) + self.x8 = nn.Sequential(self.features[2:6]) + self.x16 = nn.Sequential(self.features[6:11]) + self.x32 = nn.Sequential(self.features[11:]) + elif (layers == [2, 2, 2]): #stdc813 + self.x2 = nn.Sequential(self.features[:1]) + self.x4 = nn.Sequential(self.features[1:2]) + self.x8 = nn.Sequential(self.features[2:4]) + self.x16 = nn.Sequential(self.features[4:6]) + self.x32 = nn.Sequential(self.features[6:]) + else: + raise NotImplementedError( + "model with layers:{} is not implemented!".format(layers)) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + """ + forward function for feature extract. 
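+
+        Returns the outputs of the five stages at 1/2, 1/4, 1/8, 1/16 and 1/32
+        of the input resolution (feat2, feat4, feat8, feat16, feat32).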
+ """ + feat2 = self.x2(x) + feat4 = self.x4(feat2) + feat8 = self.x8(feat4) + feat16 = self.x16(feat8) + feat32 = self.x32(feat16) + if self.use_conv_last: + feat32 = self.conv_last(feat32) + return feat2, feat4, feat8, feat16, feat32 + + def _make_layers(self, base, layers, block_num, block): + features = [] + features += [ConvBNRelu(3, base // 2, 3, 2)] + features += [ConvBNRelu(base // 2, base, 3, 2)] + + for i, layer in enumerate(layers): + for j in range(layer): + if i == 0 and j == 0: + features.append(block(base, base * 4, block_num, 2)) + elif j == 0: + features.append( + block(base * int(math.pow(2, i + 1)), + base * int(math.pow(2, i + 2)), block_num, 2)) + else: + features.append( + block(base * int(math.pow(2, i + 2)), + base * int(math.pow(2, i + 2)), block_num, 1)) + + return nn.Sequential(*features) + + +class ConvBNRelu(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1): + super(ConvBNRelu, self).__init__() + self.conv = nn.Conv2D( + in_planes, + out_planes, + kernel_size=kernel, + stride=stride, + padding=kernel // 2, + bias_attr=False) + self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW') + self.relu = nn.ReLU() + + def forward(self, x): + out = self.relu(self.bn(self.conv(x))) + return out + + +class AddBottleneck(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1): + super(AddBottleneck, self).__init__() + assert block_num > 1, "block number should be larger than 1." + self.conv_list = nn.LayerList() + self.stride = stride + if stride == 2: + self.avd_layer = nn.Sequential( + nn.Conv2D( + out_planes // 2, + out_planes // 2, + kernel_size=3, + stride=2, + padding=1, + groups=out_planes // 2, + bias_attr=False), + nn.BatchNorm2D(out_planes // 2), + ) + self.skip = nn.Sequential( + nn.Conv2D( + in_planes, + in_planes, + kernel_size=3, + stride=2, + padding=1, + groups=in_planes, + bias_attr=False), + nn.BatchNorm2D(in_planes), + nn.Conv2D( + in_planes, out_planes, kernel_size=1, bias_attr=False), + nn.BatchNorm2D(out_planes), + ) + stride = 1 + + for idx in range(block_num): + if idx == 0: + self.conv_list.append( + ConvBNRelu(in_planes, out_planes // 2, kernel=1)) + elif idx == 1 and block_num == 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride)) + elif idx == 1 and block_num > 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride)) + elif idx < block_num - 1: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx + 1)))) + else: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx)))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out_list = [] + out = x + for idx, conv in enumerate(self.conv_list): + if idx == 0 and self.stride == 2: + out = self.avd_layer(conv(out)) + else: + out = conv(out) + out_list.append(out) + if self.stride == 2: + x = self.skip(x) + return paddle.concat(out_list, axis=1) + x + + +class CatBottleneck(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1): + super(CatBottleneck, self).__init__() + assert block_num > 1, "block number should be larger than 1." 
+ self.conv_list = nn.LayerList() + self.stride = stride + if stride == 2: + self.avd_layer = nn.Sequential( + nn.Conv2D( + out_planes // 2, + out_planes // 2, + kernel_size=3, + stride=2, + padding=1, + groups=out_planes // 2, + bias_attr=False), + nn.BatchNorm2D(out_planes // 2), + ) + self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1) + stride = 1 + + for idx in range(block_num): + if idx == 0: + self.conv_list.append( + ConvBNRelu(in_planes, out_planes // 2, kernel=1)) + elif idx == 1 and block_num == 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride)) + elif idx == 1 and block_num > 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride)) + elif idx < block_num - 1: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx + 1)))) + else: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx)))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out_list = [] + out1 = self.conv_list[0](x) + for idx, conv in enumerate(self.conv_list[1:]): + if idx == 0: + if self.stride == 2: + out = conv(self.avd_layer(out1)) + else: + out = conv(out1) + else: + out = conv(out) + out_list.append(out) + + if self.stride == 2: + out1 = self.skip(out1) + out_list.insert(0, out1) + out = paddle.concat(out_list, axis=1) + return out + + +def STDC2(**kwargs): + model = STDCNet(base=64, layers=[4, 5, 3], **kwargs) + return model + +def STDC1(**kwargs): + model = STDCNet(base=64, layers=[2, 2, 2], **kwargs) + return model \ No newline at end of file diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/README.md b/modules/image/semantic_segmentation/stdc1_seg_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f24a2a813c9bbf184d4714b1d96c23f8082b6b25 --- /dev/null +++ b/modules/image/semantic_segmentation/stdc1_seg_voc/README.md @@ -0,0 +1,182 @@ +# stdc1_seg_voc + +|模型名称|stdc1_seg_voc| +| :--- | :---: | +|类别|图像-图像分割| +|网络|stdc1_seg| +|数据集|PascalVOC2012| +|是否支持Fine-tuning|是| +|模型大小|67MB| +|指标|-| +|最新更新日期|2022-03-21| + +## 一、模型基本信息 + + - 样例结果示例: +


+
+- ### 模型介绍
+
+  - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+  - 更多详情请参考:[stdc](https://arxiv.org/abs/2104.13188)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install stdc1_seg_voc
+    ```
+
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_voc')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2.如何开始Fine-tune
+
+  - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用stdc1_seg_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+    - 代码步骤
+
+      - Step1: 定义数据预处理方式
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
+
+      - Step2: 下载数据集并使用
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        - `transforms`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+        - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+      - Step3: 加载预训练模型
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
+        ```
+        - `name`: 选择预训练模型的名字。
+        - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+      - Step4: 选择优化策略和运行配置
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
+
+    - 模型预测
+
+      - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+        ```python
+        import paddle
+        import cv2
+        import paddlehub as hub
+
+        if __name__ == '__main__':
+            model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
+            img = cv2.imread("/PATH/TO/IMAGE")
+            model.predict(images=[img], visualization=True)
+        ```
+
+      - 参数配置正确后,请执行脚本`python predict.py`。
+
+      - **Args**
+        * `images`:原始图像路径或BGR格式图片;
+        * `visualization`: 是否可视化,默认为True;
+        * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+        **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+
+  - ```shell
+    $ hub serving start -m stdc1_seg_voc
+    ```
+
+  - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,请在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,以下几行代码即可发送预测请求并获取预测结果:
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
+
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # 发送HTTP请求
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
+
+## 五、更新历史
+
+* 1.0.0
+
+  初始发布
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md b/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..fd11504b94825347bcdf9028486820afe9c5c8d1
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/README_en.md
@@ -0,0 +1,181 @@
+# stdc1_seg_voc
+
+|Module Name|stdc1_seg_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|stdc1_seg|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|67MB|
+|Data indicators|-|
+|Latest update date|2022-03-22|
+
+## I. Basic Information
+
+- ### Application Effect Display
+  - Sample results:
+


+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [stdc](https://arxiv.org/abs/2104.13188)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install stdc1_seg_voc
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+  - ```python
+    import cv2
+    import paddle
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_voc')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        result = model.predict(images=[img], visualization=True)
+    ```
+
+- ### 2、Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can begin fine-tuning the stdc1_seg_voc model on datasets such as OpticDiscSeg.
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+        ```
+
+      - The `segmentation_transforms` module provides a rich set of preprocessing methods for segmentation data; users can replace them according to their needs.
+
+    - Step2: Download the dataset
+
+      - ```python
+        from paddlehub.datasets import OpticDiscSeg
+
+        train_reader = OpticDiscSeg(transform, mode='train')
+        ```
+        * `transforms`: data preprocessing methods.
+
+        * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+        * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` will automatically download the dataset from the network and decompress it into the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+      - ```python
+        import paddlehub as hub
+
+        model = hub.Module(name='stdc1_seg_voc', num_classes=2, pretrained=None)
+        ```
+        - `name`: model name.
+        - `pretrained`: Whether to load a self-trained checkpoint; if it is None, the provided default parameters are loaded.
+
+    - Step4: Optimization strategy
+
+      - ```python
+        import paddle
+        from paddlehub.finetune.trainer import Trainer
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+        ```
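+
+    - The `PolynomialDecay` schedule above decays the learning rate from 0.01 towards `end_lr` over `decay_steps` steps. As a minimal standalone sketch (not part of `train.py`, assuming only that paddle is installed), the decay curve can be inspected like this:
+
+      - ```python
+        import paddle
+
+        scheduler = paddle.optimizer.lr.PolynomialDecay(
+            learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+        for step in range(1001):
+            if step % 250 == 0:
+                # get_lr() returns the learning rate for the current step.
+                print(step, scheduler.get_lr())
+            scheduler.step()
+        ```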
+
+    - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+    ```python
+    import paddle
+    import cv2
+    import paddlehub as hub
+
+    if __name__ == '__main__':
+        model = hub.Module(name='stdc1_seg_voc', pretrained='/PATH/TO/CHECKPOINT')
+        img = cv2.imread("/PATH/TO/IMAGE")
+        model.predict(images=[img], visualization=True)
+    ```
+
+    - **Args**
+      * `images`: Image path or ndarray data with format [H, W, C], BGR.
+      * `visualization`: Whether to save the recognition results as picture files.
+      * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m stdc1_seg_voc
+    ```
+
+  - The image segmentation service API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+    ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    import numpy as np
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    org_im = cv2.imread('/PATH/TO/IMAGE')
+    data = {'images':[cv2_to_base64(org_im)]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/stdc1_seg_voc"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+    mask = base64_to_cv2(r.json()["results"][0])
+    ```
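+
+  - As a quick sanity check of the response (a usage sketch, assuming `org_im` and `mask` from the example above and that the returned mask has the same spatial size as the input image), the mask can be blended with the original image and written to disk:
+
+    ```python
+    overlay = cv2.addWeighted(org_im, 0.6, mask, 0.4, 0)
+    cv2.imwrite('overlay.png', overlay)
+    ```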
+
+## V. Release Note
+
+- 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py b/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..2304610304e0149a1aceb4c1fbf4897edf5220af
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/layers.py
@@ -0,0 +1,357 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """nn.SyncBatchNorm has no CPU kernel, so fall back to nn.BatchNorm2D in a CPU environment."""
+    if paddle.get_device() == 'cpu':
+        return nn.BatchNorm2D(*args, **kwargs)
+    else:
+        return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class SeparableConvBNReLU(nn.Layer):
+    """Depthwise Separable Convolution."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(SeparableConvBNReLU, self).__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+    """Basic conv bn layer."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBN, self).__init__()
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        return x
+
+
+class ConvBNReLU(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 kernel_size: int,
+                 padding: str = 'same',
+                 **kwargs: dict):
+        super(ConvBNReLU, self).__init__()
+
+        self._conv = Conv2D(
+            in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        x = F.relu(x)
+        return x
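+
+# A usage sketch for the blocks above (comments only; shapes are assumptions).
+# With the default padding='same', the spatial size is preserved at stride 1:
+#
+#     conv = ConvBNReLU(in_channels=3, out_channels=16, kernel_size=3)
+#     y = conv(paddle.rand([1, 3, 64, 64]))  # -> [1, 16, 64, 64]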
+
+
+class Activation(nn.Layer):
+    """
+    The wrapper of activations.
+
+    Args:
+        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+            'hsigmoid']. Default: None, means identical transformation.
+
+    Returns:
+        A callable object of Activation.
+
+    Raises:
+        KeyError: When parameter `act` is not in the optional range.
+
+    Examples:
+
+        from paddleseg.models.common.activation import Activation
+
+        relu = Activation("relu")
+        print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+        sigmoid = Activation("sigmoid")
+        print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+        not_exit_one = Activation("not_exit_one")
+        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+        # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+        # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+    """
+
+    def __init__(self, act: str = None):
+        super(Activation, self).__init__()
+
+        self._act = act
+        upper_act_names = activation.__dict__.keys()
+        lower_act_names = [act.lower() for act in upper_act_names]
+        act_dict = dict(zip(lower_act_names, upper_act_names))
+
+        if act is not None:
+            if act in act_dict.keys():
+                act_name = act_dict[act]
+                self.act_func = eval("activation.{}()".format(act_name))
+            else:
+                raise KeyError("{} does not exist in the current {}".format(
+                    act, act_dict.keys()))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        if self._act is not None:
+            return self.act_func(x)
+        else:
+            return x
+
+
+class ASPPModule(nn.Layer):
+    """
+    Atrous Spatial Pyramid Pooling.
+
+    Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): Whether to use separable convolutions in the ASPP module. Default: False.
+        image_pooling (bool, optional): Whether to augment with image-level pooled features. Default: False.
+    """
+
+    def __init__(self,
+                 aspp_ratios: tuple,
+                 in_channels: int,
+                 out_channels: int,
+                 align_corners: bool,
+                 use_sep_conv: bool = False,
+                 image_pooling: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+        self.aspp_blocks = nn.LayerList()
+
+        for ratio in aspp_ratios:
+            if use_sep_conv and ratio > 1:
+                conv_func = SeparableConvBNReLU
+            else:
+                conv_func = ConvBNReLU
+
+            block = conv_func(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1 if ratio == 1 else 3,
+                dilation=ratio,
+                padding=0 if ratio == 1 else ratio)
+            self.aspp_blocks.append(block)
+
+        out_size = len(self.aspp_blocks)
+
+        if image_pooling:
+            self.global_avg_pool = nn.Sequential(
+                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+            out_size += 1
+        self.image_pooling = image_pooling
+
+        self.conv_bn_relu = ConvBNReLU(
+            in_channels=out_channels * out_size,
+            out_channels=out_channels,
+            kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # drop rate
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        outputs = []
+        for block in self.aspp_blocks:
+            y = block(x)
+            y = F.interpolate(
+                y,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(y)
+
+        if self.image_pooling:
+            img_avg = self.global_avg_pool(x)
+            img_avg = F.interpolate(
+                img_avg,
+                x.shape[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            outputs.append(img_avg)
+
+        x = paddle.concat(outputs, axis=1)
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+
+        return x
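+
+# A usage sketch for ASPPModule (comments only; shapes are assumptions). Every
+# branch keeps the spatial size, the branch outputs (plus the optional image
+# pooling branch) are concatenated, then projected back to out_channels:
+#
+#     aspp = ASPPModule(aspp_ratios=(1, 6, 12, 18), in_channels=256, out_channels=128,
+#                       align_corners=False, use_sep_conv=False, image_pooling=True)
+#     y = aspp(paddle.rand([1, 256, 32, 32]))  # -> [1, 128, 32, 32]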
+
+
+class AuxLayer(nn.Layer):
+    """
+    The auxiliary layer implementation for auxiliary loss.
+
+    Args:
+        in_channels (int): The number of input channels.
+        inter_channels (int): The number of intermediate channels.
+        out_channels (int): The number of output channels, which is usually num_classes.
+        dropout_prob (float, optional): The drop rate. Default: 0.1.
+    """
+
+    def __init__(self,
+                 in_channels: int,
+                 inter_channels: int,
+                 out_channels: int,
+                 dropout_prob: float = 0.1,
+                 **kwargs):
+        super().__init__()
+
+        self.conv_bn_relu = ConvBNReLU(
+            in_channels=in_channels,
+            out_channels=inter_channels,
+            kernel_size=3,
+            padding=1,
+            **kwargs)
+
+        self.dropout = nn.Dropout(p=dropout_prob)
+
+        self.conv = nn.Conv2D(
+            in_channels=inter_channels,
+            out_channels=out_channels,
+            kernel_size=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+        x = self.conv(x)
+        return x
+
+
+class Add(nn.Layer):
+    """A thin wrapper that exposes element-wise addition as a layer."""
+
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x: paddle.Tensor, y: paddle.Tensor, name=None) -> paddle.Tensor:
+        return paddle.add(x, y, name)
+
+
+class PPModule(nn.Layer):
+    """
+    Pyramid pooling module originally used in PSPNet.
+
+    Args:
+        in_channels (int): The number of input channels to the pyramid pooling module.
+        out_channels (int): The number of output channels after the pyramid pooling module.
+        bin_sizes (tuple, optional): The output sizes of the pooled feature maps. Default: (1, 2, 3, 6).
+        dim_reduction (bool, optional): A bool value that represents whether to reduce dimension after pooling. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self,
+                 in_channels: int,
+                 out_channels: int,
+                 bin_sizes: tuple,
+                 dim_reduction: bool,
+                 align_corners: bool):
+        super().__init__()
+
+        self.bin_sizes = bin_sizes
+
+        inter_channels = in_channels
+        if dim_reduction:
+            inter_channels = in_channels // len(bin_sizes)
+
+        # We use the dimension reduction after pooling mentioned in the original implementation.
+        self.stages = nn.LayerList([
+            self._make_stage(in_channels, inter_channels, size)
+            for size in bin_sizes
+        ])
+
+        self.conv_bn_relu2 = ConvBNReLU(
+            in_channels=in_channels + inter_channels * len(bin_sizes),
+            out_channels=out_channels,
+            kernel_size=3,
+            padding=1)
+
+        self.align_corners = align_corners
+
+    def _make_stage(self, in_channels: int, out_channels: int, size: int):
+        """
+        Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which might be
+        slightly different from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other
+        implementations keep the channel number unchanged.
+
+        Args:
+            in_channels (int): The number of input channels to the pyramid pooling module.
+            out_channels (int): The number of output channels of the pooling branch.
+            size (int): The output size of the pooled layer.
+
+        Returns:
+            A pooling branch (nn.Sequential) consisting of an AdaptiveAvgPool2D followed by a 1x1 ConvBNReLU.
+        """
+
+        prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+        conv = ConvBNReLU(
+            in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+        return nn.Sequential(prior, conv)
+
+    def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+        cat_layers = []
+        for stage in self.stages:
+            x = stage(input)
+            x = F.interpolate(
+                x,
+                paddle.shape(input)[2:],
+                mode='bilinear',
+                align_corners=self.align_corners)
+            cat_layers.append(x)
+        cat_layers = [input] + cat_layers[::-1]
+        cat = paddle.concat(cat_layers, axis=1)
+        out = self.conv_bn_relu2(cat)
+
+        return out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/module.py b/modules/image/semantic_segmentation/stdc1_seg_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..642628dc9d647c64605bd9016ddee7ab4de547d5
--- /dev/null
+++ b/modules/image/semantic_segmentation/stdc1_seg_voc/module.py
@@ -0,0 +1,235 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+from stdc1_seg_voc.stdcnet import STDC1
+import stdc1_seg_voc.layers as layers
+
+
+@moduleinfo(
+    name="stdc1_seg_voc",
+    type="CV/semantic_segmentation",
+    author="paddlepaddle",
+    author_email="",
+    summary="STDCSeg is a segmentation model.",
+    version="1.0.0",
+    meta=ImageSegmentationModule)
+class STDCSeg(nn.Layer):
+    """
+    The STDCSeg implementation based on PaddlePaddle.
+
+    The original article refers to
+    Fan, Mingyuan, et al. (Meituan). "Rethinking BiSeNet For Real-time Semantic Segmentation."
+    (https://arxiv.org/abs/2104.13188)
+
+    Args:
+        num_classes(int, optional): The unique number of target classes.
+        use_boundary_2(bool, optional): Whether to use the stride-2 detail loss branch. Default: False.
+        use_boundary_4(bool, optional): Whether to use the stride-4 detail loss branch. Default: False.
+        use_boundary_8(bool, optional): Whether to use the stride-8 detail loss branch. According to the paper,
+            it should be True for the best metric. If you also want to use _boundary_2/_boundary_4/_boundary_16,
+            append the corresponding number of loss functions to DetailAggregateLoss. Default: True.
+        use_boundary_16(bool, optional): Whether to use the stride-16 detail loss branch. Default: False.
+        use_conv_last(bool, optional): Determines ContextPath's inplanes variable according to whether the backbone's last conv is used. Default: False.
+        pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """ + + def __init__(self, + num_classes: int = 19, + use_boundary_2: bool = False, + use_boundary_4: bool = False, + use_boundary_8: bool = True, + use_boundary_16: bool = False, + use_conv_last: bool = False, + pretrained: str = None): + super(STDCSeg, self).__init__() + + self.use_boundary_2 = use_boundary_2 + self.use_boundary_4 = use_boundary_4 + self.use_boundary_8 = use_boundary_8 + self.use_boundary_16 = use_boundary_16 + self.cp = ContextPath(STDC1(), use_conv_last=use_conv_last) + self.ffm = FeatureFusionModule(384, 256) + self.conv_out = SegHead(256, 256, num_classes) + self.conv_out8 = SegHead(128, 64, num_classes) + self.conv_out16 = SegHead(128, 64, num_classes) + self.conv_out_sp16 = SegHead(512, 64, 1) + self.conv_out_sp8 = SegHead(256, 64, 1) + self.conv_out_sp4 = SegHead(64, 64, 1) + self.conv_out_sp2 = SegHead(32, 64, 1) + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + x_hw = paddle.shape(x)[2:] + feat_res2, feat_res4, feat_res8, _, feat_cp8, feat_cp16 = self.cp(x) + + logit_list = [] + if self.training: + feat_fuse = self.ffm(feat_res8, feat_cp8) + feat_out = self.conv_out(feat_fuse) + feat_out8 = self.conv_out8(feat_cp8) + feat_out16 = self.conv_out16(feat_cp16) + + logit_list = [feat_out, feat_out8, feat_out16] + logit_list = [ + F.interpolate(x, x_hw, mode='bilinear', align_corners=True) + for x in logit_list + ] + + if self.use_boundary_2: + feat_out_sp2 = self.conv_out_sp2(feat_res2) + logit_list.append(feat_out_sp2) + if self.use_boundary_4: + feat_out_sp4 = self.conv_out_sp4(feat_res4) + logit_list.append(feat_out_sp4) + if self.use_boundary_8: + feat_out_sp8 = self.conv_out_sp8(feat_res8) + logit_list.append(feat_out_sp8) + else: + feat_fuse = self.ffm(feat_res8, feat_cp8) + feat_out = self.conv_out(feat_fuse) + feat_out = F.interpolate( + feat_out, x_hw, mode='bilinear', align_corners=True) + logit_list = [feat_out] + + return logit_list + + +class SegHead(nn.Layer): + def __init__(self, in_chan: int, mid_chan: int, n_classes:int): + super(SegHead, self).__init__() + self.conv = layers.ConvBNReLU( + in_chan, mid_chan, kernel_size=3, stride=1, padding=1) + self.conv_out = nn.Conv2D( + mid_chan, n_classes, kernel_size=1, bias_attr=None) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv(x) + x = self.conv_out(x) + return x + + +class AttentionRefinementModule(nn.Layer): + def __init__(self, in_chan: int, out_chan: int): + super(AttentionRefinementModule, self).__init__() + self.conv = layers.ConvBNReLU( + in_chan, out_chan, kernel_size=3, stride=1, padding=1) + self.conv_atten = nn.Conv2D( + out_chan, out_chan, kernel_size=1, bias_attr=None) + self.bn_atten = nn.BatchNorm2D(out_chan) + self.sigmoid_atten = nn.Sigmoid() + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feat = self.conv(x) + atten = F.adaptive_avg_pool2d(feat, 1) + atten = self.conv_atten(atten) + atten = self.bn_atten(atten) + atten = self.sigmoid_atten(atten) + out = paddle.multiply(feat, atten) + return out + + +class ContextPath(nn.Layer): + def __init__(self, backbone, 
use_conv_last: bool = False): + super(ContextPath, self).__init__() + self.backbone = backbone + self.arm16 = AttentionRefinementModule(512, 128) + inplanes = 1024 + if use_conv_last: + inplanes = 1024 + self.arm32 = AttentionRefinementModule(inplanes, 128) + self.conv_head32 = layers.ConvBNReLU( + 128, 128, kernel_size=3, stride=1, padding=1) + self.conv_head16 = layers.ConvBNReLU( + 128, 128, kernel_size=3, stride=1, padding=1) + self.conv_avg = layers.ConvBNReLU( + inplanes, 128, kernel_size=1, stride=1, padding=0) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feat2, feat4, feat8, feat16, feat32 = self.backbone(x) + + feat8_hw = paddle.shape(feat8)[2:] + feat16_hw = paddle.shape(feat16)[2:] + feat32_hw = paddle.shape(feat32)[2:] + + avg = F.adaptive_avg_pool2d(feat32, 1) + avg = self.conv_avg(avg) + avg_up = F.interpolate(avg, feat32_hw, mode='nearest') + + feat32_arm = self.arm32(feat32) + feat32_sum = feat32_arm + avg_up + feat32_up = F.interpolate(feat32_sum, feat16_hw, mode='nearest') + feat32_up = self.conv_head32(feat32_up) + + feat16_arm = self.arm16(feat16) + feat16_sum = feat16_arm + feat32_up + feat16_up = F.interpolate(feat16_sum, feat8_hw, mode='nearest') + feat16_up = self.conv_head16(feat16_up) + + return feat2, feat4, feat8, feat16, feat16_up, feat32_up # x8, x16 + + +class FeatureFusionModule(nn.Layer): + def __init__(self, in_chan:int , out_chan: int): + super(FeatureFusionModule, self).__init__() + self.convblk = layers.ConvBNReLU( + in_chan, out_chan, kernel_size=1, stride=1, padding=0) + self.conv1 = nn.Conv2D( + out_chan, + out_chan // 4, + kernel_size=1, + stride=1, + padding=0, + bias_attr=None) + self.conv2 = nn.Conv2D( + out_chan // 4, + out_chan, + kernel_size=1, + stride=1, + padding=0, + bias_attr=None) + self.relu = nn.ReLU() + self.sigmoid = nn.Sigmoid() + + def forward(self, fsp: paddle.Tensor, fcp: paddle.Tensor) -> paddle.Tensor: + fcat = paddle.concat([fsp, fcp], axis=1) + feat = self.convblk(fcat) + atten = F.adaptive_avg_pool2d(feat, 1) + atten = self.conv1(atten) + atten = self.relu(atten) + atten = self.conv2(atten) + atten = self.sigmoid(atten) + feat_atten = paddle.multiply(feat, atten) + feat_out = feat_atten + feat + return feat_out \ No newline at end of file diff --git a/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py b/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py new file mode 100644 index 0000000000000000000000000000000000000000..d2716a83b5fcf975c4d7a5e4291199d6b09689f9 --- /dev/null +++ b/modules/image/semantic_segmentation/stdc1_seg_voc/stdcnet.py @@ -0,0 +1,262 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import paddle +import paddle.nn as nn + +import stdc1_seg_voc.layers as L + +__all__ = ["STDC1", "STDC2"] + + +class STDCNet(nn.Layer): + """ + The STDCNet implementation based on PaddlePaddle. + + The original article refers to Meituan + Fan, Mingyuan, et al. 
"Rethinking BiSeNet For Real-time Semantic Segmentation." + (https://arxiv.org/abs/2104.13188) + + Args: + base(int, optional): base channels. Default: 64. + layers(list, optional): layers numbers list. It determines STDC block numbers of STDCNet's stage3\4\5. Defualt: [4, 5, 3]. + block_num(int,optional): block_num of features block. Default: 4. + type(str,optional): feature fusion method "cat"/"add". Default: "cat". + num_classes(int, optional): class number for image classification. Default: 1000. + dropout(float,optional): dropout ratio. if >0,use dropout ratio. Default: 0.20. + use_conv_last(bool,optional): whether to use the last ConvBNReLU layer . Default: False. + pretrained(str, optional): the path of pretrained model. + """ + + def __init__(self, + base: int = 64, + layers: List[int] = [4, 5, 3], + block_num: int = 4, + type: str = "cat", + num_classes: int = 1000, + dropout: float = 0.20, + use_conv_last: bool = False): + super(STDCNet, self).__init__() + if type == "cat": + block = CatBottleneck + elif type == "add": + block = AddBottleneck + self.use_conv_last = use_conv_last + self.features = self._make_layers(base, layers, block_num, block) + self.conv_last = ConvBNRelu(base * 16, max(1024, base * 16), 1, 1) + + if (layers == [4, 5, 3]): #stdc1446 + self.x2 = nn.Sequential(self.features[:1]) + self.x4 = nn.Sequential(self.features[1:2]) + self.x8 = nn.Sequential(self.features[2:6]) + self.x16 = nn.Sequential(self.features[6:11]) + self.x32 = nn.Sequential(self.features[11:]) + elif (layers == [2, 2, 2]): #stdc813 + self.x2 = nn.Sequential(self.features[:1]) + self.x4 = nn.Sequential(self.features[1:2]) + self.x8 = nn.Sequential(self.features[2:4]) + self.x16 = nn.Sequential(self.features[4:6]) + self.x32 = nn.Sequential(self.features[6:]) + else: + raise NotImplementedError( + "model with layers:{} is not implemented!".format(layers)) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + """ + forward function for feature extract. + """ + feat2 = self.x2(x) + feat4 = self.x4(feat2) + feat8 = self.x8(feat4) + feat16 = self.x16(feat8) + feat32 = self.x32(feat16) + if self.use_conv_last: + feat32 = self.conv_last(feat32) + return feat2, feat4, feat8, feat16, feat32 + + def _make_layers(self, base, layers, block_num, block): + features = [] + features += [ConvBNRelu(3, base // 2, 3, 2)] + features += [ConvBNRelu(base // 2, base, 3, 2)] + + for i, layer in enumerate(layers): + for j in range(layer): + if i == 0 and j == 0: + features.append(block(base, base * 4, block_num, 2)) + elif j == 0: + features.append( + block(base * int(math.pow(2, i + 1)), + base * int(math.pow(2, i + 2)), block_num, 2)) + else: + features.append( + block(base * int(math.pow(2, i + 2)), + base * int(math.pow(2, i + 2)), block_num, 1)) + + return nn.Sequential(*features) + + +class ConvBNRelu(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, kernel: int = 3, stride: int = 1): + super(ConvBNRelu, self).__init__() + self.conv = nn.Conv2D( + in_planes, + out_planes, + kernel_size=kernel, + stride=stride, + padding=kernel // 2, + bias_attr=False) + self.bn = L.SyncBatchNorm(out_planes, data_format='NCHW') + self.relu = nn.ReLU() + + def forward(self, x): + out = self.relu(self.bn(self.conv(x))) + return out + + +class AddBottleneck(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1): + super(AddBottleneck, self).__init__() + assert block_num > 1, "block number should be larger than 1." 
+ self.conv_list = nn.LayerList() + self.stride = stride + if stride == 2: + self.avd_layer = nn.Sequential( + nn.Conv2D( + out_planes // 2, + out_planes // 2, + kernel_size=3, + stride=2, + padding=1, + groups=out_planes // 2, + bias_attr=False), + nn.BatchNorm2D(out_planes // 2), + ) + self.skip = nn.Sequential( + nn.Conv2D( + in_planes, + in_planes, + kernel_size=3, + stride=2, + padding=1, + groups=in_planes, + bias_attr=False), + nn.BatchNorm2D(in_planes), + nn.Conv2D( + in_planes, out_planes, kernel_size=1, bias_attr=False), + nn.BatchNorm2D(out_planes), + ) + stride = 1 + + for idx in range(block_num): + if idx == 0: + self.conv_list.append( + ConvBNRelu(in_planes, out_planes // 2, kernel=1)) + elif idx == 1 and block_num == 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride)) + elif idx == 1 and block_num > 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride)) + elif idx < block_num - 1: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx + 1)))) + else: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx)))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out_list = [] + out = x + for idx, conv in enumerate(self.conv_list): + if idx == 0 and self.stride == 2: + out = self.avd_layer(conv(out)) + else: + out = conv(out) + out_list.append(out) + if self.stride == 2: + x = self.skip(x) + return paddle.concat(out_list, axis=1) + x + + +class CatBottleneck(nn.Layer): + def __init__(self, in_planes: int, out_planes: int, block_num: int = 3, stride: int = 1): + super(CatBottleneck, self).__init__() + assert block_num > 1, "block number should be larger than 1." + self.conv_list = nn.LayerList() + self.stride = stride + if stride == 2: + self.avd_layer = nn.Sequential( + nn.Conv2D( + out_planes // 2, + out_planes // 2, + kernel_size=3, + stride=2, + padding=1, + groups=out_planes // 2, + bias_attr=False), + nn.BatchNorm2D(out_planes // 2), + ) + self.skip = nn.AvgPool2D(kernel_size=3, stride=2, padding=1) + stride = 1 + + for idx in range(block_num): + if idx == 0: + self.conv_list.append( + ConvBNRelu(in_planes, out_planes // 2, kernel=1)) + elif idx == 1 and block_num == 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 2, stride=stride)) + elif idx == 1 and block_num > 2: + self.conv_list.append( + ConvBNRelu(out_planes // 2, out_planes // 4, stride=stride)) + elif idx < block_num - 1: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx + 1)))) + else: + self.conv_list.append( + ConvBNRelu(out_planes // int(math.pow(2, idx)), + out_planes // int(math.pow(2, idx)))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out_list = [] + out1 = self.conv_list[0](x) + for idx, conv in enumerate(self.conv_list[1:]): + if idx == 0: + if self.stride == 2: + out = conv(self.avd_layer(out1)) + else: + out = conv(out1) + else: + out = conv(out) + out_list.append(out) + + if self.stride == 2: + out1 = self.skip(out1) + out_list.insert(0, out1) + out = paddle.concat(out_list, axis=1) + return out + + +def STDC2(**kwargs): + model = STDCNet(base=64, layers=[4, 5, 3], **kwargs) + return model + +def STDC1(**kwargs): + model = STDCNet(base=64, layers=[2, 2, 2], **kwargs) + return model \ No newline at end of file