diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md b/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a6cf08ca95e2d0fa8786eb0b6bbb4dcf64920ffe
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/README.md
@@ -0,0 +1,176 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

This example shows how to fine-tune a pretrained model with PaddleHub and run prediction with it.

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the bisenetv2_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='bisenetv2_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.

### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
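Putting the four steps together, a minimal train.py might look like the sketch below. The `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and the `epochs`/`batch_size` values are illustrative, not prescribed by this module:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    # Step1: preprocessing; Step2: datasets for training and validation
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    # Step3: the model, with the class count of the target dataset
    model = hub.Module(name='bisenetv2_cityscapes', num_classes=2, pretrained=None)

    # Step4: optimization strategy and run configuration
    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)

    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```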
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m bisenetv2_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/bisenetv2_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
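The decoded `mask` is an ordinary OpenCV image, so it can be saved or post-processed directly; for example (the output filename is illustrative):

```python
# Persist the mask returned by the service
cv2.imwrite('seg_result_mask.png', mask)
```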
### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py b/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..dcaaded9f5453655c24bbb85e0115b8bb2fb7008
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/layers.py
@@ -0,0 +1,186 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)

        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvReLUPool(nn.Layer):
    """Basic conv relu pool layer."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        x = F.relu(x)
        # F.pool2d is the legacy fluid API; use the paddle 2.x functional max pooling
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        return x


class SeparableConvBNReLU(nn.Layer):
    """Basic separable conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class DepthwiseConvBN(nn.Layer):
    """Basic depthwise conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        return x
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py b/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..7745be3c6ec0ae3b598b6598503449c670a54a50 --- /dev/null +++ b/modules/image/semantic_segmentation/bisenetv2_cityscapes/module.py @@ -0,0 +1,288 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
import os
from typing import Union, List, Tuple

import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule

# the package name must match the module directory, bisenetv2_cityscapes
import bisenetv2_cityscapes.layers as layers


@moduleinfo(
    name="bisenetv2_cityscapes",
    type="CV/semantic_segmentation",
    author="paddlepaddle",
    author_email="",
    summary="BiSeNetV2 is a segmentation model trained on Cityscapes.",
    version="1.0.0",
    meta=ImageSegmentationModule)
class BiSeNetV2(nn.Layer):
    """
    The BiSeNet V2 implementation based on PaddlePaddle.

    The original article refers to
    Yu, Changqian, et al. "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation"
    (https://arxiv.org/abs/2004.02147)

    Args:
        num_classes (int): The unique number of target classes, default is 19.
        lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
        pretrained (str, optional): The path or url of pretrained model. Default: None.
    """

    def __init__(self, num_classes: int = 19, lambd: float = 0.25, align_corners: bool = False, pretrained: str = None):
        super(BiSeNetV2, self).__init__()

        C1, C2, C3 = 64, 64, 128
        db_channels = (C1, C2, C3)
        C1, C3, C4, C5 = int(C1 * lambd), int(C3 * lambd), 64, 128
        sb_channels = (C1, C3, C4, C5)
        mid_channels = 128

        self.db = DetailBranch(db_channels)
        self.sb = SemanticBranch(sb_channels)

        self.bga = BGA(mid_channels, align_corners)
        self.aux_head1 = SegHead(C1, C1, num_classes)
        self.aux_head2 = SegHead(C3, C3, num_classes)
        self.aux_head3 = SegHead(C4, C4, num_classes)
        self.aux_head4 = SegHead(C5, C5, num_classes)
        self.head = SegHead(mid_channels, mid_channels, num_classes)

        self.align_corners = align_corners
        self.transforms = T.Compose([T.Normalize()])

        if pretrained is not None:
            model_dict = paddle.load(pretrained)
            self.set_dict(model_dict)
            print("load custom parameters success")
        else:
            checkpoint = os.path.join(self.directory, 'bisenet_model.pdparams')
            model_dict = paddle.load(checkpoint)
            self.set_dict(model_dict)
            print("load pretrained parameters success")

    def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
        return self.transforms(img)

    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
        dfm = self.db(x)
        feat1, feat2, feat3, feat4, sfm = self.sb(x)
        logit = self.head(self.bga(dfm, sfm))

        if not self.training:
            logit_list = [logit]
        else:
            logit1 = self.aux_head1(feat1)
            logit2 = self.aux_head2(feat2)
            logit3 = self.aux_head3(feat3)
            logit4 = self.aux_head4(feat4)
            logit_list = [logit, logit1, logit2, logit3, logit4]

        logit_list = [
            F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
            for logit in logit_list
        ]

        return logit_list


class StemBlock(nn.Layer):
    def __init__(self, in_dim: int, out_dim: int):
        super(StemBlock, self).__init__()

        self.conv = layers.ConvBNReLU(in_dim, out_dim, 3, stride=2)

        self.left = nn.Sequential(
            layers.ConvBNReLU(out_dim, out_dim // 2, 1), layers.ConvBNReLU(out_dim // 2, out_dim, 3, stride=2))

        self.right = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
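        # left (strided convs) and right (max pooling) downsample in parallel; their
        # concatenation doubles the channels, so fuse maps 2 * out_dim back to out_dim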
        self.fuse = layers.ConvBNReLU(out_dim * 2, out_dim, 3)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        left = self.left(x)
        right = self.right(x)
        concat = paddle.concat([left, right], axis=1)
        return self.fuse(concat)


class ContextEmbeddingBlock(nn.Layer):
    def __init__(self, in_dim: int, out_dim: int):
        super(ContextEmbeddingBlock, self).__init__()

        self.gap = nn.AdaptiveAvgPool2D(1)
        self.bn = layers.SyncBatchNorm(in_dim)

        self.conv_1x1 = layers.ConvBNReLU(in_dim, out_dim, 1)
        self.conv_3x3 = nn.Conv2D(out_dim, out_dim, 3, 1, 1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        gap = self.gap(x)
        bn = self.bn(gap)
        conv1 = self.conv_1x1(bn) + x
        return self.conv_3x3(conv1)


class GatherAndExpansionLayer1(nn.Layer):
    """Gather And Expansion Layer with stride 1"""

    def __init__(self, in_dim: int, out_dim: int, expand: int):
        super().__init__()

        expand_dim = expand * in_dim

        self.conv = nn.Sequential(
            layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3),
            layers.ConvBN(expand_dim, out_dim, 1))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return F.relu(self.conv(x) + x)


class GatherAndExpansionLayer2(nn.Layer):
    """Gather And Expansion Layer with stride 2"""

    def __init__(self, in_dim: int, out_dim: int, expand: int):
        super().__init__()

        expand_dim = expand * in_dim

        self.branch_1 = nn.Sequential(
            layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3, stride=2),
            layers.DepthwiseConvBN(expand_dim, expand_dim, 3), layers.ConvBN(expand_dim, out_dim, 1))

        self.branch_2 = nn.Sequential(
            layers.DepthwiseConvBN(in_dim, in_dim, 3, stride=2), layers.ConvBN(in_dim, out_dim, 1))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return F.relu(self.branch_1(x) + self.branch_2(x))


class DetailBranch(nn.Layer):
    """The detail branch of BiSeNet, which has wide channels but shallow layers."""

    def __init__(self, in_channels: Tuple[int, int, int]):
        super().__init__()

        C1, C2, C3 = in_channels

        self.convs = nn.Sequential(
            # stage 1
            layers.ConvBNReLU(3, C1, 3, stride=2),
            layers.ConvBNReLU(C1, C1, 3),
            # stage 2
            layers.ConvBNReLU(C1, C2, 3, stride=2),
            layers.ConvBNReLU(C2, C2, 3),
            layers.ConvBNReLU(C2, C2, 3),
            # stage 3
            layers.ConvBNReLU(C2, C3, 3, stride=2),
            layers.ConvBNReLU(C3, C3, 3),
            layers.ConvBNReLU(C3, C3, 3),
        )

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        return self.convs(x)


class SemanticBranch(nn.Layer):
    """The semantic branch of BiSeNet, which has narrow channels but deep layers."""

    def __init__(self, in_channels: Tuple[int, int, int, int]):
        super().__init__()
        C1, C3, C4, C5 = in_channels

        self.stem = StemBlock(3, C1)

        self.stage3 = nn.Sequential(GatherAndExpansionLayer2(C1, C3, 6), GatherAndExpansionLayer1(C3, C3, 6))

        self.stage4 = nn.Sequential(GatherAndExpansionLayer2(C3, C4, 6), GatherAndExpansionLayer1(C4, C4, 6))

        self.stage5_4 = nn.Sequential(
            GatherAndExpansionLayer2(C4, C5, 6), GatherAndExpansionLayer1(C5, C5, 6),
            GatherAndExpansionLayer1(C5, C5, 6), GatherAndExpansionLayer1(C5, C5, 6))

        self.ce = ContextEmbeddingBlock(C5, C5)

    def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
        stage2 = self.stem(x)
        stage3 = self.stage3(stage2)
        stage4 = self.stage4(stage3)
        stage5_4 = self.stage5_4(stage4)
        fm = self.ce(stage5_4)
        return stage2, stage3, stage4, stage5_4, fm
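# In BGA, each branch guides the other: detail features are gated by sigmoid-activated
# semantic attention at high resolution, semantic features by detail attention at low
# resolution, and the two fused maps are summed after the semantic path is upsampled.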
class BGA(nn.Layer):
    """The Bilateral Guided Aggregation Layer, used to fuse the semantic features and spatial features."""

    def __init__(self, out_dim: int, align_corners: bool):
        super().__init__()

        self.align_corners = align_corners

        self.db_branch_keep = nn.Sequential(layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1))

        self.db_branch_down = nn.Sequential(
            layers.ConvBN(out_dim, out_dim, 3, stride=2), nn.AvgPool2D(kernel_size=3, stride=2, padding=1))

        self.sb_branch_keep = nn.Sequential(
            layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1),
            layers.Activation(act='sigmoid'))

        self.sb_branch_up = layers.ConvBN(out_dim, out_dim, 3)

        self.conv = layers.ConvBN(out_dim, out_dim, 3)

    def forward(self, dfm: paddle.Tensor, sfm: paddle.Tensor) -> paddle.Tensor:
        db_feat_keep = self.db_branch_keep(dfm)
        db_feat_down = self.db_branch_down(dfm)
        sb_feat_keep = self.sb_branch_keep(sfm)

        sb_feat_up = self.sb_branch_up(sfm)
        sb_feat_up = F.interpolate(
            sb_feat_up, paddle.shape(db_feat_keep)[2:], mode='bilinear', align_corners=self.align_corners)

        sb_feat_up = F.sigmoid(sb_feat_up)
        db_feat = db_feat_keep * sb_feat_up

        sb_feat = db_feat_down * sb_feat_keep
        sb_feat = F.interpolate(sb_feat, paddle.shape(db_feat)[2:], mode='bilinear', align_corners=self.align_corners)

        return self.conv(db_feat + sb_feat)


class SegHead(nn.Layer):
    def __init__(self, in_dim: int, mid_dim: int, num_classes: int):
        super().__init__()

        self.conv_3x3 = nn.Sequential(layers.ConvBNReLU(in_dim, mid_dim, 3), nn.Dropout(0.1))

        self.conv_1x1 = nn.Conv2D(mid_dim, num_classes, 1, 1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        conv1 = self.conv_3x3(x)
        conv2 = self.conv_1x1(conv1)
        return conv2
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9629ccdf433eb1c1970571a01f2663a2af8457f6
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/README.md
@@ -0,0 +1,173 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the deeplabv3p_resnet50_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='deeplabv3p_resnet50_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.
### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
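Putting the four steps together, a minimal train.py might look like the sketch below; the `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and `epochs`/`batch_size` are illustrative values:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    model = hub.Module(name='deeplabv3p_resnet50_cityscapes', num_classes=2, pretrained=None)

    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```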
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m deeplabv3p_resnet50_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/deeplabv3p_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```

### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee62265b585c80189c32846c0037b2b002244d6d
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/layers.py
@@ -0,0 +1,295 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu':
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNLayer(nn.Layer):
    """Basic conv bn layer with an optional activation."""

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: int,
                 stride: int = 1,
                 dilation: int = 1,
                 groups: int = 1,
                 is_vd_mode: bool = False,
                 act: str = None,
                 name: str = None):
        super(ConvBNLayer, self).__init__()

        self.is_vd_mode = is_vd_mode
        self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
        self._conv = Conv2D(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
            dilation=dilation,
            groups=groups,
            bias_attr=False)

        self._batch_norm = SyncBatchNorm(out_channels)
        self._act_op = Activation(act=act)

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        if self.is_vd_mode:
            inputs = self._pool2d_avg(inputs)
        y = self._conv(inputs)
        y = self._batch_norm(y)
        y = self._act_op(y)

        return y


class BottleneckBlock(nn.Layer):
    """Residual bottleneck block"""

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 stride: int,
                 shortcut: bool = True,
                 if_first: bool = False,
                 dilation: int = 1,
                 name: str = None):
        super(BottleneckBlock, self).__init__()

        self.conv0 = ConvBNLayer(
            in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")

        self.dilation = dilation

        self.conv1 = ConvBNLayer(
            in_channels=out_channels,
            out_channels=out_channels,
            kernel_size=3,
            stride=stride,
            act='relu',
            dilation=dilation,
            name=name + "_branch2b")
        self.conv2 = ConvBNLayer(
            in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")

        if not shortcut:
            self.short = ConvBNLayer(
                in_channels=in_channels,
                out_channels=out_channels * 4,
                kernel_size=1,
                stride=1,
                is_vd_mode=False if if_first or stride == 1 else True,
                name=name + "_branch1")

        self.shortcut = shortcut

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        y = self.conv0(inputs)
        if self.dilation > 1:
            padding = self.dilation
            y = F.pad(y, [padding, padding, padding, padding])

        conv1 = self.conv1(y)
        conv2 = self.conv2(conv1)

        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)

        y = paddle.add(x=short, y=conv2)
        y = F.relu(y)
        return y
class SeparableConvBNReLU(nn.Layer):
    """Depthwise Separable Convolution."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(SeparableConvBNReLU, self).__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer"""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(ConvBN, self).__init__()
        self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super(ConvBNReLU, self).__init__()

        self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class Activation(nn.Layer):
    """
    The wrapper of activations.

    Args:
        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identical transformation.

    Returns:
        A callable object of Activation.

    Raises:
        KeyError: When parameter `act` is not in the optional range.

    Examples:
        from paddleseg.models.common.activation import Activation

        relu = Activation("relu")
        print(relu)
        # <class 'paddle.nn.layer.activation.ReLU'>

        sigmoid = Activation("sigmoid")
        print(sigmoid)
        # <class 'paddle.nn.layer.activation.Sigmoid'>

        not_exit_one = Activation("not_exit_one")
        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
        #     'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
        #     'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
    """

    def __init__(self, act: str = None):
        super(Activation, self).__init__()

        self._act = act
        upper_act_names = nn.layer.activation.__dict__.keys()
        lower_act_names = [act.lower() for act in upper_act_names]
        act_dict = dict(zip(lower_act_names, upper_act_names))

        if act is not None:
            if act in act_dict.keys():
                act_name = act_dict[act]
                # getattr is a safer equivalent of the original eval(...) lookup
                self.act_func = getattr(nn.layer.activation, act_name)()
            else:
                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        if self._act is not None:
            return self.act_func(x)
        else:
            return x
class ASPPModule(nn.Layer):
    """
    Atrous Spatial Pyramid Pooling.

    Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
        image_pooling (bool, optional): If augmented with image-level features. Default: False.
    """

    def __init__(self,
                 aspp_ratios: tuple,
                 in_channels: int,
                 out_channels: int,
                 align_corners: bool,
                 use_sep_conv: bool = False,
                 image_pooling: bool = False):
        super().__init__()

        self.align_corners = align_corners
        self.aspp_blocks = nn.LayerList()

        for ratio in aspp_ratios:
            if use_sep_conv and ratio > 1:
                conv_func = SeparableConvBNReLU
            else:
                conv_func = ConvBNReLU

            block = conv_func(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1 if ratio == 1 else 3,
                dilation=ratio,
                padding=0 if ratio == 1 else ratio)
            self.aspp_blocks.append(block)

        out_size = len(self.aspp_blocks)

        if image_pooling:
            self.global_avg_pool = nn.Sequential(
                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
            out_size += 1
        self.image_pooling = image_pooling

        self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)

        self.dropout = nn.Dropout(p=0.1)  # drop rate

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        outputs = []
        for block in self.aspp_blocks:
            y = block(x)
            y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
            outputs.append(y)

        if self.image_pooling:
            img_avg = self.global_avg_pool(x)
            img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
            outputs.append(img_avg)

        x = paddle.concat(outputs, axis=1)
        x = self.conv_bn_relu(x)
        x = self.dropout(x)

        return x
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..149bc4e04a7b52f8fce71bef4c3dbcdc8e4b74ec
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/module.py
@@ -0,0 +1,169 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Union, List, Tuple

import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule

from deeplabv3p_resnet50_cityscapes.resnet import ResNet50_vd
import deeplabv3p_resnet50_cityscapes.layers as L


@moduleinfo(
    name="deeplabv3p_resnet50_cityscapes",
    type="CV/semantic_segmentation",
    author="paddlepaddle",
    author_email="",
    summary="DeepLabV3PResnet50 is a segmentation model.",
    version="1.0.0",
    meta=ImageSegmentationModule)
class DeepLabV3PResnet50(nn.Layer):
"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" + (https://arxiv.org/abs/1802.02611) + + Args: + num_classes (int): the unique number of target classes. + backbone_indices (tuple): two values in the tuple indicate the indices of output of backbone. + the first index will be taken as a low-level feature in Decoder component; + the second one will be taken as input of ASPP component. + Usually backbone consists of four downsampling stage, and return an output of + each stage, so we set default (0, 3), which means taking feature map of the first + stage in backbone as low-level feature used in Decoder, and feature map of the fourth + stage as input of ASPP. + aspp_ratios (tuple): the dilation rate using in ASSP module. + if output_stride=16, aspp_ratios should be set as (1, 6, 12, 18). + if output_stride=8, aspp_ratios is (1, 12, 24, 36). + aspp_out_channels (int): the output channels of ASPP module. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str): the path of pretrained model. Default to None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (0, 3), + aspp_ratios: Tuple[int] = (1, 12, 24, 36), + aspp_out_channels: int = 256, + align_corners=False, + pretrained: str = None): + super(DeepLabV3PResnet50, self).__init__() + self.backbone = ResNet50_vd() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = DeepLabV3PHead(num_classes, backbone_indices, backbone_channels, aspp_ratios, aspp_out_channels, + align_corners) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, x.shape[2:], mode='bilinear', align_corners=self.align_corners) for logit in logit_list + ] + + +class DeepLabV3PHead(nn.Layer): + """ + The DeepLabV3PHead implementation based on PaddlePaddle. + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone. + the first index will be taken as a low-level feature in Decoder component; + the second one will be taken as input of ASPP component. + Usually backbone consists of four downsampling stage, and return an output of + each stage. If we set it as (0, 3), it means taking feature map of the first + stage in backbone as low-level feature used in Decoder, and feature map of the fourth + stage as input of ASPP. + backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index. + aspp_ratios (tuple): The dilation rates using in ASSP module. + aspp_out_channels (int): The output channels of ASPP module. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 
class DeepLabV3PHead(nn.Layer):
    """
    The DeepLabV3PHead implementation based on PaddlePaddle.

    Args:
        num_classes (int): The unique number of target classes.
        backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
            the first index will be taken as a low-level feature in Decoder component;
            the second one will be taken as input of ASPP component.
            Usually backbone consists of four downsampling stages, and returns an output of
            each stage. If we set it as (0, 3), it means taking the feature map of the first
            stage in backbone as the low-level feature used in Decoder, and the feature map of the fourth
            stage as the input of ASPP.
        backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        aspp_out_channels (int): The output channels of the ASPP module.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
    """

    def __init__(self, num_classes: int, backbone_indices: Tuple[int], backbone_channels: Tuple[int],
                 aspp_ratios: Tuple[int], aspp_out_channels: int, align_corners: bool):
        super().__init__()

        self.aspp = L.ASPPModule(
            aspp_ratios, backbone_channels[1], aspp_out_channels, align_corners, use_sep_conv=True, image_pooling=True)
        self.decoder = Decoder(num_classes, backbone_channels[0], align_corners)
        self.backbone_indices = backbone_indices

    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
        logit_list = []
        low_level_feat = feat_list[self.backbone_indices[0]]
        x = feat_list[self.backbone_indices[1]]
        x = self.aspp(x)
        logit = self.decoder(x, low_level_feat)
        logit_list.append(logit)
        return logit_list


class Decoder(nn.Layer):
    """
    Decoder module of DeepLabV3P model

    Args:
        num_classes (int): The number of classes.
        in_channels (int): The number of input channels in decoder module.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
    """

    def __init__(self, num_classes: int, in_channels: int, align_corners: bool):
        super(Decoder, self).__init__()

        self.conv_bn_relu1 = L.ConvBNReLU(in_channels=in_channels, out_channels=48, kernel_size=1)

        self.conv_bn_relu2 = L.SeparableConvBNReLU(in_channels=304, out_channels=256, kernel_size=3, padding=1)
        self.conv_bn_relu3 = L.SeparableConvBNReLU(in_channels=256, out_channels=256, kernel_size=3, padding=1)
        self.conv = nn.Conv2D(in_channels=256, out_channels=num_classes, kernel_size=1)

        self.align_corners = align_corners

    def forward(self, x: paddle.Tensor, low_level_feat: paddle.Tensor) -> paddle.Tensor:
        low_level_feat = self.conv_bn_relu1(low_level_feat)
        x = F.interpolate(x, low_level_feat.shape[2:], mode='bilinear', align_corners=self.align_corners)
        x = paddle.concat([x, low_level_feat], axis=1)
        x = self.conv_bn_relu2(x)
        x = self.conv_bn_relu3(x)
        x = self.conv(x)
        return x
diff --git a/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..6c7fdfeb66c84d1595954bac4fcd65863649f7c8
--- /dev/null
+++ b/modules/image/semantic_segmentation/deeplabv3p_resnet50_cityscapes/resnet.py
@@ -0,0 +1,115 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
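# ResNet50-vd backbone for DeepLabV3+. dilation_dict below swaps striding for dilated
# convolutions (rates 2 and 4) in the last two stages, so the backbone keeps an output
# stride of 8, which matches the default aspp_ratios=(1, 12, 24, 36) in module.py.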
from typing import Union, List, Tuple

import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import deeplabv3p_resnet50_cityscapes.layers as L


class BasicBlock(nn.Layer):
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 stride: int,
                 shortcut: bool = True,
                 if_first: bool = False,
                 name: str = None):
        super(BasicBlock, self).__init__()
        self.stride = stride
        self.conv0 = L.ConvBNLayer(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=3,
            stride=stride,
            act='relu',
            name=name + "_branch2a")
        self.conv1 = L.ConvBNLayer(
            in_channels=out_channels, out_channels=out_channels, kernel_size=3, act=None, name=name + "_branch2b")

        if not shortcut:
            self.short = L.ConvBNLayer(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1,
                stride=1,
                is_vd_mode=False if if_first else True,
                name=name + "_branch1")

        self.shortcut = shortcut

    def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
        y = self.conv0(inputs)
        conv1 = self.conv1(y)

        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)
        # paddle.elementwise_add(..., act='relu') is the legacy fluid API; add then relu
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)

        return y


class ResNet50_vd(nn.Layer):
    def __init__(self, multi_grid: Tuple[int] = (1, 2, 4)):
        super(ResNet50_vd, self).__init__()
        depth = [3, 4, 6, 3]
        num_channels = [64, 256, 512, 1024]
        num_filters = [64, 128, 256, 512]
        self.feat_channels = [c * 4 for c in num_filters]
        dilation_dict = {2: 2, 3: 4}
        self.conv1_1 = L.ConvBNLayer(
            in_channels=3, out_channels=32, kernel_size=3, stride=2, act='relu', name="conv1_1")
        self.conv1_2 = L.ConvBNLayer(
            in_channels=32, out_channels=32, kernel_size=3, stride=1, act='relu', name="conv1_2")
        self.conv1_3 = L.ConvBNLayer(
            in_channels=32, out_channels=64, kernel_size=3, stride=1, act='relu', name="conv1_3")
        self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.stage_list = []

        for block in range(len(depth)):
            shortcut = False
            block_list = []
            for i in range(depth[block]):
                conv_name = "res" + str(block + 2) + chr(97 + i)
                dilation_rate = dilation_dict[block] if dilation_dict and block in dilation_dict else 1
                if block == 3:
                    dilation_rate = dilation_rate * multi_grid[i]
                bottleneck_block = self.add_sublayer(
                    'bb_%d_%d' % (block, i),
                    L.BottleneckBlock(
                        in_channels=num_channels[block] if i == 0 else num_filters[block] * 4,
                        out_channels=num_filters[block],
                        stride=2 if i == 0 and block != 0 and dilation_rate == 1 else 1,
                        shortcut=shortcut,
                        if_first=block == i == 0,
                        name=conv_name,
                        dilation=dilation_rate))
                block_list.append(bottleneck_block)
                shortcut = True
            self.stage_list.append(block_list)

    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
        y = self.conv1_1(inputs)
        y = self.conv1_2(y)
        y = self.conv1_3(y)
        y = self.pool2d_max(y)
        feat_list = []
        for stage in self.stage_list:
            for block in stage:
                y = block(y)
            feat_list.append(y)
        return feat_list
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md b/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1c5db026d2bb5450aca1f2e4106f0e3abf0f212c
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/README.md
@@ -0,0 +1,173 @@
# PaddleHub Image Segmentation

## Model Prediction

To run prediction with the pretrained model we provide, use the following script:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='fastscnn_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

## How to Start Fine-tuning

After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the fastscnn_cityscapes model on datasets such as OpticDiscSeg.

## Code Steps

Fine-tuning with the PaddleHub Fine-tune API takes four steps.

### Step1: Define the data preprocessing

```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```

The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.

### Step2: Download and load the dataset

```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split, one of `train`, `test`, `val`. Default: `train`.

The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.

### Step3: Load the pretrained model

```python
model = hub.Module(name='fastscnn_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained weights; if None, the default pretrained parameters are loaded.

### Step4: Choose the optimization strategy and run configuration

```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```

#### Optimization strategy

Paddle 2.0 provides a range of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:

* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.

#### Run configuration

`Trainer` controls the fine-tuning process through the following configurable parameters:

* `model`: the model to optimize;
* `optimizer`: the optimizer;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize training with VisualDL;
* `checkpoint_dir`: the directory for saving model parameters;
* `compare_metrics`: the metric used to select the best model.

`trainer.train` controls the training loop through the following configurable parameters (a full train.py sketch follows this list):

* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when training on GPU, adjust it to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, in training steps;
* `save_interval`: the model-saving interval, in training epochs.
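Putting the four steps together, a minimal train.py might look like the sketch below; the `Trainer` import path follows the PaddleHub 2.0 fine-tune demos, and `epochs`/`batch_size` are illustrative values:

```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

if __name__ == '__main__':
    transform = Compose([Resize(target_size=(512, 512)), Normalize()])
    train_reader = OpticDiscSeg(transform, mode='train')
    eval_reader = OpticDiscSeg(transform, mode='val')

    model = hub.Module(name='fastscnn_cityscapes', num_classes=2, pretrained=None)

    scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
    optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
    trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
    trainer.train(train_reader, epochs=10, batch_size=4, eval_dataset=eval_reader, log_interval=10, save_interval=1)
```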
## Model Prediction

When fine-tuning finishes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.

We use this model for prediction. The predict.py script is as follows:

```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='fastscnn_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```

Once the parameters are configured, run the script with `python predict.py`.

**Args**
* `images`: image paths or images in BGR format;
* `visualization`: whether to visualize the result, default True;
* `save_path`: the directory for saving results, default 'seg_result'.

**NOTE:** The module, checkpoint_dir, and dataset used for prediction must match the ones used for fine-tuning.

## Service Deployment

PaddleHub Serving can deploy an online image segmentation service.

### Step1: Start PaddleHub Serving

Run the start command:

```shell
$ hub serving start -m fastscnn_cityscapes
```

This deploys an image segmentation API service; the default port is 8866.

**NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it is not needed.

### Step2: Send a prediction request

With the server side configured, the following few lines of code send a prediction request and fetch the result:

```python
import requests
import json
import cv2
import base64

import numpy as np


def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fastscnn_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```

### Code

https://github.com/PaddlePaddle/PaddleSeg

### Dependencies

paddlepaddle >= 2.0.0

paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py b/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e36a1501126097f5021c0b5e2e53cd98b67976a
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/layers.py
@@ -0,0 +1,256 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Tuple

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def SyncBatchNorm(*args, **kwargs):
    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
        return nn.BatchNorm2D(*args, **kwargs)
    else:
        return nn.SyncBatchNorm(*args, **kwargs)


class ConvBNReLU(nn.Layer):
    """Basic conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()

        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)

        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        x = F.relu(x)
        return x


class ConvBN(nn.Layer):
    """Basic conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
        self._batch_norm = SyncBatchNorm(out_channels)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self._conv(x)
        x = self._batch_norm(x)
        return x


class ConvReLUPool(nn.Layer):
    """Basic conv relu pool layer."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv(x)
        x = F.relu(x)
        # F.pool2d is the legacy fluid API; use the paddle 2.x functional max pooling
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        return x
class SeparableConvBNReLU(nn.Layer):
    """Basic separable conv bn relu layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=in_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)
        self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x


class DepthwiseConvBN(nn.Layer):
    """Basic depthwise conv bn layer."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
        super().__init__()
        self.depthwise_conv = ConvBN(
            in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            groups=in_channels,
            **kwargs)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.depthwise_conv(x)
        return x


class AuxLayer(nn.Layer):
    """
    The auxiliary layer implementation for auxiliary loss.

    Args:
        in_channels (int): The number of input channels.
        inter_channels (int): The intermediate channels.
        out_channels (int): The number of output channels, and usually it is num_classes.
        dropout_prob (float, optional): The drop rate. Default: 0.1.
    """

    def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1):
        super().__init__()

        self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1)

        self.dropout = nn.Dropout(p=dropout_prob)

        self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1)

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        x = self.conv_bn_relu(x)
        x = self.dropout(x)
        x = self.conv(x)
        return x


class Activation(nn.Layer):
    """
    The wrapper of activations.

    Args:
        act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
            'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
            'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
            'hsigmoid']. Default: None, means identical transformation.

    Returns:
        A callable object of Activation.

    Raises:
        KeyError: When parameter `act` is not in the optional range.

    Examples:
        from paddleseg.models.common.activation import Activation

        relu = Activation("relu")
        print(relu)
        # <class 'paddle.nn.layer.activation.ReLU'>

        sigmoid = Activation("sigmoid")
        print(sigmoid)
        # <class 'paddle.nn.layer.activation.Sigmoid'>

        not_exit_one = Activation("not_exit_one")
        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
        #     'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
        #     'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
    """

    def __init__(self, act: str = None):
        super(Activation, self).__init__()

        self._act = act
        upper_act_names = nn.layer.activation.__dict__.keys()
        lower_act_names = [act.lower() for act in upper_act_names]
        act_dict = dict(zip(lower_act_names, upper_act_names))

        if act is not None:
            if act in act_dict.keys():
                act_name = act_dict[act]
                # getattr is a safer equivalent of the original eval(...) lookup
                self.act_func = getattr(nn.layer.activation, act_name)()
            else:
                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))

    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        if self._act is not None:
            return self.act_func(x)
        else:
            return x
+        bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
+        dim_reduction (bool, optional): Whether to reduce the feature dimension after pooling. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, in_channels: int, out_channels: int, bin_sizes: Tuple, dim_reduction: bool, align_corners: bool):
+        super().__init__()
+
+        self.bin_sizes = bin_sizes
+
+        inter_channels = in_channels
+        if dim_reduction:
+            inter_channels = in_channels // len(bin_sizes)
+
+        # we use dimension reduction after pooling mentioned in original implementation.
+        self.stages = nn.LayerList([self._make_stage(in_channels, inter_channels, size) for size in bin_sizes])
+
+        self.conv_bn_relu2 = ConvBNReLU(
+            in_channels=in_channels + inter_channels * len(bin_sizes),
+            out_channels=out_channels,
+            kernel_size=3,
+            padding=1)
+
+        self.align_corners = align_corners
+
+    def _make_stage(self, in_channels: int, out_channels: int, size: int):
+        """
+        Create one pooling layer.
+
+        In our implementation, we adopt the same dimension reduction as the original paper, which may differ
+        slightly from other implementations.
+
+        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other implementations
+        keep the number of channels unchanged.
+
+        Args:
+            in_channels (int): The number of input channels to pyramid pooling module.
+            out_channels (int): The number of output channels of pyramid pooling module.
+            size (int): The out size of the pooled layer.
+
+        Returns:
+            conv (nn.Sequential): One pooling stage, i.e. adaptive average pooling followed by a 1x1 ConvBNReLU.
+        """
+
+        prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+        conv = ConvBNReLU(in_channels=in_channels, out_channels=out_channels, kernel_size=1)
+
+        return nn.Sequential(prior, conv)
+
+    def forward(self, input: paddle.Tensor) -> paddle.Tensor:
+        cat_layers = []
+        for stage in self.stages:
+            x = stage(input)
+            x = F.interpolate(x, paddle.shape(input)[2:], mode='bilinear', align_corners=self.align_corners)
+            cat_layers.append(x)
+        cat_layers = [input] + cat_layers[::-1]
+        cat = paddle.concat(cat_layers, axis=1)
+        out = self.conv_bn_relu2(cat)
+
+        return out
diff --git a/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py b/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..88e805fdcf405ea080cc37ba01e456a8bcba2acd
--- /dev/null
+++ b/modules/image/semantic_segmentation/fastscnn_cityscapes/module.py
@@ -0,0 +1,275 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
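+
+# PaddleHub module wrapper for FastSCNN: the @moduleinfo decorator below registers the
+# network so it can be loaded with hub.Module(name='fastscnn_cityscapes').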
+import os +from typing import Callable, Union, Tuple + +import paddle.nn as nn +import paddle.nn.functional as F +import paddle +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import fastscnn_cityscapes.layers as layers + + +@moduleinfo( + name="fastscnn_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="fastscnn_cityscapes is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FastSCNN(nn.Layer): + """ + The FastSCNN implementation based on PaddlePaddle. + As mentioned in the original paper, FastSCNN is a real-time segmentation algorithm (123.5fps) + even for high resolution images (1024x2048). + The original article refers to + Poudel, Rudra PK, et al. "Fast-scnn: Fast semantic segmentation network" + (https://arxiv.org/pdf/1902.04502.pdf). + Args: + num_classes (int): The unique number of target classes, default is 19. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, num_classes: int = 19, align_corners: bool = False, pretrained: str = None): + + super(FastSCNN, self).__init__() + + self.learning_to_downsample = LearningToDownsample(32, 48, 64) + self.global_feature_extractor = GlobalFeatureExtractor( + in_channels=64, + block_channels=[64, 96, 128], + out_channels=128, + expansion=6, + num_blocks=[3, 3, 3], + align_corners=True) + self.feature_fusion = FeatureFusionModule(64, 128, 128, align_corners) + self.classifier = Classifier(128, num_classes) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'fastscnn_model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + logit_list = [] + input_size = paddle.shape(x)[2:] + higher_res_features = self.learning_to_downsample(x) + x = self.global_feature_extractor(higher_res_features) + x = self.feature_fusion(higher_res_features, x) + logit = self.classifier(x) + logit = F.interpolate(logit, input_size, mode='bilinear', align_corners=self.align_corners) + logit_list.append(logit) + + return logit_list + + +class LearningToDownsample(nn.Layer): + """ + Learning to downsample module. + This module consists of three downsampling blocks (one conv and two separable conv) + Args: + dw_channels1 (int, optional): The input channels of the first sep conv. Default: 32. + dw_channels2 (int, optional): The input channels of the second sep conv. Default: 48. + out_channels (int, optional): The output channels of LearningToDownsample module. Default: 64. 
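+
+    Examples:
+        # Illustrative sketch: with the default channels, a 1x3x256x512 NCHW input
+        # passes three stride-2 convolutions and becomes a 1x64x32x64 feature map.
+        ltd = LearningToDownsample(32, 48, 64)
+        feat = ltd(paddle.rand([1, 3, 256, 512]))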
+    """
+
+    def __init__(self, dw_channels1: int = 32, dw_channels2: int = 48, out_channels: int = 64):
+        super(LearningToDownsample, self).__init__()
+
+        self.conv_bn_relu = layers.ConvBNReLU(in_channels=3, out_channels=dw_channels1, kernel_size=3, stride=2)
+        self.dsconv_bn_relu1 = layers.SeparableConvBNReLU(
+            in_channels=dw_channels1, out_channels=dw_channels2, kernel_size=3, stride=2, padding=1)
+        self.dsconv_bn_relu2 = layers.SeparableConvBNReLU(
+            in_channels=dw_channels2, out_channels=out_channels, kernel_size=3, stride=2, padding=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.conv_bn_relu(x)
+        x = self.dsconv_bn_relu1(x)
+        x = self.dsconv_bn_relu2(x)
+        return x
+
+
+class GlobalFeatureExtractor(nn.Layer):
+    """
+    Global feature extractor module.
+    This module consists of three InvertedBottleneck blocks (like the inverted residual introduced by MobileNetV2) and
+    a PPModule (introduced by PSPNet).
+    Args:
+        in_channels (int): The number of input channels to the module.
+        block_channels (tuple): A tuple representing the output channels of each bottleneck block.
+        out_channels (int): The number of output channels of the module.
+        expansion (int): The expansion factor in bottleneck.
+        num_blocks (tuple): It indicates the repeat time of each bottleneck.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, in_channels: int, block_channels: Tuple[int], out_channels: int, expansion: int,
+                 num_blocks: Tuple[int], align_corners: bool):
+        super(GlobalFeatureExtractor, self).__init__()
+
+        self.bottleneck1 = self._make_layer(InvertedBottleneck, in_channels, block_channels[0], num_blocks[0],
+                                            expansion, 2)
+        self.bottleneck2 = self._make_layer(InvertedBottleneck, block_channels[0], block_channels[1], num_blocks[1],
+                                            expansion, 2)
+        self.bottleneck3 = self._make_layer(InvertedBottleneck, block_channels[1], block_channels[2], num_blocks[2],
+                                            expansion, 1)
+
+        self.ppm = layers.PPModule(
+            block_channels[2], out_channels, bin_sizes=(1, 2, 3, 6), dim_reduction=True, align_corners=align_corners)
+
+    def _make_layer(self,
+                    block: Callable,
+                    in_channels: int,
+                    out_channels: int,
+                    blocks: int,
+                    expansion: int = 6,
+                    stride: int = 1):
+        layers = []
+        layers.append(block(in_channels, out_channels, expansion, stride))
+        for _ in range(1, blocks):
+            layers.append(block(out_channels, out_channels, expansion, 1))
+        return nn.Sequential(*layers)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.bottleneck1(x)
+        x = self.bottleneck2(x)
+        x = self.bottleneck3(x)
+        x = self.ppm(x)
+        return x
+
+
+class InvertedBottleneck(nn.Layer):
+    """
+    Single Inverted bottleneck implementation.
+    Args:
+        in_channels (int): The number of input channels to bottleneck block.
+        out_channels (int): The number of output channels of bottleneck block.
+        expansion (int, optional): The expansion factor in bottleneck. Default: 6.
+        stride (int, optional): The stride used in depth-wise conv. Default: 2.
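+
+    Examples:
+        # Illustrative sketch: with stride=1 and in_channels == out_channels, the
+        # residual shortcut applies, so the output has the same shape as the input.
+        block = InvertedBottleneck(in_channels=64, out_channels=64, expansion=6, stride=1)
+        y = block(paddle.rand([1, 64, 32, 64]))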
+    """
+
+    def __init__(self, in_channels: int, out_channels: int, expansion: int = 6, stride: int = 2):
+        super().__init__()
+
+        self.use_shortcut = stride == 1 and in_channels == out_channels
+
+        expand_channels = in_channels * expansion
+        self.block = nn.Sequential(
+            # pw
+            layers.ConvBNReLU(in_channels=in_channels, out_channels=expand_channels, kernel_size=1, bias_attr=False),
+            # dw
+            layers.ConvBNReLU(
+                in_channels=expand_channels,
+                out_channels=expand_channels,
+                kernel_size=3,
+                stride=stride,
+                padding=1,
+                groups=expand_channels,
+                bias_attr=False),
+            # pw-linear
+            layers.ConvBN(in_channels=expand_channels, out_channels=out_channels, kernel_size=1, bias_attr=False))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        out = self.block(x)
+        if self.use_shortcut:
+            out = x + out
+        return out
+
+
+class FeatureFusionModule(nn.Layer):
+    """
+    Feature Fusion Module implementation.
+    This module fuses the high-resolution feature and the low-resolution feature.
+    Args:
+        high_in_channels (int): The channels of high-resolution feature (output of LearningToDownsample).
+        low_in_channels (int): The channels of low-resolution feature (output of GlobalFeatureExtractor).
+        out_channels (int): The output channels of this module.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+    """
+
+    def __init__(self, high_in_channels: int, low_in_channels: int, out_channels: int, align_corners: bool):
+        super().__init__()
+
+        # Only depth-wise conv
+        self.dwconv = layers.ConvBNReLU(
+            in_channels=low_in_channels,
+            out_channels=out_channels,
+            kernel_size=3,
+            padding=1,
+            groups=128,
+            bias_attr=False)
+
+        self.conv_low_res = layers.ConvBN(out_channels, out_channels, 1)
+        self.conv_high_res = layers.ConvBN(high_in_channels, out_channels, 1)
+        self.align_corners = align_corners
+
+    def forward(self, high_res_input: paddle.Tensor, low_res_input: paddle.Tensor) -> paddle.Tensor:
+        low_res_input = F.interpolate(
+            low_res_input, paddle.shape(high_res_input)[2:], mode='bilinear', align_corners=self.align_corners)
+        low_res_input = self.dwconv(low_res_input)
+        low_res_input = self.conv_low_res(low_res_input)
+        high_res_input = self.conv_high_res(high_res_input)
+        x = high_res_input + low_res_input
+
+        return F.relu(x)
+
+
+class Classifier(nn.Layer):
+    """
+    The Classifier module implementation.
+    This module consists of two separable convs and one conv.
+    Args:
+        input_channels (int): The input channels to this module.
+        num_classes (int): The unique number of target classes.
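+
+    Examples:
+        # Illustrative sketch: maps a 128-channel feature map to per-pixel logits over
+        # num_classes categories while keeping the spatial size, e.g. -> [1, 19, 32, 64].
+        head = Classifier(input_channels=128, num_classes=19)
+        logits = head(paddle.rand([1, 128, 32, 64]))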
+    """
+
+    def __init__(self, input_channels: int, num_classes: int):
+        super().__init__()
+
+        self.dsconv1 = layers.SeparableConvBNReLU(
+            in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
+
+        self.dsconv2 = layers.SeparableConvBNReLU(
+            in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
+
+        self.conv = nn.Conv2D(in_channels=input_channels, out_channels=num_classes, kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # dropout_prob
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.dsconv1(x)
+        x = self.dsconv2(x)
+        x = self.dropout(x)
+        x = self.conv(x)
+        return x
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7cd0b8cc83f8024bf90f01dcb5f46d893ca18298
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/README.md
@@ -0,0 +1,174 @@
+# PaddleHub Image Segmentation
+
+## Model Prediction
+
+
+To run prediction with the provided pretrained model, use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_cityscapes')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, you can start fine-tuning the fcn_hrnetw18_cityscapes model on datasets such as OpticDiscSeg by running `python train.py`.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with whatever preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split, one of `train`, `test` and `val`. Default: `train`.
+
+For the dataset preparation code, see [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw18_cityscapes', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes of the segmentation model.
+* `pretrained`: whether to load your own trained weights; if None, the provided default parameters are loaded.
+
+### Step4: Choose the optimization strategy and runtime configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam` and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Runtime configuration
+`Trainer` controls the Fine-tune training process and accepts the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use GPU; default False;
+* `use_vdl`: whether to visualize the training process with VisualDL;
+* `checkpoint_dir`: the directory where model parameters are saved;
+* `compare_metrics`: the metric used to select the best model;
+
+`trainer.train` controls the concrete training loop and accepts the following parameters:
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if you train on GPU, adjust it to your hardware;
+* `num_workers`: the number of worker processes; default 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: the logging interval, measured in training steps;
+* `save_interval`: the checkpoint-saving interval, measured in training epochs.
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
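+As a rough sketch (the file name below is an assumption and may vary across PaddleHub versions), the checkpoint directory typically looks like this:
+
+```shell
+$ ls ${CHECKPOINT_DIR}/best_model
+model.pdparams    # parameters of the best model on the validation set
+```
+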
+We then use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured correctly, run the script with `python predict.py`.
+**Args**
+* `images`: paths to the original images, or images in BGR format;
+* `visualization`: whether to visualize the results; default True;
+* `save_path`: the directory where results are saved; default 'seg_result'.
+
+**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tuning.
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw18_cityscapes
+```
+
+This deploys an image segmentation service API, listening on port 8866 by default.
+
+**NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the few lines of code below send a prediction request and retrieve the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send an HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e8422ad158de9b13d4eb4771f1a1736cc3b571e
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/hrnet.py
@@ -0,0 +1,531 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw18_cityscapes.layers as L
+
+
+class HRNet_W18(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4, ).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64, ).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+ stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + 
num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = 
input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, 
self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + 
residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..8758f54f9a840ae49fd6e424b98bfe1dd61e13ec --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/layers.py @@ -0,0 +1,296 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = 
ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..436207fc12954e43bbccf9a626a6cf9783a88db0 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_cityscapes/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw18_cityscapes.hrnet import HRNet_W18 +import fcn_hrnetw18_cityscapes.layers as layers + + +@moduleinfo( + name="fcn_hrnetw18_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw18 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W18() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None
+    """
+
+    def __init__(self,
+                 num_classes: int,
+                 backbone_indices: Tuple[int] = (-1, ),
+                 backbone_channels: Tuple[int] = (270, ),
+                 channels: int = None):
+        super(FCNHead, self).__init__()
+
+        self.num_classes = num_classes
+        self.backbone_indices = backbone_indices
+        if channels is None:
+            channels = backbone_channels[0]
+
+        self.conv_1 = layers.ConvBNReLU(
+            in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
+        self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+        logit_list = []
+        x = feat_list[self.backbone_indices[0]]
+        x = self.conv_1(x)
+        logit = self.cls(x)
+        logit_list.append(logit)
+        return logit_list
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..251f9480dd49c5e6632e2b3814face5147c258bc
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/README.md
@@ -0,0 +1,175 @@
+# PaddleHub Image Segmentation
+
+
+## Model Prediction
+
+
+To run prediction with the provided pretrained model, use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_voc')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, you can start fine-tuning the fcn_hrnetw18_voc model on datasets such as OpticDiscSeg by running `python train.py`.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with whatever preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split, one of `train`, `test` and `val`. Default: `train`.
+
+For the dataset preparation code, see [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw18_voc', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes of the segmentation model.
+* `pretrained`: whether to load your own trained weights; if None, the provided default parameters are loaded.
+
+### Step4: Choose the optimization strategy and runtime configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam` and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Runtime configuration
+`Trainer` controls the Fine-tune training process and accepts the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use GPU; default False;
+* `use_vdl`: whether to visualize the training process with VisualDL;
+* `checkpoint_dir`: the directory where model parameters are saved;
+* `compare_metrics`: the metric used to select the best model;
+
+`trainer.train` controls the concrete training loop and accepts the following parameters:
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if you train on GPU, adjust it to your hardware;
+* `num_workers`: the number of worker processes; default 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: the logging interval, measured in training steps;
+* `save_interval`: the checkpoint-saving interval, measured in training epochs.
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
+We then use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw18_voc', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured correctly, run the script with `python predict.py`.
+**Args**
+* `images`: paths to the original images, or images in BGR format;
+* `visualization`: whether to visualize the results; default True;
+* `save_path`: the directory where results are saved; default 'seg_result'.
+
+**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tuning.
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw18_voc
+```
+
+This deploys an image segmentation service API, listening on port 8866 by default.
+
+**NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the few lines of code below send a prediction request and retrieve the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send an HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_voc"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..0766871d0f6dd82cc29aae13b7e01d2e377124a9
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/hrnet.py
@@ -0,0 +1,531 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw18_voc.layers as L
+
+
+class HRNet_W18(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4, ).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64, ).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+ stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4). + stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + 
in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + 
def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = 
self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def 
forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..8758f54f9a840ae49fd6e424b98bfe1dd61e13ec --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/layers.py @@ -0,0 +1,296 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + 
stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..39e04c6325abd83404c1c81faea59652a4b3f6d1 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw18_voc/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw18_voc.hrnet import HRNet_W18 +import fcn_hrnetw18_voc.layers as layers + + +@moduleinfo( + name="fcn_hrnetw18_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw18 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W18() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None
+    """
+
+    def __init__(self,
+                 num_classes: int,
+                 backbone_indices: Tuple[int] = (-1, ),
+                 backbone_channels: Tuple[int] = (270, ),
+                 channels: int = None):
+        super(FCNHead, self).__init__()
+
+        self.num_classes = num_classes
+        self.backbone_indices = backbone_indices
+        if channels is None:
+            channels = backbone_channels[0]
+
+        self.conv_1 = layers.ConvBNReLU(
+            in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
+        self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
+
+    def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
+        logit_list = []
+        x = feat_list[self.backbone_indices[0]]
+        x = self.conv_1(x)
+        logit = self.cls(x)
+        logit_list.append(logit)
+        return logit_list
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..eb7ab11f6d3ee959fcd44977e765511e7c8cbc30
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/README.md
@@ -0,0 +1,174 @@
+# PaddleHub Image Segmentation
+
+## Model Prediction
+
+To predict with the pretrained model we provide, you can use the following script:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_cityscapes')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+## How to Start Fine-tuning
+
+After installing PaddlePaddle and PaddleHub, run `python train.py` to start Fine-tuning the fcn_hrnetw48_cityscapes model on datasets such as OpticDiscSeg.
+
+## Code Steps
+
+Fine-tuning with the PaddleHub Fine-tune API takes four steps.
+
+### Step1: Define the data preprocessing pipeline
+```python
+from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+```
+
+The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with the preprocessing you need.
+
+### Step2: Download and load the dataset
+```python
+from paddlehub.datasets import OpticDiscSeg
+
+train_reader = OpticDiscSeg(transform, mode='train')
+```
+* `transform`: the data preprocessing pipeline.
+* `mode`: the dataset split; one of `train`, `test`, `val`. Default is `train`.
+
+See [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py) for the dataset preparation code. `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
+
+### Step3: Load the pretrained model
+
+```python
+model = hub.Module(name='fcn_hrnetw48_cityscapes', num_classes=2, pretrained=None)
+```
+* `name`: the name of the pretrained model.
+* `num_classes`: the number of classes for the segmentation model.
+* `pretrained`: path to your own trained weights; if None, the default pretrained parameters are loaded.
+
+### Step4: Choose the optimization strategy and run configuration
+
+```python
+scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
+```
+
+#### Optimization strategy
+
+Paddle 2.0 offers a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
+
+* `learning_rate`: the global learning rate.
+* `parameters`: the model parameters to optimize.
+
+#### Run configuration
+`Trainer` controls the Fine-tune training loop and takes the following parameters:
+
+* `model`: the model to optimize;
+* `optimizer`: the optimizer to use;
+* `use_gpu`: whether to use the GPU, default is False;
+* `use_vdl`: whether to use VisualDL to visualize the training process;
+* `checkpoint_dir`: where to save model parameters;
+* `compare_metrics`: the metric used to select the best model.
+
+`trainer.train` controls the training process itself and takes the following parameters (a minimal call is sketched after this list):
+
+* `train_dataset`: the dataset used for training;
+* `epochs`: the number of training epochs;
+* `batch_size`: the training batch size; if using a GPU, adjust it to your hardware;
+* `num_workers`: the number of workers, default is 0;
+* `eval_dataset`: the validation dataset;
+* `log_interval`: logging interval, measured in training steps;
+* `save_interval`: checkpoint-saving interval, measured in epochs.
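+
+A minimal training call, assuming the `train_reader` and `trainer` objects defined in the steps above (the epoch, batch-size, and interval values are placeholders to adapt to your task):
+
+```python
+trainer.train(
+    train_dataset=train_reader,  # dataset from Step2
+    epochs=10,                   # placeholder: tune for your task
+    batch_size=4,                # placeholder: adjust to GPU memory
+    eval_dataset=OpticDiscSeg(transform, mode='val'),
+    log_interval=10,
+    save_interval=1)
+```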
+
+## Model Prediction
+
+After Fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tuning.
+
+We use this model for prediction. The predict.py script is as follows:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+Once the parameters are configured, run the script with `python predict.py`.
+
+**Args**
+* `images`: image paths or images in BGR format;
+* `visualization`: whether to visualize the results, default is True;
+* `save_path`: path for saving the results, default is 'seg_result'.
+
+**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for Fine-tuning.
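+
+For example, to write the visualized results somewhere other than the default directory, `save_path` can be passed explicitly (the directory name below is only an illustration):
+
+```python
+model.predict(images=[img], visualization=True, save_path='cityscapes_result')
+```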
+
+## Service Deployment
+
+PaddleHub Serving can deploy an online image segmentation service.
+
+### Step1: Start PaddleHub Serving
+
+Run the start command:
+
+```shell
+$ hub serving start -m fcn_hrnetw48_cityscapes
+```
+
+This deploys an image segmentation API service, listening on port 8866 by default.
+
+**NOTE:** To serve predictions on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+### Step2: Send a prediction request
+
+With the server configured, the following few lines of code send a prediction request and fetch the result:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# Send the HTTP request
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw48_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### Source Code
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### Dependencies
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..72d29357247626cc38c07e586b2f4dffc067513c
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/hrnet.py
@@ -0,0 +1,528 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw48_cityscapes.layers as layers
+
+
+class HRNet_W48(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default [4].
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default [64].
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default [4, 4].
+        stage2_num_channels (list, optional): Number of channels per branch for stage2. Default [48, 96].
+        stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+        stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default [4, 4, 4].
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default [48, 96, 192].
+        stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+        stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default [4, 4, 4, 4].
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default [48, 96, 192, 384].
+        has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+    """
+
+    def __init__(self,
+                 stage1_num_modules: int = 1,
+                 stage1_num_blocks: List[int] = [4],
+                 stage1_num_channels: List[int] = [64],
+                 stage2_num_modules: int = 1,
+                 stage2_num_blocks: List[int] = [4, 4],
+                 stage2_num_channels: List[int] = [48, 96],
+                 stage3_num_modules: int = 4,
+                 stage3_num_blocks: List[int] = [4, 4, 4],
+                 stage3_num_channels: List[int] = [48, 96, 192],
+                 stage4_num_modules: int = 3,
+                 stage4_num_blocks: List[int] = [4, 4, 4, 4],
+                 stage4_num_channels: List[int] = [48, 96, 192, 384],
+                 has_se=False,
+                 align_corners=False):
+        super(HRNet_W48, self).__init__()
+        self.stage1_num_modules = stage1_num_modules
+        self.stage1_num_blocks = stage1_num_blocks
+        self.stage1_num_channels = stage1_num_channels
+        self.stage2_num_modules = stage2_num_modules
+        self.stage2_num_blocks = stage2_num_blocks
+        self.stage2_num_channels = stage2_num_channels
+        self.stage3_num_modules = stage3_num_modules
+        self.stage3_num_blocks = stage3_num_blocks
+        self.stage3_num_channels = stage3_num_channels
+        self.stage4_num_modules = stage4_num_modules
+        self.stage4_num_blocks = stage4_num_blocks
+        self.stage4_num_channels = stage4_num_channels
+        self.has_se = has_se
+        self.align_corners = align_corners
+        self.feat_channels = [sum(stage4_num_channels)]
+
+        self.conv_layer1_1 = layers.ConvBNReLU(
+            in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.conv_layer1_2 = layers.ConvBNReLU(
+            in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.la1 = Layer1(
+            num_channels=64,
+            num_blocks=self.stage1_num_blocks[0],
+            num_filters=self.stage1_num_channels[0],
+            has_se=has_se,
+            name="layer2")
+
+        self.tr1 = TransitionLayer(
+            in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1")
+
+        self.st2 = Stage(
+            num_channels=self.stage2_num_channels,
+            num_modules=self.stage2_num_modules,
+            num_blocks=self.stage2_num_blocks,
+            num_filters=self.stage2_num_channels,
+            has_se=self.has_se,
+            name="st2",
+            align_corners=align_corners)
+
+        self.tr2 = TransitionLayer(
+            in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2")
+        self.st3 = Stage(
+            num_channels=self.stage3_num_channels,
+            num_modules=self.stage3_num_modules,
+            num_blocks=self.stage3_num_blocks,
+            num_filters=self.stage3_num_channels,
+            has_se=self.has_se,
+            name="st3",
+            
align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + size = paddle.shape(st4[0])[2:] + x1 = F.interpolate(st4[1], size, mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], size, mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], size, mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name: str = None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + 
str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = layers.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: float, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, 
self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + layers.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + 
bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = paddle.shape(residual)[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..09fd7d68e8a34a84c921dbe230749869040308c3 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/layers.py @@ -0,0 +1,297 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + 
"_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+ Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. + out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py new file mode 100644 index 
0000000000000000000000000000000000000000..c7ff6d98c465fd6bc7ffed34c9142d1bdb89c60f --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_cityscapes/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw48_cityscapes.hrnet import HRNet_W48 +import fcn_hrnetw48_cityscapes.layers as layers + + +@moduleinfo( + name="fcn_hrnetw48_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw48 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W48() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int] = (-1, ), + backbone_channels: Tuple[int] = (270, ), + channels: int = None): + super(FCNHead, self).__init__() + + self.num_classes = num_classes + self.backbone_indices = backbone_indices + if channels is None: + channels = backbone_channels[0] + + self.conv_1 = layers.ConvBNReLU( + in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1) + self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0) + + def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[0]] + x = self.conv_1(x) + logit = self.cls(x) + logit_list.append(logit) + return logit_list diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1c42e162681a358b0a72e4aa2ac053cd7303a7ae --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/README.md @@ -0,0 +1,174 @@ +# PaddleHub 图像分割 + +## 模型预测 + + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='fcn_hrnetw48_voc') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用fcn_hrnetw48_voc模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='fcn_hrnetw48_voc', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
+
+我们使用该模型来进行预测。predict.py脚本如下:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='fcn_hrnetw48_voc', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+参数配置正确后,请执行脚本`python predict.py`。
+
+**Args**
+* `images`:原始图像路径或BGR格式图片;
+* `visualization`: 是否可视化,默认为True;
+* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 服务部署
+
+PaddleHub Serving可以部署一个在线图像分割服务。
+
+### Step1: 启动PaddleHub Serving
+
+运行启动命令:
+
+```shell
+$ hub serving start -m fcn_hrnetw48_voc
+```
+
+这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+### Step2: 发送预测请求
+
+配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    # 将BGR图像编码为jpg,再转为base64字符串
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    # 将base64字符串解码回BGR图像
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# 发送HTTP请求
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/fcn_hrnetw48_voc"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### 查看代码
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### 依赖
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..421d70392370a2b962627cc5bcf6f25d775dc454
--- /dev/null
+++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/hrnet.py
@@ -0,0 +1,528 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+import fcn_hrnetw48_voc.layers as layers
+
+
+class HRNet_W48(nn.Layer):
+    """
+    The HRNet implementation based on PaddlePaddle.
+    The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+    (https://arxiv.org/pdf/1908.07919.pdf).
+    Args:
+        stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+        stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4).
+        stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64).
+        stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+        stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+        stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (48, 96).
+        stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+        stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (48, 96, 192).
+        stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+        stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (48, 96, 192, 384).
+        has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+        align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+            e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+    """
+
+    def __init__(self,
+                 stage1_num_modules: int = 1,
+                 stage1_num_blocks: List[int] = [4],
+                 stage1_num_channels: List[int] = [64],
+                 stage2_num_modules: int = 1,
+                 stage2_num_blocks: List[int] = [4, 4],
+                 stage2_num_channels: List[int] = [48, 96],
+                 stage3_num_modules: int = 4,
+                 stage3_num_blocks: List[int] = [4, 4, 4],
+                 stage3_num_channels: List[int] = [48, 96, 192],
+                 stage4_num_modules: int = 3,
+                 stage4_num_blocks: List[int] = [4, 4, 4, 4],
+                 stage4_num_channels: List[int] = [48, 96, 192, 384],
+                 has_se=False,
+                 align_corners=False):
+        super(HRNet_W48, self).__init__()
+        self.stage1_num_modules = stage1_num_modules
+        self.stage1_num_blocks = stage1_num_blocks
+        self.stage1_num_channels = stage1_num_channels
+        self.stage2_num_modules = stage2_num_modules
+        self.stage2_num_blocks = stage2_num_blocks
+        self.stage2_num_channels = stage2_num_channels
+        self.stage3_num_modules = stage3_num_modules
+        self.stage3_num_blocks = stage3_num_blocks
+        self.stage3_num_channels = stage3_num_channels
+        self.stage4_num_modules = stage4_num_modules
+        self.stage4_num_blocks = stage4_num_blocks
+        self.stage4_num_channels = stage4_num_channels
+        self.has_se = has_se
+        self.align_corners = align_corners
+        self.feat_channels = [sum(stage4_num_channels)]
+
+        self.conv_layer1_1 = layers.ConvBNReLU(
+            in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.conv_layer1_2 = layers.ConvBNReLU(
+            in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False)
+
+        self.la1 = Layer1(
+            num_channels=64,
+            num_blocks=self.stage1_num_blocks[0],
+            num_filters=self.stage1_num_channels[0],
+            has_se=has_se,
+            name="layer2")
+
+        self.tr1 = TransitionLayer(
+            in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1")
+
+        self.st2 = Stage(
+            num_channels=self.stage2_num_channels,
+            num_modules=self.stage2_num_modules,
+            num_blocks=self.stage2_num_blocks,
+            num_filters=self.stage2_num_channels,
+            has_se=self.has_se,
+            name="st2",
+            align_corners=align_corners)
+
+        self.tr2 = TransitionLayer(
+            in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2")
+        self.st3 = Stage(
+            num_channels=self.stage3_num_channels,
+            num_modules=self.stage3_num_modules,
+            num_blocks=self.stage3_num_blocks,
+            num_filters=self.stage3_num_channels,
+            has_se=self.has_se,
+            name="st3",
+            align_corners=align_corners)
+
+        self.tr3 = TransitionLayer(
+            in_channels=self.stage3_num_channels,
out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + size = paddle.shape(st4[0])[2:] + x1 = F.interpolate(st4[1], size, mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], size, mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], size, mode='bilinear', align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name: str = None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + layers.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) 
-> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = layers.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = layers.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = layers.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = layers.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: float, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = 
self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + layers.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + layers.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + 
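+    # NOTE: the sublayers were appended above in the same nested (i, j, k) order
+    # in which forward() consumes them, so the single running index
+    # residual_func_idx below can walk the flattened residual_func_list in sync
+    # with the loop structure.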
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = paddle.shape(residual)[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..aca5e911382235cb96d385091f1db261060bad7d --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/layers.py @@ -0,0 +1,298 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + 
in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. 
+    Examples:
+        from paddleseg.models.common.activation import Activation
+        relu = Activation("relu")
+        print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+        sigmoid = Activation("sigmoid")
+        print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+        not_exit_one = Activation("not_exit_one")
+        # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+        # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+        # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+    """
+
+    def __init__(self, act: str = None):
+        super(Activation, self).__init__()
+
+        self._act = act
+        upper_act_names = nn.layer.activation.__dict__.keys()
+        lower_act_names = [act.lower() for act in upper_act_names]
+        act_dict = dict(zip(lower_act_names, upper_act_names))
+
+        if act is not None:
+            if act in act_dict.keys():
+                act_name = act_dict[act]
+                # Look up the activation class by name (e.g. 'relu' -> ReLU) and instantiate it.
+                self.act_func = getattr(nn.layer.activation, act_name)()
+            else:
+                raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        if self._act is not None:
+            return self.act_func(x)
+        else:
+            return x
+
+
+class ASPPModule(nn.Layer):
+    """
+    Atrous Spatial Pyramid Pooling.
+
+    Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+        in_channels (int): The number of input channels.
+        out_channels (int): The number of output channels.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+        use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+        image_pooling (bool, optional): If augmented with image-level features. Default: False
+    """
+
+    def __init__(self,
+                 aspp_ratios: Tuple[int],
+                 in_channels: int,
+                 out_channels: int,
+                 align_corners: bool,
+                 use_sep_conv: bool = False,
+                 image_pooling: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+        self.aspp_blocks = nn.LayerList()
+
+        for ratio in aspp_ratios:
+            if use_sep_conv and ratio > 1:
+                conv_func = SeparableConvBNReLU
+            else:
+                conv_func = ConvBNReLU
+
+            block = conv_func(
+                in_channels=in_channels,
+                out_channels=out_channels,
+                kernel_size=1 if ratio == 1 else 3,
+                dilation=ratio,
+                padding=0 if ratio == 1 else ratio)
+            self.aspp_blocks.append(block)
+
+        out_size = len(self.aspp_blocks)
+
+        if image_pooling:
+            self.global_avg_pool = nn.Sequential(
+                nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+                ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+            out_size += 1
+        self.image_pooling = image_pooling
+
+        self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
+
+        self.dropout = nn.Dropout(p=0.1)  # drop rate
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        outputs = []
+        for block in self.aspp_blocks:
+            y = block(x)
+            y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
+            outputs.append(y)
+
+        if self.image_pooling:
+            img_avg = self.global_avg_pool(x)
+            img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
+            outputs.append(img_avg)
+
+        x = paddle.concat(outputs, axis=1)
+        x = self.conv_bn_relu(x)
+        x = self.dropout(x)
+
+        return x
diff --git a/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py
new file mode 100644
index
0000000000000000000000000000000000000000..b0a77b381cc224f0cb2f9f598d787a0a141c3d01 --- /dev/null +++ b/modules/image/semantic_segmentation/fcn_hrnetw48_voc/module.py @@ -0,0 +1,133 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +from fcn_hrnetw48_voc.hrnet import HRNet_W48 +import fcn_hrnetw48_voc.layers as layers + + +@moduleinfo( + name="fcn_hrnetw48_voc", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Fcn_hrnetw48 is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class FCN(nn.Layer): + """ + A simple implementation for FCN based on PaddlePaddle. + + The original article refers to + Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation" + (https://arxiv.org/abs/1411.4038). + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int = 21, + backbone_indices: Tuple[int] = (-1, ), + channels: int = None, + align_corners: bool = False, + pretrained: str = None): + super(FCN, self).__init__() + + self.backbone = HRNet_W48() + backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + + self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels) + + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + feat_list = self.backbone(x) + logit_list = self.head(feat_list) + return [ + F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners) + for logit in logit_list + ] + + +class FCNHead(nn.Layer): + """ + A simple implementation for FCNHead based on PaddlePaddle + + Args: + num_classes (int): The unique number of target classes. + backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone. + Default: (-1, ). + backbone_channels (tuple): The values of backbone channels. + Default: (270, ). + channels (int, optional): The channels between conv layer and the last layer of FCNHead. + If None, it will be the number of channels of input features. Default: None. + pretrained (str, optional): The path of pretrained model. 
Default: None + """ + + def __init__(self, + num_classes: int, + backbone_indices: Tuple[int] = (-1, ), + backbone_channels: Tuple[int] = (270, ), + channels: int = None): + super(FCNHead, self).__init__() + + self.num_classes = num_classes + self.backbone_indices = backbone_indices + if channels is None: + channels = backbone_channels[0] + + self.conv_1 = layers.ConvBNReLU( + in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1) + self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0) + + def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]: + logit_list = [] + x = feat_list[self.backbone_indices[0]] + x = self.conv_1(x) + logit = self.cls(x) + logit_list.append(logit) + return logit_list diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/README.md b/modules/image/semantic_segmentation/hardnet_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..75a44dd551187029409ae788ad736fbb713f0e84 --- /dev/null +++ b/modules/image/semantic_segmentation/hardnet_cityscapes/README.md @@ -0,0 +1,173 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='hardnet_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用hardnet_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='hardnet_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
+
+我们使用该模型来进行预测。predict.py脚本如下:
+
+```python
+import paddle
+import cv2
+import paddlehub as hub
+
+if __name__ == '__main__':
+    model = hub.Module(name='hardnet_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+    img = cv2.imread("/PATH/TO/IMAGE")
+    model.predict(images=[img], visualization=True)
+```
+
+参数配置正确后,请执行脚本`python predict.py`。
+
+**Args**
+* `images`:原始图像路径或BGR格式图片;
+* `visualization`: 是否可视化,默认为True;
+* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 服务部署
+
+PaddleHub Serving可以部署一个在线图像分割服务。
+
+### Step1: 启动PaddleHub Serving
+
+运行启动命令:
+
+```shell
+$ hub serving start -m hardnet_cityscapes
+```
+
+这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
+
+### Step2: 发送预测请求
+
+配置好服务端后,使用以下几行代码即可发送预测请求并获取预测结果:
+
+```python
+import requests
+import json
+import cv2
+import base64
+
+import numpy as np
+
+
+def cv2_to_base64(image):
+    # 将BGR图像编码为jpg,再转为base64字符串
+    data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+def base64_to_cv2(b64str):
+    # 将base64字符串解码回BGR图像
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+# 发送HTTP请求
+org_im = cv2.imread('/PATH/TO/IMAGE')
+data = {'images':[cv2_to_base64(org_im)]}
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:8866/predict/hardnet_cityscapes"
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+mask = base64_to_cv2(r.json()["results"][0])
+```
+
+### 查看代码
+
+https://github.com/PaddlePaddle/PaddleSeg
+
+### 依赖
+
+paddlepaddle >= 2.0.0
+
+paddlehub >= 2.0.0
diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py b/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..cbcb7ad830fa82a87e1fbd86b1e59a63cc4ef579
--- /dev/null
+++ b/modules/image/semantic_segmentation/hardnet_cityscapes/layers.py
@@ -0,0 +1,185 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+    if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
+        return nn.BatchNorm2D(*args, **kwargs)
+    else:
+        return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNReLU(nn.Layer):
+    """Basic conv bn relu layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+
+        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        x = F.relu(x)
+        return x
+
+
+class ConvBN(nn.Layer):
+    """Basic conv bn layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+        self._batch_norm = SyncBatchNorm(out_channels)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self._conv(x)
+        x = self._batch_norm(x)
+        return x
+
+
+class ConvReLUPool(nn.Layer):
+    """Basic conv relu pool layer."""
+
+    def __init__(self, in_channels: int, out_channels: int):
+        super().__init__()
+        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.conv(x)
+        x = F.relu(x)
+        # The legacy fluid-style F.pool2d API does not exist in paddle.nn.functional
+        # in Paddle 2.x; max_pool2d is the equivalent call.
+        x = F.max_pool2d(x, kernel_size=2, stride=2)
+        return x
+
+
+class SeparableConvBNReLU(nn.Layer):
+    """Basic separable conv bn relu layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=in_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+        self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.piontwise_conv(x)
+        return x
+
+
+class DepthwiseConvBN(nn.Layer):
+    """Basic depthwise conv bn layer."""
+
+    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
+        super().__init__()
+        self.depthwise_conv = ConvBN(
+            in_channels,
+            out_channels=out_channels,
+            kernel_size=kernel_size,
+            padding=padding,
+            groups=in_channels,
+            **kwargs)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        return x
+
+
+class AuxLayer(nn.Layer):
+    """
+    The auxiliary layer implementation for auxiliary loss.
+
+    Args:
+        in_channels (int): The number of input channels.
+        inter_channels (int): The intermediate channels.
+        out_channels (int): The number of output channels, and usually it is num_classes.
+        dropout_prob (float, optional): The drop rate. Default: 0.1.
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/hardnet_cityscapes/module.py b/modules/image/semantic_segmentation/hardnet_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..3923bff5ae20dfd69433d46dcedfd6851d5f40ee --- /dev/null +++ b/modules/image/semantic_segmentation/hardnet_cityscapes/module.py @@ -0,0 +1,291 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import os
+from typing import Union, Tuple, List
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+
+import hardnet_cityscapes.layers as layers
+
+
+@moduleinfo(
+    name="hardnet_cityscapes",
+    type="CV/semantic_segmentation",
+    author="paddlepaddle",
+    author_email="",
+    summary="HarDNet is a segmentation model pretrained on Cityscapes.",
+    version="1.0.0",
+    meta=ImageSegmentationModule)
+class HarDNet(nn.Layer):
+    """
+    [Real Time] The FC-HardDNet 70 implementation based on PaddlePaddle.
+    The original article refers to
+    Chao, Ping, et al. "HarDNet: A Low Memory Traffic Network"
+    (https://arxiv.org/pdf/1909.00948.pdf)
+
+    Args:
+        num_classes (int): The unique number of target classes.
+        stem_channels (tuple|list, optional): The number of channels before the encoder. Default: (16, 24, 32, 48).
+        ch_list (tuple|list, optional): The number of channels at each block in the encoder. Default: (64, 96, 160, 224, 320).
+        grmul (float, optional): The channel multiplying factor in HarDBlock, which is m in the paper. Default: 1.7.
+        gr (tuple|list, optional): The growth rate in each HarDBlock, which is k in the paper. Default: (10, 16, 18, 24, 32).
+        n_layers (tuple|list, optional): The number of layers in each HarDBlock. Default: (4, 4, 8, 8, 8).
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+        pretrained (str, optional): The path or url of pretrained model. Default: None.
+    """
+
+    def __init__(self,
+                 num_classes: int = 19,
+                 stem_channels: Tuple[int] = (16, 24, 32, 48),
+                 ch_list: Tuple[int] = (64, 96, 160, 224, 320),
+                 grmul: float = 1.7,
+                 gr: Tuple[int] = (10, 16, 18, 24, 32),
+                 n_layers: Tuple[int] = (4, 4, 8, 8, 8),
+                 align_corners: bool = False,
+                 pretrained: str = None):
+
+        super(HarDNet, self).__init__()
+        self.align_corners = align_corners
+        self.pretrained = pretrained
+        encoder_blks_num = len(n_layers)
+        decoder_blks_num = encoder_blks_num - 1
+        encoder_in_channels = stem_channels[3]
+
+        self.stem = nn.Sequential(
+            layers.ConvBNReLU(3, stem_channels[0], kernel_size=3, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[0], stem_channels[1], kernel_size=3, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[1], stem_channels[2], kernel_size=3, stride=2, bias_attr=False),
+            layers.ConvBNReLU(stem_channels[2], stem_channels[3], kernel_size=3, bias_attr=False))
+
+        self.encoder = Encoder(encoder_blks_num, encoder_in_channels, ch_list, gr, grmul, n_layers)
+
+        skip_connection_channels = self.encoder.get_skip_channels()
+        decoder_in_channels = self.encoder.get_out_channels()
+
+        self.decoder = Decoder(decoder_blks_num, decoder_in_channels, skip_connection_channels, gr, grmul, n_layers,
+                               align_corners)
+
+        self.cls_head = nn.Conv2D(in_channels=self.decoder.get_out_channels(), out_channels=num_classes, kernel_size=1)
+
+        self.transforms = T.Compose([T.Normalize()])
+
+        if pretrained is not None:
+            model_dict = paddle.load(pretrained)
+            self.set_dict(model_dict)
+            print("load custom parameters success")
+
+        else:
+            checkpoint = os.path.join(self.directory, 'model.pdparams')
+            model_dict = paddle.load(checkpoint)
+            self.set_dict(model_dict)
+            print("load pretrained parameters success")
+
+    def transform(self, img: Union[np.ndarray, str]) ->
Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + input_shape = paddle.shape(x)[2:] + x = self.stem(x) + x, skip_connections = self.encoder(x) + x = self.decoder(x, skip_connections) + logit = self.cls_head(x) + logit = F.interpolate(logit, size=input_shape, mode="bilinear", align_corners=self.align_corners) + return [logit] + + +class Encoder(nn.Layer): + """The Encoder implementation of FC-HardDNet 70. + + Args: + n_blocks (int): The number of blocks in the Encoder module. + in_channels (int): The number of input channels. + ch_list (tuple|list): The number of channels at each block in the encoder. + grmul (float): The channel multiplying factor in HarDBlock, which is m in the paper. + gr (tuple|list): The growth rate in each HarDBlock, which is k in the paper. + n_layers (tuple|list): The number of layers in each HarDBlock. + """ + + def __init__(self, n_blocks: int, in_channels: int, ch_list: List[int], gr: List[int], grmul: float, + n_layers: List[int]): + super().__init__() + self.skip_connection_channels = [] + self.shortcut_layers = [] + self.blks = nn.LayerList() + ch = in_channels + for i in range(n_blocks): + blk = HarDBlock(ch, gr[i], grmul, n_layers[i]) + ch = blk.get_out_ch() + self.skip_connection_channels.append(ch) + self.blks.append(blk) + if i < n_blocks - 1: + self.shortcut_layers.append(len(self.blks) - 1) + self.blks.append(layers.ConvBNReLU(ch, ch_list[i], kernel_size=1, bias_attr=False)) + + ch = ch_list[i] + if i < n_blocks - 1: + self.blks.append(nn.AvgPool2D(kernel_size=2, stride=2)) + self.out_channels = ch + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + skip_connections = [] + for i in range(len(self.blks)): + x = self.blks[i](x) + if i in self.shortcut_layers: + skip_connections.append(x) + return x, skip_connections + + def get_skip_channels(self): + return self.skip_connection_channels + + def get_out_channels(self): + return self.out_channels + + +class Decoder(nn.Layer): + """The Decoder implementation of FC-HardDNet 70. + + Args: + n_blocks (int): The number of blocks in the Encoder module. + in_channels (int): The number of input channels. + skip_connection_channels (tuple|list): The channels of shortcut layers in encoder. + grmul (float): The channel multiplying factor in HarDBlock, which is m in the paper. + gr (tuple|list): The growth rate in each HarDBlock, which is k in the paper. + n_layers (tuple|list): The number of layers in each HarDBlock. 
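+        align_corners (bool, optional): An argument of F.interpolate used when upsampling before concatenating with
+            each skip connection. Default: False.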
+ """ + + def __init__(self, + n_blocks: int, + in_channels: int, + skip_connection_channels: List[paddle.Tensor], + gr: List[int], + grmul: float, + n_layers: List[int], + align_corners: bool = False): + super().__init__() + prev_block_channels = in_channels + self.n_blocks = n_blocks + self.dense_blocks_up = nn.LayerList() + self.conv1x1_up = nn.LayerList() + + for i in range(n_blocks - 1, -1, -1): + cur_channels_count = prev_block_channels + skip_connection_channels[i] + conv1x1 = layers.ConvBNReLU(cur_channels_count, cur_channels_count // 2, kernel_size=1, bias_attr=False) + blk = HarDBlock(base_channels=cur_channels_count // 2, growth_rate=gr[i], grmul=grmul, n_layers=n_layers[i]) + + self.conv1x1_up.append(conv1x1) + self.dense_blocks_up.append(blk) + + prev_block_channels = blk.get_out_ch() + + self.out_channels = prev_block_channels + self.align_corners = align_corners + + def forward(self, x: paddle.Tensor, skip_connections: List[paddle.Tensor]) -> paddle.Tensor: + for i in range(self.n_blocks): + skip = skip_connections.pop() + x = F.interpolate(x, size=paddle.shape(skip)[2:], mode="bilinear", align_corners=self.align_corners) + x = paddle.concat([x, skip], axis=1) + x = self.conv1x1_up[i](x) + x = self.dense_blocks_up[i](x) + return x + + def get_out_channels(self): + return self.out_channels + + +class HarDBlock(nn.Layer): + """The HarDBlock implementation + + Args: + base_channels (int): The base channels. + growth_rate (tuple|list): The growth rate. + grmul (float): The channel multiplying factor. + n_layers (tuple|list): The number of layers. + keepBase (bool, optional): A bool value indicates whether concatenating the first layer. Default: False. + """ + + def __init__(self, + base_channels: int, + growth_rate: List[int], + grmul: float, + n_layers: List[int], + keepBase: bool = False): + super().__init__() + self.keepBase = keepBase + self.links = [] + layers_ = [] + self.out_channels = 0 + for i in range(n_layers): + outch, inch, link = get_link(i + 1, base_channels, growth_rate, grmul) + + self.links.append(link) + layers_.append(layers.ConvBNReLU(inch, outch, kernel_size=3, bias_attr=False)) + if (i % 2 == 0) or (i == n_layers - 1): + self.out_channels += outch + self.layers = nn.LayerList(layers_) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + layers_ = [x] + for layer in range(len(self.layers)): + link = self.links[layer] + tin = [] + for i in link: + tin.append(layers_[i]) + if len(tin) > 1: + x = paddle.concat(tin, axis=1) + else: + x = tin[0] + out = self.layers[layer](x) + layers_.append(out) + + t = len(layers_) + out_ = [] + for i in range(t): + if (i == 0 and self.keepBase) or \ + (i == t - 1) or (i % 2 == 1): + out_.append(layers_[i]) + out = paddle.concat(out_, 1) + + return out + + def get_out_ch(self): + return self.out_channels + + +def get_link(layer: int, base_ch: int, growth_rate: List[int], grmul: float) -> Tuple: + if layer == 0: + return base_ch, 0, [] + out_channels = growth_rate + link = [] + for i in range(10): + dv = 2**i + if layer % dv == 0: + k = layer - dv + link.insert(0, k) + if i > 0: + out_channels *= grmul + out_channels = int(int(out_channels + 1) / 2) * 2 + in_channels = 0 + for i in link: + ch, _, _ = get_link(i, base_ch, growth_rate, grmul) + in_channels += ch + return out_channels, in_channels, link diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e5a557d39ea40b17e67c2711db4a38fe212f5a50 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/README.md @@ -0,0 +1,173 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='ocrnet_hrnetw18_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ocrnet_hrnetw18_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='ocrnet_hrnetw18_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + +当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。 + +我们使用该模型来进行预测。predict.py脚本如下: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='ocrnet_hrnetw18_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +参数配置正确后,请执行脚本`python predict.py`。 +**Args** +* `images`:原始图像路径或BGR格式图片; +* `visualization`: 是否可视化,默认为True; +* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + +**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 服务部署 + +PaddleHub Serving可以部署一个在线图像分割服务。 + +### Step1: 启动PaddleHub Serving + +运行启动命令: + +```shell +$ hub serving start -m ocrnet_hrnetw18_cityscapes +``` + +这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + +**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +### Step2: 发送预测请求 + +配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + 
+```python +import requests +import json +import cv2 +import base64 + +import numpy as np + + +def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + +# 发送HTTP请求 +org_im = cv2.imread('/PATH/TO/IMAGE') +data = {'images':[cv2_to_base64(org_im)]} +headers = {"Content-type": "application/json"} +url = "http://127.0.0.1:8866/predict/ocrnet_hrnetw18_cityscapes" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) +mask = base64_to_cv2(r.json()["results"][0]) +``` + +### 查看代码 + +https://github.com/PaddlePaddle/PaddleSeg + +### 依赖 + +paddlepaddle >= 2.0.0 + +paddlehub >= 2.0.0 diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py new file mode 100644 index 0000000000000000000000000000000000000000..82f396340cf4db9269a6f140ccdd3d60364035e4 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/hrnet.py @@ -0,0 +1,531 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +import ocrnet_hrnetw18_cityscapes.layers as L + + +class HRNet_W18(nn.Layer): + """ + The HRNet implementation based on PaddlePaddle. + + The original article refers to + Jingdong Wang, et, al. "HRNet:Deep High-Resolution Representation Learning for Visual Recognition" + (https://arxiv.org/pdf/1908.07919.pdf). + + Args: + stage1_num_modules (int, optional): Number of modules for stage1. Default 1. + stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4). + stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64). + stage2_num_modules (int, optional): Number of modules for stage2. Default 1. + stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4). + stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36). + stage3_num_modules (int, optional): Number of modules for stage3. Default 4. + stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4). + stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72). + stage4_num_modules (int, optional): Number of modules for stage4. Default 3. + stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4). + stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72. 144). + has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False. + align_corners (bool, optional): An argument of F.interpolate. 
It should be set to False when the feature size is even, + e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + """ + + def __init__(self, + stage1_num_modules: int = 1, + stage1_num_blocks: Tuple[int] = (4, ), + stage1_num_channels: Tuple[int] = (64, ), + stage2_num_modules: int = 1, + stage2_num_blocks: Tuple[int] = (4, 4), + stage2_num_channels: Tuple[int] = (18, 36), + stage3_num_modules: int = 4, + stage3_num_blocks: Tuple[int] = (4, 4, 4), + stage3_num_channels: Tuple[int] = (18, 36, 72), + stage4_num_modules: int = 3, + stage4_num_blocks: Tuple[int] = (4, 4, 4, 4), + stage4_num_channels: Tuple[int] = (18, 36, 72, 144), + has_se: bool = False, + align_corners: bool = False): + super(HRNet_W18, self).__init__() + + self.stage1_num_modules = stage1_num_modules + self.stage1_num_blocks = stage1_num_blocks + self.stage1_num_channels = stage1_num_channels + self.stage2_num_modules = stage2_num_modules + self.stage2_num_blocks = stage2_num_blocks + self.stage2_num_channels = stage2_num_channels + self.stage3_num_modules = stage3_num_modules + self.stage3_num_blocks = stage3_num_blocks + self.stage3_num_channels = stage3_num_channels + self.stage4_num_modules = stage4_num_modules + self.stage4_num_blocks = stage4_num_blocks + self.stage4_num_channels = stage4_num_channels + self.has_se = has_se + self.align_corners = align_corners + self.feat_channels = [sum(stage4_num_channels)] + + self.conv_layer1_1 = L.ConvBNReLU( + in_channels=3, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.conv_layer1_2 = L.ConvBNReLU( + in_channels=64, out_channels=64, kernel_size=3, stride=2, padding='same', bias_attr=False) + + self.la1 = Layer1( + num_channels=64, + num_blocks=self.stage1_num_blocks[0], + num_filters=self.stage1_num_channels[0], + has_se=has_se, + name="layer2") + + self.tr1 = TransitionLayer( + in_channels=[self.stage1_num_channels[0] * 4], out_channels=self.stage2_num_channels, name="tr1") + + self.st2 = Stage( + num_channels=self.stage2_num_channels, + num_modules=self.stage2_num_modules, + num_blocks=self.stage2_num_blocks, + num_filters=self.stage2_num_channels, + has_se=self.has_se, + name="st2", + align_corners=align_corners) + + self.tr2 = TransitionLayer( + in_channels=self.stage2_num_channels, out_channels=self.stage3_num_channels, name="tr2") + self.st3 = Stage( + num_channels=self.stage3_num_channels, + num_modules=self.stage3_num_modules, + num_blocks=self.stage3_num_blocks, + num_filters=self.stage3_num_channels, + has_se=self.has_se, + name="st3", + align_corners=align_corners) + + self.tr3 = TransitionLayer( + in_channels=self.stage3_num_channels, out_channels=self.stage4_num_channels, name="tr3") + self.st4 = Stage( + num_channels=self.stage4_num_channels, + num_modules=self.stage4_num_modules, + num_blocks=self.stage4_num_blocks, + num_filters=self.stage4_num_channels, + has_se=self.has_se, + name="st4", + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv1 = self.conv_layer1_1(x) + conv2 = self.conv_layer1_2(conv1) + + la1 = self.la1(conv2) + + tr1 = self.tr1([la1]) + st2 = self.st2(tr1) + + tr2 = self.tr2(st2) + st3 = self.st3(tr2) + + tr3 = self.tr3(st3) + st4 = self.st4(tr3) + + x0_h, x0_w = st4[0].shape[2:] + x1 = F.interpolate(st4[1], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x2 = F.interpolate(st4[2], (x0_h, x0_w), mode='bilinear', align_corners=self.align_corners) + x3 = F.interpolate(st4[3], (x0_h, x0_w), mode='bilinear', 
align_corners=self.align_corners) + x = paddle.concat([st4[0], x1, x2, x3], axis=1) + + return [x] + + +class Layer1(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, num_blocks: int, has_se: bool = False, name: str = None): + super(Layer1, self).__init__() + + self.bottleneck_block_list = [] + + for i in range(num_blocks): + bottleneck_block = self.add_sublayer( + "bb_{}_{}".format(name, i + 1), + BottleneckBlock( + num_channels=num_channels if i == 0 else num_filters * 4, + num_filters=num_filters, + has_se=has_se, + stride=1, + downsample=True if i == 0 else False, + name=name + '_' + str(i + 1))) + self.bottleneck_block_list.append(bottleneck_block) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + conv = x + for block_func in self.bottleneck_block_list: + conv = block_func(conv) + return conv + + +class TransitionLayer(nn.Layer): + def __init__(self, in_channels: int, out_channels: int, name=None): + super(TransitionLayer, self).__init__() + + num_in = len(in_channels) + num_out = len(out_channels) + self.conv_bn_func_list = [] + for i in range(num_out): + residual = None + if i < num_in: + if in_channels[i] != out_channels[i]: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[i], + out_channels=out_channels[i], + kernel_size=3, + padding='same', + bias_attr=False)) + else: + residual = self.add_sublayer( + "transition_{}_layer_{}".format(name, i + 1), + L.ConvBNReLU( + in_channels=in_channels[-1], + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + self.conv_bn_func_list.append(residual) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, conv_bn_func in enumerate(self.conv_bn_func_list): + if conv_bn_func is None: + outs.append(x[idx]) + else: + if idx < len(x): + outs.append(conv_bn_func(x[idx])) + else: + outs.append(conv_bn_func(x[-1])) + return outs + + +class Branches(nn.Layer): + def __init__(self, num_blocks: int, in_channels: int, out_channels: int, has_se: bool = False, name: str = None): + super(Branches, self).__init__() + + self.basic_block_list = [] + + for i in range(len(out_channels)): + self.basic_block_list.append([]) + for j in range(num_blocks[i]): + in_ch = in_channels[i] if j == 0 else out_channels[i] + basic_block_func = self.add_sublayer( + "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1), + BasicBlock( + num_channels=in_ch, + num_filters=out_channels[i], + has_se=has_se, + name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))) + self.basic_block_list[i].append(basic_block_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + for idx, input in enumerate(x): + conv = input + for basic_block_func in self.basic_block_list[idx]: + conv = basic_block_func(conv) + outs.append(conv) + return outs + + +class BottleneckBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + has_se: bool, + stride: int = 1, + downsample: bool = False, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + self.conv2 = L.ConvBNReLU( + in_channels=num_filters, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + + self.conv3 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters * 4, kernel_size=1, 
padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBN( + in_channels=num_channels, out_channels=num_filters * 4, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer( + num_channels=num_filters * 4, num_filters=num_filters * 4, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + conv3 = self.conv3(conv2) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv3 = self.se(conv3) + + y = conv3 + residual + y = F.relu(y) + return y + + +class BasicBlock(nn.Layer): + def __init__(self, + num_channels: int, + num_filters: int, + stride: int = 1, + has_se: bool = False, + downsample: bool = False, + name: str = None): + super(BasicBlock, self).__init__() + + self.has_se = has_se + self.downsample = downsample + + self.conv1 = L.ConvBNReLU( + in_channels=num_channels, + out_channels=num_filters, + kernel_size=3, + stride=stride, + padding='same', + bias_attr=False) + self.conv2 = L.ConvBN( + in_channels=num_filters, out_channels=num_filters, kernel_size=3, padding='same', bias_attr=False) + + if self.downsample: + self.conv_down = L.ConvBNReLU( + in_channels=num_channels, out_channels=num_filters, kernel_size=1, padding='same', bias_attr=False) + + if self.has_se: + self.se = SELayer(num_channels=num_filters, num_filters=num_filters, reduction_ratio=16, name=name + '_fc') + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + residual = x + conv1 = self.conv1(x) + conv2 = self.conv2(conv1) + + if self.downsample: + residual = self.conv_down(x) + + if self.has_se: + conv2 = self.se(conv2) + + y = conv2 + residual + y = F.relu(y) + return y + + +class SELayer(nn.Layer): + def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None): + super(SELayer, self).__init__() + + self.pool2d_gap = nn.AdaptiveAvgPool2D(1) + + self._num_channels = num_channels + + med_ch = int(num_channels / reduction_ratio) + stdv = 1.0 / math.sqrt(num_channels * 1.0) + self.squeeze = nn.Linear( + num_channels, med_ch, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + stdv = 1.0 / math.sqrt(med_ch * 1.0) + self.excitation = nn.Linear( + med_ch, num_filters, weight_attr=paddle.ParamAttr(initializer=nn.initializer.Uniform(-stdv, stdv))) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + pool = self.pool2d_gap(x) + pool = paddle.reshape(pool, shape=[-1, self._num_channels]) + squeeze = self.squeeze(pool) + squeeze = F.relu(squeeze) + excitation = self.excitation(squeeze) + excitation = F.sigmoid(excitation) + excitation = paddle.reshape(excitation, shape=[-1, self._num_channels, 1, 1]) + out = x * excitation + return out + + +class Stage(nn.Layer): + def __init__(self, + num_channels: int, + num_modules: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(Stage, self).__init__() + + self._num_modules = num_modules + + self.stage_func_list = [] + for i in range(num_modules): + if i == num_modules - 1 and not multi_scale_output: + stage_func = self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + multi_scale_output=False, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + else: + stage_func = 
self.add_sublayer( + "stage_{}_{}".format(name, i + 1), + HighResolutionModule( + num_channels=num_channels, + num_blocks=num_blocks, + num_filters=num_filters, + has_se=has_se, + name=name + '_' + str(i + 1), + align_corners=align_corners)) + + self.stage_func_list.append(stage_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = x + for idx in range(self._num_modules): + out = self.stage_func_list[idx](out) + return out + + +class HighResolutionModule(nn.Layer): + def __init__(self, + num_channels: int, + num_blocks: int, + num_filters: int, + has_se: bool = False, + multi_scale_output: bool = True, + name: str = None, + align_corners: str = False): + super(HighResolutionModule, self).__init__() + + self.branches_func = Branches( + num_blocks=num_blocks, in_channels=num_channels, out_channels=num_filters, has_se=has_se, name=name) + + self.fuse_func = FuseLayers( + in_channels=num_filters, + out_channels=num_filters, + multi_scale_output=multi_scale_output, + name=name, + align_corners=align_corners) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + out = self.branches_func(x) + out = self.fuse_func(out) + return out + + +class FuseLayers(nn.Layer): + def __init__(self, + in_channels: int, + out_channels: int, + multi_scale_output: bool = True, + name: str = None, + align_corners: bool = False): + super(FuseLayers, self).__init__() + + self._actual_ch = len(in_channels) if multi_scale_output else 1 + self._in_channels = in_channels + self.align_corners = align_corners + + self.residual_func_list = [] + for i in range(self._actual_ch): + for j in range(len(in_channels)): + if j > i: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}".format(name, i + 1, j + 1), + L.ConvBN( + in_channels=in_channels[j], + out_channels=out_channels[i], + kernel_size=1, + padding='same', + bias_attr=False)) + self.residual_func_list.append(residual_func) + elif j < i: + pre_num_filters = in_channels[j] + for k in range(i - j): + if k == i - j - 1: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBN( + in_channels=pre_num_filters, + out_channels=out_channels[i], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[i] + else: + residual_func = self.add_sublayer( + "residual_{}_layer_{}_{}_{}".format(name, i + 1, j + 1, k + 1), + L.ConvBNReLU( + in_channels=pre_num_filters, + out_channels=out_channels[j], + kernel_size=3, + stride=2, + padding='same', + bias_attr=False)) + pre_num_filters = out_channels[j] + self.residual_func_list.append(residual_func) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outs = [] + residual_func_idx = 0 + for i in range(self._actual_ch): + residual = x[i] + residual_shape = residual.shape[-2:] + for j in range(len(self._in_channels)): + if j > i: + y = self.residual_func_list[residual_func_idx](x[j]) + residual_func_idx += 1 + + y = F.interpolate(y, residual_shape, mode='bilinear', align_corners=self.align_corners) + residual = residual + y + elif j < i: + y = x[j] + for k in range(i - j): + y = self.residual_func_list[residual_func_idx](y) + residual_func_idx += 1 + + residual = residual + y + + residual = F.relu(residual) + outs.append(residual) + + return outs diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py new file mode 100644 index 
0000000000000000000000000000000000000000..27c5a68a7c725aacca231279aea7ecdd216b20a1 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/layers.py @@ -0,0 +1,297 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Tuple + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.layer import activation +from paddle.nn import Conv2D, AvgPool2D + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu': + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNLayer(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, + in_channels: int, + out_channels: int, + kernel_size: int, + stride: int = 1, + dilation: int = 1, + groups: int = 1, + is_vd_mode: bool = False, + act: str = None, + name: str = None): + super(ConvBNLayer, self).__init__() + + self.is_vd_mode = is_vd_mode + self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True) + self._conv = Conv2D( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2 if dilation == 1 else 0, + dilation=dilation, + groups=groups, + bias_attr=False) + + self._batch_norm = SyncBatchNorm(out_channels) + self._act_op = Activation(act=act) + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + if self.is_vd_mode: + inputs = self._pool2d_avg(inputs) + y = self._conv(inputs) + y = self._batch_norm(y) + y = self._act_op(y) + + return y + + +class BottleneckBlock(nn.Layer): + """Residual bottleneck block""" + + def __init__(self, + in_channels: int, + out_channels: int, + stride: int, + shortcut: bool = True, + if_first: bool = False, + dilation: int = 1, + name: str = None): + super(BottleneckBlock, self).__init__() + + self.conv0 = ConvBNLayer( + in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a") + + self.dilation = dilation + + self.conv1 = ConvBNLayer( + in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + stride=stride, + act='relu', + dilation=dilation, + name=name + "_branch2b") + self.conv2 = ConvBNLayer( + in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c") + + if not shortcut: + self.short = ConvBNLayer( + in_channels=in_channels, + out_channels=out_channels * 4, + kernel_size=1, + stride=1, + is_vd_mode=False if if_first or stride == 1 else True, + name=name + "_branch1") + + self.shortcut = shortcut + + def forward(self, inputs: paddle.Tensor) -> paddle.Tensor: + y = self.conv0(inputs) + if self.dilation > 1: + padding = self.dilation + y = F.pad(y, [padding, padding, padding, padding]) + + conv1 = self.conv1(y) + conv2 = self.conv2(conv1) + + if self.shortcut: + short = inputs + else: + short = 
self.short(inputs) + + y = paddle.add(x=short, y=conv2) + y = F.relu(y) + return y + + +class SeparableConvBNReLU(nn.Layer): + """Depthwise Separable Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(SeparableConvBNReLU, self).__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBN, self).__init__() + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict): + super(ConvBNReLU, self).__init__() + + self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x + + +class ASPPModule(nn.Layer): + """ + Atrous Spatial Pyramid Pooling. + + Args: + aspp_ratios (tuple): The dilation rate using in ASSP module. + in_channels (int): The number of input channels. 
+ out_channels (int): The number of output channels. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. + use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False. + image_pooling (bool, optional): If augmented with image-level features. Default: False + """ + + def __init__(self, + aspp_ratios: Tuple[int], + in_channels: int, + out_channels: int, + align_corners: bool, + use_sep_conv: bool = False, + image_pooling: bool = False): + super().__init__() + + self.align_corners = align_corners + self.aspp_blocks = nn.LayerList() + + for ratio in aspp_ratios: + if use_sep_conv and ratio > 1: + conv_func = SeparableConvBNReLU + else: + conv_func = ConvBNReLU + + block = conv_func( + in_channels=in_channels, + out_channels=out_channels, + kernel_size=1 if ratio == 1 else 3, + dilation=ratio, + padding=0 if ratio == 1 else ratio) + self.aspp_blocks.append(block) + + out_size = len(self.aspp_blocks) + + if image_pooling: + self.global_avg_pool = nn.Sequential( + nn.AdaptiveAvgPool2D(output_size=(1, 1)), + ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False)) + out_size += 1 + self.image_pooling = image_pooling + + self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1) + + self.dropout = nn.Dropout(p=0.1) # drop rate + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + outputs = [] + for block in self.aspp_blocks: + y = block(x) + y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(y) + + if self.image_pooling: + img_avg = self.global_avg_pool(x) + img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners) + outputs.append(img_avg) + + x = paddle.concat(outputs, axis=1) + x = self.conv_bn_relu(x) + x = self.dropout(x) + + return x diff --git a/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..2ebbfbef041133ce3014877bb91035b6a3e40ff7 --- /dev/null +++ b/modules/image/semantic_segmentation/ocrnet_hrnetw18_cityscapes/module.py @@ -0,0 +1,224 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
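+
+# Rough data flow of the OCR head defined below. Channel sizes are illustrative
+# and assume the defaults (HRNet-W18 backbone whose four branch outputs are
+# concatenated to 18 + 36 + 72 + 144 = 270 channels, ocr_mid_channels=512):
+#   backbone(x)        -> feats: (n, 270, h/4, w/4)
+#   aux_head(feats)    -> soft object regions: (n, num_classes, h/4, w/4)
+#   conv3x3_ocr(feats) -> pixel features: (n, 512, h/4, w/4)
+#   spatial_gather     -> per-region features: (n, 512, num_classes, 1)
+#   spatial_ocr        -> object-contextual features: (n, 512, h/4, w/4)
+#   cls_head           -> logits, bilinearly upsampled to the input size in forward()
+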
+import os +from typing import List + +import paddle +import numpy as np +import paddle.nn as nn +import paddle.nn.functional as F +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import ocrnet_hrnetw18_cityscapes.layers as L +from ocrnet_hrnetw18_cityscapes.hrnet import HRNet_W18 + + +@moduleinfo( + name="ocrnet_hrnetw18_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="OCRNetHRNetW18 is a segmentation model pretrained by pascal voc.", + version="1.0.0", + meta=ImageSegmentationModule) +class OCRNetHRNetW18(nn.Layer): + """ + The OCRNet implementation based on PaddlePaddle. + The original article refers to + Yuan, Yuhui, et al. "Object-Contextual Representations for Semantic Segmentation" + (https://arxiv.org/pdf/1909.11065.pdf) + Args: + num_classes (int): The unique number of target classes. + backbone_indices (list): A list indicates the indices of output of backbone. + It can be either one or two values, if two values, the first index will be taken as + a deep-supervision feature in auxiliary layer; the second one will be taken as + input of pixel representation. If one value, it is taken by both above. + ocr_mid_channels (int, optional): The number of middle channels in OCRHead. Default: 512. + ocr_key_channels (int, optional): The number of key channels in ObjectAttentionBlock. Default: 256. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + backbone_indices: List[int] = [0], + ocr_mid_channels: int = 512, + ocr_key_channels: int = 256, + align_corners: bool = False, + pretrained: str = None): + super(OCRNetHRNetW18, self).__init__() + self.backbone = HRNet_W18() + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = OCRHead( + num_classes=num_classes, + in_channels=in_channels, + ocr_mid_channels=ocr_mid_channels, + ocr_key_channels=ocr_key_channels) + self.align_corners = align_corners + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: np.ndarray) -> np.ndarray: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [ + F.interpolate(logit, x.shape[2:], mode='bilinear', align_corners=self.align_corners) for logit in logit_list + ] + return logit_list + + +class OCRHead(nn.Layer): + """ + The Object contextual representation head. + Args: + num_classes(int): The unique number of target classes. + in_channels(tuple): The number of input channels. + ocr_mid_channels(int, optional): The number of middle channels in OCRHead. Default: 512. + ocr_key_channels(int, optional): The number of key channels in ObjectAttentionBlock. Default: 256. 
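+
+    The forward pass returns [logit, soft_regions]: the OCR logits plus the
+    auxiliary region logits used for deep supervision, both at the resolution
+    of the backbone features.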
+ """ + + def __init__(self, num_classes: int, in_channels: int, ocr_mid_channels: int = 512, ocr_key_channels: int = 256): + super().__init__() + + self.num_classes = num_classes + self.spatial_gather = SpatialGatherBlock() + self.spatial_ocr = SpatialOCRModule(ocr_mid_channels, ocr_key_channels, ocr_mid_channels) + + self.indices = [-2, -1] if len(in_channels) > 1 else [-1, -1] + + self.conv3x3_ocr = L.ConvBNReLU(in_channels[self.indices[1]], ocr_mid_channels, 3, padding=1) + self.cls_head = nn.Conv2D(ocr_mid_channels, self.num_classes, 1) + self.aux_head = nn.Sequential( + L.ConvBNReLU(in_channels[self.indices[0]], in_channels[self.indices[0]], 1), + nn.Conv2D(in_channels[self.indices[0]], self.num_classes, 1)) + + def forward(self, feat_list: List[paddle.Tensor]) -> paddle.Tensor: + feat_shallow, feat_deep = feat_list[self.indices[0]], feat_list[self.indices[1]] + + soft_regions = self.aux_head(feat_shallow) + pixels = self.conv3x3_ocr(feat_deep) + + object_regions = self.spatial_gather(pixels, soft_regions) + ocr = self.spatial_ocr(pixels, object_regions) + + logit = self.cls_head(ocr) + return [logit, soft_regions] + + +class SpatialGatherBlock(nn.Layer): + """Aggregation layer to compute the pixel-region representation.""" + + def forward(self, pixels: paddle.Tensor, regions: paddle.Tensor) -> paddle.Tensor: + n, c, h, w = pixels.shape + _, k, _, _ = regions.shape + + # pixels: from (n, c, h, w) to (n, h*w, c) + pixels = paddle.reshape(pixels, (n, c, h * w)) + pixels = paddle.transpose(pixels, [0, 2, 1]) + + # regions: from (n, k, h, w) to (n, k, h*w) + regions = paddle.reshape(regions, (n, k, h * w)) + regions = F.softmax(regions, axis=2) + + # feats: from (n, k, c) to (n, c, k, 1) + feats = paddle.bmm(regions, pixels) + feats = paddle.transpose(feats, [0, 2, 1]) + feats = paddle.unsqueeze(feats, axis=-1) + + return feats + + +class SpatialOCRModule(nn.Layer): + """Aggregate the global object representation to update the representation for each pixel.""" + + def __init__(self, in_channels: int, key_channels: int, out_channels: int, dropout_rate: float = 0.1): + super().__init__() + + self.attention_block = ObjectAttentionBlock(in_channels, key_channels) + self.conv1x1 = nn.Sequential(L.ConvBNReLU(2 * in_channels, out_channels, 1), nn.Dropout2D(dropout_rate)) + + def forward(self, pixels: paddle.Tensor, regions: paddle.Tensor) -> paddle.Tensor: + context = self.attention_block(pixels, regions) + feats = paddle.concat([context, pixels], axis=1) + feats = self.conv1x1(feats) + + return feats + + +class ObjectAttentionBlock(nn.Layer): + """A self-attention module.""" + + def __init__(self, in_channels: int, key_channels: int): + super().__init__() + + self.in_channels = in_channels + self.key_channels = key_channels + + self.f_pixel = nn.Sequential( + L.ConvBNReLU(in_channels, key_channels, 1), L.ConvBNReLU(key_channels, key_channels, 1)) + + self.f_object = nn.Sequential( + L.ConvBNReLU(in_channels, key_channels, 1), L.ConvBNReLU(key_channels, key_channels, 1)) + + self.f_down = L.ConvBNReLU(in_channels, key_channels, 1) + + self.f_up = L.ConvBNReLU(key_channels, in_channels, 1) + + def forward(self, x: paddle.Tensor, proxy: paddle.Tensor) -> paddle.Tensor: + n, _, h, w = x.shape + + # query : from (n, c1, h1, w1) to (n, h1*w1, key_channels) + query = self.f_pixel(x) + query = paddle.reshape(query, (n, self.key_channels, -1)) + query = paddle.transpose(query, [0, 2, 1]) + + # key : from (n, c2, h2, w2) to (n, key_channels, h2*w2) + key = self.f_object(proxy) + key = 
paddle.reshape(key, (n, self.key_channels, -1)) + + # value : from (n, c2, h2, w2) to (n, h2*w2, key_channels) + value = self.f_down(proxy) + value = paddle.reshape(value, (n, self.key_channels, -1)) + value = paddle.transpose(value, [0, 2, 1]) + + # sim_map (n, h1*w1, h2*w2) + sim_map = paddle.bmm(query, key) + sim_map = (self.key_channels**-.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + # context from (n, h1*w1, key_channels) to (n , out_channels, h1, w1) + context = paddle.bmm(sim_map, value) + context = paddle.transpose(context, [0, 2, 1]) + context = paddle.reshape(context, (n, self.key_channels, h, w)) + context = self.f_up(context) + + return context diff --git a/modules/image/semantic_segmentation/unet_cityscapes/README.md b/modules/image/semantic_segmentation/unet_cityscapes/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8510ac7fc2d313f4613a5ee70ccf80ba26b4ae30 --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/README.md @@ -0,0 +1,174 @@ +# PaddleHub 图像分割 + +## 模型预测 + +若想使用我们提供的预训练模型进行预测,可使用如下脚本: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='unet_cityscapes') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +## 如何开始Fine-tune + +在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用unet_cityscapes模型对OpticDiscSeg等数据集进行Fine-tune。 + +## 代码步骤 + +使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。 + +### Step1: 定义数据预处理方式 +```python +from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize + +transform = Compose([Resize(target_size=(512, 512)), Normalize()]) +``` + +`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。 + +### Step2: 下载数据集并使用 +```python +from paddlehub.datasets import OpticDiscSeg + +train_reader = OpticDiscSeg(transform, mode='train') + +``` +* `transform`: 数据预处理方式。 +* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。 + +数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。 + +### Step3: 加载预训练模型 + +```python +model = hub.Module(name='unet_cityscapes', num_classes=2, pretrained=None) +``` +* `name`: 选择预训练模型的名字。 +* `num_classes`: 分割模型的类别数目。 +* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。 + +### Step4: 选择优化策略和运行配置 + +```python +scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001) +optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters()) +trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True) +``` + +#### 优化策略 + +Paddle2.0rc提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,详细参见[策略](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc/api/paddle/optimizer/optimizer/Optimizer_cn.html)。 + +其中`Adam`: + +* `learning_rate`: 全局学习率。 +* `parameters`: 待优化模型参数。 + +#### 运行配置 +`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数: + +* `model`: 被优化模型; +* `optimizer`: 优化器选择; +* `use_gpu`: 是否使用gpu,默认为False; +* `use_vdl`: 是否使用vdl可视化训练过程; +* `checkpoint_dir`: 保存模型参数的地址; +* `compare_metrics`: 保存最优模型的衡量指标; + +`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数: + +* `train_dataset`: 训练时所用的数据集; +* `epochs`: 训练轮数; +* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size; +* `num_workers`: works的数量,默认为0; +* `eval_dataset`: 验证集; +* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。 +* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。 + +## 模型预测 + 
+当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。 + +我们使用该模型来进行预测。predict.py脚本如下: + +```python +import paddle +import cv2 +import paddlehub as hub + +if __name__ == '__main__': + model = hub.Module(name='unet_cityscapes', pretrained='/PATH/TO/CHECKPOINT') + img = cv2.imread("/PATH/TO/IMAGE") + model.predict(images=[img], visualization=True) +``` + +参数配置正确后,请执行脚本`python predict.py`。 +**Args** +* `images`:原始图像路径或BGR格式图片; +* `visualization`: 是否可视化,默认为True; +* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。 + +**NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。 + +## 服务部署 + +PaddleHub Serving可以部署一个在线图像分割服务。 + +### Step1: 启动PaddleHub Serving + +运行启动命令: + +```shell +$ hub serving start -m unet_cityscapes +``` + +这样就完成了一个图像分割服务化API的部署,默认端口号为8866。 + +**NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。 + +### Step2: 发送预测请求 + +配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + +```python +import requests +import json +import cv2 +import base64 + +import numpy as np + + +def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + +# 发送HTTP请求 +org_im = cv2.imread('/PATH/TO/IMAGE') +data = {'images':[cv2_to_base64(org_im)]} +headers = {"Content-type": "application/json"} +url = "http://127.0.0.1:8866/predict/unet_cityscapes" +r = requests.post(url=url, headers=headers, data=json.dumps(data)) +mask = base64_to_cv2(r.json()["results"][0]) +``` + +### 查看代码 + +https://github.com/PaddlePaddle/PaddleSeg + +### 依赖 + +paddlepaddle >= 2.0.0 + +paddlehub >= 2.0.0 diff --git a/modules/image/semantic_segmentation/unet_cityscapes/layers.py b/modules/image/semantic_segmentation/unet_cityscapes/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..e4f909588a88236e9f4f2d2aed9c9c4ea06fead3 --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/layers.py @@ -0,0 +1,185 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
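+
+# Note: the conv wrappers below default to padding='same', so e.g.
+# ConvBNReLU(64, 128, 3) preserves spatial size; the UNet module downsamples
+# explicitly with MaxPool2D rather than with strided convolutions.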
+ +import os + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +def SyncBatchNorm(*args, **kwargs): + """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead""" + if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'): + return nn.BatchNorm2D(*args, **kwargs) + else: + return nn.SyncBatchNorm(*args, **kwargs) + + +class ConvBNReLU(nn.Layer): + """Basic conv bn relu layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + + self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + x = F.relu(x) + return x + + +class ConvBN(nn.Layer): + """Basic conv bn layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs) + self._batch_norm = SyncBatchNorm(out_channels) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self._conv(x) + x = self._batch_norm(x) + return x + + +class ConvReLUPool(nn.Layer): + """Basic conv bn pool layer.""" + + def __init__(self, in_channels: int, out_channels: int): + super().__init__() + self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv(x) + x = F.relu(x) + x = F.pool2d(x, pool_size=2, pool_type="max", pool_stride=2) + return x + + +class SeparableConvBNReLU(nn.Layer): + """Basic separable Convolution layer.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=in_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + x = self.piontwise_conv(x) + return x + + +class DepthwiseConvBN(nn.Layer): + """Depthwise Convolution.""" + + def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs): + super().__init__() + self.depthwise_conv = ConvBN( + in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + padding=padding, + groups=in_channels, + **kwargs) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.depthwise_conv(x) + return x + + +class AuxLayer(nn.Layer): + """ + The auxiliary layer implementation for auxiliary loss. + + Args: + in_channels (int): The number of input channels. + inter_channels (int): The intermediate channels. + out_channels (int): The number of output channels, and usually it is num_classes. + dropout_prob (float, optional): The drop rate. Default: 0.1. 
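+
+    For example, AuxLayer(in_channels=1024, inter_channels=256, out_channels=num_classes)
+    maps backbone features to per-class logits through a 3x3 ConvBNReLU,
+    dropout, and a final 1x1 conv (the channel numbers here are illustrative).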
+ """ + + def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1): + super().__init__() + + self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1) + + self.dropout = nn.Dropout(p=dropout_prob) + + self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + x = self.conv_bn_relu(x) + x = self.dropout(x) + x = self.conv(x) + return x + + +class Activation(nn.Layer): + """ + The wrapper of activations. + Args: + act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu', + 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', + 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', + 'hsigmoid']. Default: None, means identical transformation. + Returns: + A callable object of Activation. + Raises: + KeyError: When parameter `act` is not in the optional range. + Examples: + from paddleseg.models.common.activation import Activation + relu = Activation("relu") + print(relu) + # + sigmoid = Activation("sigmoid") + print(sigmoid) + # + not_exit_one = Activation("not_exit_one") + # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink', + # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax', + # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])" + """ + + def __init__(self, act: str = None): + super(Activation, self).__init__() + + self._act = act + upper_act_names = nn.layer.activation.__dict__.keys() + lower_act_names = [act.lower() for act in upper_act_names] + act_dict = dict(zip(lower_act_names, upper_act_names)) + + if act is not None: + if act in act_dict.keys(): + act_name = act_dict[act] + self.act_func = eval("nn.layer.activation.{}()".format(act_name)) + else: + raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys())) + + def forward(self, x: paddle.Tensor) -> paddle.Tensor: + if self._act is not None: + return self.act_func(x) + else: + return x diff --git a/modules/image/semantic_segmentation/unet_cityscapes/module.py b/modules/image/semantic_segmentation/unet_cityscapes/module.py new file mode 100644 index 0000000000000000000000000000000000000000..f2bcc19f5c7662858ecac7b9c2d89dbbc2f8628b --- /dev/null +++ b/modules/image/semantic_segmentation/unet_cityscapes/module.py @@ -0,0 +1,151 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +from typing import Union, List, Tuple + +import paddle +from paddle import nn +import paddle.nn.functional as F +import numpy as np +from paddlehub.module.module import moduleinfo +import paddlehub.vision.segmentation_transforms as T +from paddlehub.module.cv_module import ImageSegmentationModule + +import unet_cityscapes.layers as layers + + +@moduleinfo( + name="unet_cityscapes", + type="CV/semantic_segmentation", + author="paddlepaddle", + author_email="", + summary="Unet is a segmentation model.", + version="1.0.0", + meta=ImageSegmentationModule) +class UNet(nn.Layer): + """ + The UNet implementation based on PaddlePaddle. + + The original article refers to + Olaf Ronneberger, et, al. "U-Net: Convolutional Networks for Biomedical Image Segmentation" + (https://arxiv.org/abs/1505.04597). + + Args: + num_classes (int): The unique number of target classes. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + use_deconv (bool, optional): A bool value indicates whether using deconvolution in upsampling. + If False, use resize_bilinear. Default: False. + pretrained (str, optional): The path or url of pretrained model for fine tuning. Default: None. + """ + + def __init__(self, + num_classes: int = 19, + align_corners: bool = False, + use_deconv: bool = False, + pretrained: str = None): + super(UNet, self).__init__() + + self.encode = Encoder() + self.decode = Decoder(align_corners, use_deconv=use_deconv) + self.cls = self.conv = nn.Conv2D(in_channels=64, out_channels=num_classes, kernel_size=3, stride=1, padding=1) + + self.transforms = T.Compose([T.Normalize()]) + + if pretrained is not None: + model_dict = paddle.load(pretrained) + self.set_dict(model_dict) + print("load custom parameters success") + + else: + checkpoint = os.path.join(self.directory, 'model.pdparams') + model_dict = paddle.load(checkpoint) + self.set_dict(model_dict) + print("load pretrained parameters success") + + def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]: + return self.transforms(img) + + def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]: + logit_list = [] + x, short_cuts = self.encode(x) + x = self.decode(x, short_cuts) + logit = self.cls(x) + logit_list.append(logit) + return logit_list + + +class Encoder(nn.Layer): + def __init__(self): + super().__init__() + + self.double_conv = nn.Sequential(layers.ConvBNReLU(3, 64, 3), layers.ConvBNReLU(64, 64, 3)) + down_channels = [[64, 128], [128, 256], [256, 512], [512, 512]] + self.down_sample_list = nn.LayerList([self.down_sampling(channel[0], channel[1]) for channel in down_channels]) + + def down_sampling(self, in_channels: int, out_channels: int) -> nn.Layer: + modules = [] + modules.append(nn.MaxPool2D(kernel_size=2, stride=2)) + modules.append(layers.ConvBNReLU(in_channels, out_channels, 3)) + modules.append(layers.ConvBNReLU(out_channels, out_channels, 3)) + return nn.Sequential(*modules) + + def forward(self, x: paddle.Tensor) -> Tuple: + short_cuts = [] + x = self.double_conv(x) + for down_sample in self.down_sample_list: + short_cuts.append(x) + x = down_sample(x) + return x, short_cuts + + +class Decoder(nn.Layer): + def __init__(self, align_corners: bool, use_deconv: bool = False): + super().__init__() + + up_channels = [[512, 256], [256, 128], [128, 64], [64, 64]] + self.up_sample_list = nn.LayerList( + [UpSampling(channel[0], channel[1], align_corners, use_deconv) for 
channel in up_channels])
+
+    def forward(self, x: paddle.Tensor, short_cuts: List[paddle.Tensor]) -> paddle.Tensor:
+        # Consume the encoder skips in reverse order (deepest first).
+        for i in range(len(short_cuts)):
+            x = self.up_sample_list[i](x, short_cuts[-(i + 1)])
+        return x
+
+
+class UpSampling(nn.Layer):
+    def __init__(self, in_channels: int, out_channels: int, align_corners: bool, use_deconv: bool = False):
+        super().__init__()
+
+        self.align_corners = align_corners
+
+        self.use_deconv = use_deconv
+        if self.use_deconv:
+            self.deconv = nn.Conv2DTranspose(in_channels, out_channels // 2, kernel_size=2, stride=2, padding=0)
+            in_channels = in_channels + out_channels // 2
+        else:
+            # Bilinear upsampling keeps the channel count, so after concatenating
+            # the skip tensor (which has matching width in this configuration)
+            # the double conv sees twice the input channels.
+            in_channels *= 2
+
+        self.double_conv = nn.Sequential(
+            layers.ConvBNReLU(in_channels, out_channels, 3), layers.ConvBNReLU(out_channels, out_channels, 3))
+
+    def forward(self, x: paddle.Tensor, short_cut: paddle.Tensor) -> paddle.Tensor:
+        if self.use_deconv:
+            x = self.deconv(x)
+        else:
+            x = F.interpolate(x, paddle.shape(short_cut)[2:], mode='bilinear', align_corners=self.align_corners)
+        x = paddle.concat([x, short_cut], axis=1)
+        x = self.double_conv(x)
+        return x
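+
+
+if __name__ == '__main__':
+    # A minimal smoke-test sketch: it assumes the module's pretrained weights
+    # are resolvable (e.g. after `hub install unet_cityscapes`), since the
+    # constructor loads model.pdparams from the module directory.
+    model = UNet(num_classes=19)
+    x = paddle.rand([1, 3, 512, 512])
+    logits = model(x)
+    print(logits[0].shape)  # expected: [1, 19, 512, 512]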