Unverified commit 595f4534, authored by H haoyuying, committed by GitHub

Add more semantic segmentation models

Parent 5e9c173b
# PaddleHub Image Segmentation
## Model Prediction
To run prediction with one of the pretrained models we provide, use a script like the following:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
## How to Start Fine-tuning
This example shows how to fine-tune a pretrained model with PaddleHub and use it for prediction.
After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the bisenetv2_cityscapes model on a dataset such as OpticDiscSeg.
## Code Steps
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step1: Define the data preprocessing pipeline
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; swap in the operations your task needs.
### Step2: Download and load the dataset
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split; one of `train`, `test`, or `val`. Defaults to `train`.
The dataset preparation code is in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it into `$HOME/.paddlehub/dataset` under the user's home directory.
### Step3: Load the pretrained model
```python
model = hub.Module(name='bisenetv2_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained parameters; if `None`, the default pretrained parameters are loaded.
### Step4: Choose the optimization strategy and runtime configuration
```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### Optimization strategy
Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.
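The `PolynomialDecay` scheduler in the snippet above decays the learning rate from `learning_rate` down to `end_lr` over `decay_steps` steps. A pure-Python sketch of the formula, as an illustration only and assuming the scheduler's default `cycle=False` behavior:

```python
def poly_lr(step, lr0=0.01, decay_steps=1000, power=0.9, end_lr=0.0001):
    """Polynomial decay: interpolates from lr0 to end_lr over decay_steps."""
    # Fraction of the decay horizon remaining; clamps to 0 after decay_steps.
    frac = 1.0 - min(step, decay_steps) / decay_steps
    return (lr0 - end_lr) * frac ** power + end_lr

print(poly_lr(0))     # full learning rate at step 0
print(poly_lr(1000))  # decayed to end_lr
```

With `power=0.9` the curve is close to linear but drops more steeply near the end of the decay horizon.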
#### Runtime configuration
`Trainer` drives the fine-tuning run and accepts the following parameters:
* `model`: the model to optimize;
* `optimizer`: the optimizer to use;
* `use_gpu`: whether to use the GPU; defaults to False;
* `use_vdl`: whether to visualize the training process with VisualDL;
* `checkpoint_dir`: the directory in which model parameters are saved;
* `compare_metrics`: the metric used to select the best model.
`trainer.train` drives the actual training loop and accepts the following parameters:
* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when using a GPU, adjust it to fit your memory;
* `num_workers`: the number of data-loading workers; defaults to 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: how often to print logs, measured in training steps;
* `save_interval`: how often to save the model, measured in training epochs.
## Model Prediction
After fine-tuning, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.
We use that model for prediction. The predict.py script looks like this:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='bisenetv2_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
Once the parameters are configured, run `python predict.py`.
**Args**
* `images`: original image paths or images in BGR format;
* `visualization`: whether to visualize the result; defaults to True;
* `save_path`: the directory in which results are saved; defaults to 'seg_result'.
**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## Service Deployment
PaddleHub Serving can deploy an online image segmentation service.
### Step1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m bisenetv2_cityscapes
```
This deploys an image segmentation API service, listening on port 8866 by default.
**NOTE:** When predicting on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setup is needed.
### Step2: Send a prediction request
With the server configured, the following few lines of code send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # frombuffer() replaces the deprecated fromstring()
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/bisenetv2_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
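A note on the helpers above: `cv2_to_base64` and `base64_to_cv2` must be exact inverses, since the server decodes what the client encodes. A stdlib-only sketch of that round trip, with dummy bytes standing in for real JPEG data:

```python
import base64

payload = bytes(range(256))                         # stand-in for cv2.imencode output
encoded = base64.b64encode(payload).decode('utf8')  # what the client puts in the JSON body
decoded = base64.b64decode(encoded.encode('utf8'))  # what the server recovers
assert decoded == payload                           # lossless round trip
```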
### Source Code
https://github.com/PaddlePaddle/PaddleSeg
### Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvReLUPool(nn.Layer):
"""Basic conv bn pool layer."""
def __init__(self, in_channels: int, out_channels: int):
super().__init__()
self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv(x)
x = F.relu(x)
x = F.max_pool2d(x, kernel_size=2, stride=2)
return x
class SeparableConvBNReLU(nn.Layer):
"""Basic separable conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class DepthwiseConvBN(nn.Layer):
"""Basic depthwise conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1):
super().__init__()
self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = getattr(nn.layer.activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
import bisenet_cityscapes.layers as layers
@moduleinfo(
name="bisenetv2_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="Bisenet is a segmentation model trained by Cityscapes.",
version="1.0.0",
meta=ImageSegmentationModule)
class BiSeNetV2(nn.Layer):
"""
The BiSeNet V2 implementation based on PaddlePaddle.
The original article refers to
Yu, Changqian, et al. "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation"
(https://arxiv.org/abs/2004.02147)
Args:
num_classes (int): The unique number of target classes, default is 19.
lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self, num_classes: int = 19, lambd: float = 0.25, align_corners: bool = False, pretrained: str = None):
super(BiSeNetV2, self).__init__()
C1, C2, C3 = 64, 64, 128
db_channels = (C1, C2, C3)
C1, C3, C4, C5 = int(C1 * lambd), int(C3 * lambd), 64, 128
sb_channels = (C1, C3, C4, C5)
mid_channels = 128
self.db = DetailBranch(db_channels)
self.sb = SemanticBranch(sb_channels)
self.bga = BGA(mid_channels, align_corners)
self.aux_head1 = SegHead(C1, C1, num_classes)
self.aux_head2 = SegHead(C3, C3, num_classes)
self.aux_head3 = SegHead(C4, C4, num_classes)
self.aux_head4 = SegHead(C5, C5, num_classes)
self.head = SegHead(mid_channels, mid_channels, num_classes)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'bisenet_model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> np.ndarray:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
dfm = self.db(x)
feat1, feat2, feat3, feat4, sfm = self.sb(x)
logit = self.head(self.bga(dfm, sfm))
if not self.training:
logit_list = [logit]
else:
logit1 = self.aux_head1(feat1)
logit2 = self.aux_head2(feat2)
logit3 = self.aux_head3(feat3)
logit4 = self.aux_head4(feat4)
logit_list = [logit, logit1, logit2, logit3, logit4]
logit_list = [
F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
for logit in logit_list
]
return logit_list
class StemBlock(nn.Layer):
def __init__(self, in_dim: int, out_dim: int):
super(StemBlock, self).__init__()
self.conv = layers.ConvBNReLU(in_dim, out_dim, 3, stride=2)
self.left = nn.Sequential(
layers.ConvBNReLU(out_dim, out_dim // 2, 1), layers.ConvBNReLU(out_dim // 2, out_dim, 3, stride=2))
self.right = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.fuse = layers.ConvBNReLU(out_dim * 2, out_dim, 3)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv(x)
left = self.left(x)
right = self.right(x)
concat = paddle.concat([left, right], axis=1)
return self.fuse(concat)
class ContextEmbeddingBlock(nn.Layer):
def __init__(self, in_dim: int, out_dim: int):
super(ContextEmbeddingBlock, self).__init__()
self.gap = nn.AdaptiveAvgPool2D(1)
self.bn = layers.SyncBatchNorm(in_dim)
self.conv_1x1 = layers.ConvBNReLU(in_dim, out_dim, 1)
self.conv_3x3 = nn.Conv2D(out_dim, out_dim, 3, 1, 1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
gap = self.gap(x)
bn = self.bn(gap)
conv1 = self.conv_1x1(bn) + x
return self.conv_3x3(conv1)
class GatherAndExpansionLayer1(nn.Layer):
"""Gather And Expansion Layer with stride 1"""
def __init__(self, in_dim: int, out_dim: int, expand: int):
super().__init__()
expand_dim = expand * in_dim
self.conv = nn.Sequential(
layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3),
layers.ConvBN(expand_dim, out_dim, 1))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return F.relu(self.conv(x) + x)
class GatherAndExpansionLayer2(nn.Layer):
"""Gather And Expansion Layer with stride 2"""
def __init__(self, in_dim: int, out_dim: int, expand: int):
super().__init__()
expand_dim = expand * in_dim
self.branch_1 = nn.Sequential(
layers.ConvBNReLU(in_dim, in_dim, 3), layers.DepthwiseConvBN(in_dim, expand_dim, 3, stride=2),
layers.DepthwiseConvBN(expand_dim, expand_dim, 3), layers.ConvBN(expand_dim, out_dim, 1))
self.branch_2 = nn.Sequential(
layers.DepthwiseConvBN(in_dim, in_dim, 3, stride=2), layers.ConvBN(in_dim, out_dim, 1))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return F.relu(self.branch_1(x) + self.branch_2(x))
class DetailBranch(nn.Layer):
"""The detail branch of BiSeNet, which has wide channels but shallow layers."""
def __init__(self, in_channels: Tuple[int, int, int]):
super().__init__()
C1, C2, C3 = in_channels
self.convs = nn.Sequential(
# stage 1
layers.ConvBNReLU(3, C1, 3, stride=2),
layers.ConvBNReLU(C1, C1, 3),
# stage 2
layers.ConvBNReLU(C1, C2, 3, stride=2),
layers.ConvBNReLU(C2, C2, 3),
layers.ConvBNReLU(C2, C2, 3),
# stage 3
layers.ConvBNReLU(C2, C3, 3, stride=2),
layers.ConvBNReLU(C3, C3, 3),
layers.ConvBNReLU(C3, C3, 3),
)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return self.convs(x)
class SemanticBranch(nn.Layer):
"""The semantic branch of BiSeNet, which has narrow channels but deep layers."""
def __init__(self, in_channels: Tuple[int, int, int, int]):
super().__init__()
C1, C3, C4, C5 = in_channels
self.stem = StemBlock(3, C1)
self.stage3 = nn.Sequential(GatherAndExpansionLayer2(C1, C3, 6), GatherAndExpansionLayer1(C3, C3, 6))
self.stage4 = nn.Sequential(GatherAndExpansionLayer2(C3, C4, 6), GatherAndExpansionLayer1(C4, C4, 6))
self.stage5_4 = nn.Sequential(
GatherAndExpansionLayer2(C4, C5, 6), GatherAndExpansionLayer1(C5, C5, 6), GatherAndExpansionLayer1(
C5, C5, 6), GatherAndExpansionLayer1(C5, C5, 6))
self.ce = ContextEmbeddingBlock(C5, C5)
def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, ...]:
stage2 = self.stem(x)
stage3 = self.stage3(stage2)
stage4 = self.stage4(stage3)
stage5_4 = self.stage5_4(stage4)
fm = self.ce(stage5_4)
return stage2, stage3, stage4, stage5_4, fm
class BGA(nn.Layer):
"""The Bilateral Guided Aggregation Layer, used to fuse the semantic features and spatial features."""
def __init__(self, out_dim: int, align_corners: bool):
super().__init__()
self.align_corners = align_corners
self.db_branch_keep = nn.Sequential(layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1))
self.db_branch_down = nn.Sequential(
layers.ConvBN(out_dim, out_dim, 3, stride=2), nn.AvgPool2D(kernel_size=3, stride=2, padding=1))
self.sb_branch_keep = nn.Sequential(
layers.DepthwiseConvBN(out_dim, out_dim, 3), nn.Conv2D(out_dim, out_dim, 1),
layers.Activation(act='sigmoid'))
self.sb_branch_up = layers.ConvBN(out_dim, out_dim, 3)
self.conv = layers.ConvBN(out_dim, out_dim, 3)
def forward(self, dfm: paddle.Tensor, sfm: paddle.Tensor) -> paddle.Tensor:
db_feat_keep = self.db_branch_keep(dfm)
db_feat_down = self.db_branch_down(dfm)
sb_feat_keep = self.sb_branch_keep(sfm)
sb_feat_up = self.sb_branch_up(sfm)
sb_feat_up = F.interpolate(
sb_feat_up, paddle.shape(db_feat_keep)[2:], mode='bilinear', align_corners=self.align_corners)
sb_feat_up = F.sigmoid(sb_feat_up)
db_feat = db_feat_keep * sb_feat_up
sb_feat = db_feat_down * sb_feat_keep
sb_feat = F.interpolate(sb_feat, paddle.shape(db_feat)[2:], mode='bilinear', align_corners=self.align_corners)
return self.conv(db_feat + sb_feat)
class SegHead(nn.Layer):
def __init__(self, in_dim: int, mid_dim: int, num_classes: int):
super().__init__()
self.conv_3x3 = nn.Sequential(layers.ConvBNReLU(in_dim, mid_dim, 3), nn.Dropout(0.1))
self.conv_1x1 = nn.Conv2D(mid_dim, num_classes, 1, 1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
conv1 = self.conv_3x3(x)
conv2 = self.conv_1x1(conv1)
return conv2
# PaddleHub Image Segmentation
## Model Prediction
To run prediction with one of the pretrained models we provide, use a script like the following:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
## How to Start Fine-tuning
After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the deeplabv3p_resnet50_cityscapes model on a dataset such as OpticDiscSeg.
## Code Steps
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step1: Define the data preprocessing pipeline
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; swap in the operations your task needs.
### Step2: Download and load the dataset
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split; one of `train`, `test`, or `val`. Defaults to `train`.
The dataset preparation code is in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it into `$HOME/.paddlehub/dataset` under the user's home directory.
### Step3: Load the pretrained model
```python
model = hub.Module(name='deeplabv3p_resnet50_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path of your own trained parameters; if `None`, the default pretrained parameters are loaded.
### Step4: Choose the optimization strategy and runtime configuration
```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### Optimization strategy
Paddle 2.0 provides a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.
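As above, the `PolynomialDecay` scheduler decays the learning rate from `learning_rate` down to `end_lr` over `decay_steps` steps. A pure-Python sketch of the formula, as an illustration only and assuming the scheduler's default `cycle=False` behavior:

```python
def poly_lr(step, lr0=0.01, decay_steps=1000, power=0.9, end_lr=0.0001):
    """Polynomial decay: interpolates from lr0 to end_lr over decay_steps."""
    # Fraction of the decay horizon remaining; clamps to 0 after decay_steps.
    frac = 1.0 - min(step, decay_steps) / decay_steps
    return (lr0 - end_lr) * frac ** power + end_lr

print(poly_lr(0))     # full learning rate at step 0
print(poly_lr(1000))  # decayed to end_lr
```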
#### Runtime configuration
`Trainer` drives the fine-tuning run and accepts the following parameters:
* `model`: the model to optimize;
* `optimizer`: the optimizer to use;
* `use_gpu`: whether to use the GPU; defaults to False;
* `use_vdl`: whether to visualize the training process with VisualDL;
* `checkpoint_dir`: the directory in which model parameters are saved;
* `compare_metrics`: the metric used to select the best model.
`trainer.train` drives the actual training loop and accepts the following parameters:
* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when using a GPU, adjust it to fit your memory;
* `num_workers`: the number of data-loading workers; defaults to 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: how often to print logs, measured in training steps;
* `save_interval`: how often to save the model, measured in training epochs.
## Model Prediction
After fine-tuning, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.
We use that model for prediction. The predict.py script looks like this:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
    model = hub.Module(name='deeplabv3p_resnet50_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
Once the parameters are configured, run `python predict.py`.
**Args**
* `images`: original image paths or images in BGR format;
* `visualization`: whether to visualize the result; defaults to True;
* `save_path`: the directory in which results are saved; defaults to 'seg_result'.
**NOTE:** For prediction, the module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## Service Deployment
PaddleHub Serving can deploy an online image segmentation service.
### Step1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m deeplabv3p_resnet50_cityscapes
```
This deploys an image segmentation API service, listening on port 8866 by default.
**NOTE:** When predicting on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setup is needed.
### Step2: Send a prediction request
With the server configured, the following few lines of code send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # frombuffer() replaces the deprecated fromstring()
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/deeplabv3p_resnet50_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
### Source Code
https://github.com/PaddlePaddle/PaddleSeg
### Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = getattr(nn.layer.activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from deeplabv3p_resnet50_cityscapes.resnet import ResNet50_vd
import deeplabv3p_resnet50_cityscapes.layers as L
@moduleinfo(
name="deeplabv3p_resnet50_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="DeepLabV3PResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class DeepLabV3PResnet50(nn.Layer):
"""
The DeepLabV3PResnet50 implementation based on PaddlePaddle.
The original article refers to
    Liang-Chieh Chen, et al. "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation"
(https://arxiv.org/abs/1802.02611)
Args:
num_classes (int): the unique number of target classes.
backbone_indices (tuple): two values in the tuple indicate the indices of output of backbone.
the first index will be taken as a low-level feature in Decoder component;
the second one will be taken as input of ASPP component.
            Usually the backbone consists of four downsampling stages, each returning an output,
            so we set the default to (0, 3), which means taking the feature map of the first
            stage in the backbone as the low-level feature used in Decoder, and the feature map of the fourth
            stage as the input of ASPP.
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
            If output_stride=16, aspp_ratios should be set to (1, 6, 12, 18).
            If output_stride=8, aspp_ratios is (1, 12, 24, 36).
aspp_out_channels (int): the output channels of ASPP module.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str): the path of pretrained model. Default to None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (0, 3),
aspp_ratios: Tuple[int] = (1, 12, 24, 36),
aspp_out_channels: int = 256,
align_corners=False,
pretrained: str = None):
super(DeepLabV3PResnet50, self).__init__()
self.backbone = ResNet50_vd()
backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = DeepLabV3PHead(num_classes, backbone_indices, backbone_channels, aspp_ratios, aspp_out_channels,
align_corners)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(logit, x.shape[2:], mode='bilinear', align_corners=self.align_corners) for logit in logit_list
]
class DeepLabV3PHead(nn.Layer):
"""
The DeepLabV3PHead implementation based on PaddlePaddle.
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple): Two values in the tuple indicate the indices of output of backbone.
the first index will be taken as a low-level feature in Decoder component;
the second one will be taken as input of ASPP component.
            Usually the backbone consists of four downsampling stages, each returning an output.
            If we set backbone_indices to (0, 3), it means taking the feature map of the first
            stage in the backbone as the low-level feature used in Decoder, and the feature map of the fourth
            stage as the input of ASPP.
backbone_channels (tuple): The same length with "backbone_indices". It indicates the channels of corresponding index.
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
aspp_out_channels (int): The output channels of ASPP module.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
    def __init__(self, num_classes: int, backbone_indices: Tuple[int], backbone_channels: Tuple[int],
                 aspp_ratios: Tuple[int], aspp_out_channels: int, align_corners: bool):
super().__init__()
self.aspp = L.ASPPModule(
aspp_ratios, backbone_channels[1], aspp_out_channels, align_corners, use_sep_conv=True, image_pooling=True)
self.decoder = Decoder(num_classes, backbone_channels[0], align_corners)
self.backbone_indices = backbone_indices
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
low_level_feat = feat_list[self.backbone_indices[0]]
x = feat_list[self.backbone_indices[1]]
x = self.aspp(x)
logit = self.decoder(x, low_level_feat)
logit_list.append(logit)
return logit_list
class Decoder(nn.Layer):
"""
Decoder module of DeepLabV3P model
Args:
num_classes (int): The number of classes.
in_channels (int): The number of input channels in decoder module.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, num_classes: int, in_channels: int, align_corners: bool):
super(Decoder, self).__init__()
self.conv_bn_relu1 = L.ConvBNReLU(in_channels=in_channels, out_channels=48, kernel_size=1)
self.conv_bn_relu2 = L.SeparableConvBNReLU(in_channels=304, out_channels=256, kernel_size=3, padding=1)
self.conv_bn_relu3 = L.SeparableConvBNReLU(in_channels=256, out_channels=256, kernel_size=3, padding=1)
self.conv = nn.Conv2D(in_channels=256, out_channels=num_classes, kernel_size=1)
self.align_corners = align_corners
def forward(self, x: paddle.Tensor, low_level_feat: paddle.Tensor) -> paddle.Tensor:
low_level_feat = self.conv_bn_relu1(low_level_feat)
x = F.interpolate(x, low_level_feat.shape[2:], mode='bilinear', align_corners=self.align_corners)
x = paddle.concat([x, low_level_feat], axis=1)
x = self.conv_bn_relu2(x)
x = self.conv_bn_relu3(x)
x = self.conv(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Union, List, Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import deeplabv3p_resnet50_cityscapes.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels, out_channels=out_channels, kernel_size=3, act=None, name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
return y
class ResNet50_vd(nn.Layer):
def __init__(self, multi_grid: Tuple[int] = (1, 2, 4)):
super(ResNet50_vd, self).__init__()
depth = [3, 4, 6, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3, out_channels=32, kernel_size=3, stride=2, act='relu', name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32, out_channels=32, kernel_size=3, stride=1, act='relu', name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32, out_channels=64, kernel_size=3, stride=1, act='relu', name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block] if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
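The stride/dilation bookkeeping in `ResNet50_vd.__init__` is compact and easy to misread. The sketch below reproduces the same loop (illustration only, not part of the module): stage 2 uses dilation 2, stage 3 applies the multi-grid rates (4, 8, 16), and neither stage strides, which yields an output stride of 8.

```python
# Reproduce the stride/dilation assignment from ResNet50_vd.__init__:
# with dilation_dict = {2: 2, 3: 4}, the last two stages trade striding
# for dilation.
depth = [3, 4, 6, 3]
dilation_dict = {2: 2, 3: 4}
multi_grid = (1, 2, 4)

layout = []
for block in range(len(depth)):
    for i in range(depth[block]):
        dilation_rate = dilation_dict.get(block, 1)
        if block == 3:
            dilation_rate *= multi_grid[i]
        stride = 2 if i == 0 and block != 0 and dilation_rate == 1 else 1
        layout.append((block, i, stride, dilation_rate))

# Stage 3 applies the multi-grid rates and never strides.
assert [d for b, i, s, d in layout if b == 3] == [4, 8, 16]
assert all(s == 1 for b, i, s, d in layout if b in (2, 3))
```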
# PaddleHub Image Segmentation
## Model Prediction
To run prediction with the pretrained model we provide, use the following script:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fastscnn_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
## Getting Started with Fine-tuning
After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the fastscnn_cityscapes model on datasets such as OpticDiscSeg.
## Code Walkthrough
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step 1: Define the data preprocessing pipeline
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
The `segmentation_transforms` module provides a rich set of preprocessing and augmentation operations for image segmentation data; swap in whichever operations your task needs.
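As a rough sketch of what `Normalize()` does to each pixel, assuming the commonly used defaults mean=0.5 and std=0.5 (check the PaddleHub source for the actual defaults):

```python
# A sketch of per-pixel normalization, assuming mean=0.5 and std=0.5
# (hypothetical defaults for illustration): the pixel is scaled to
# [0, 1], then standardized.
def normalize_pixel(value, mean=0.5, std=0.5):
    return (value / 255.0 - mean) / std

assert normalize_pixel(0) == -1.0
assert normalize_pixel(255) == 1.0
```

With these defaults, pixel values are mapped from [0, 255] to [-1, 1].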
### Step 2: Download and load the dataset
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform=transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split to load; one of `train`, `test`, `val`. Default: `train`.
The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.
### Step 3: Load the pretrained model
```python
model = hub.Module(name='fastscnn_cityscapes', num_classes=2, pretrained=None)
```
* `name`: the name of the pretrained model.
* `num_classes`: the number of segmentation classes.
* `pretrained`: the path to your own trained weights; if None, the model's default pretrained parameters are loaded.
### Step 4: Choose an optimization strategy and runtime configuration
```python
from paddlehub.finetune.trainer import Trainer

scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### Optimization strategy
Paddle 2.0 offers a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
* `learning_rate`: the global learning rate.
* `parameters`: the model parameters to optimize.
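For intuition, the polynomial schedule configured above can be sketched in plain Python. This mirrors the documented behavior of `PolynomialDecay` with `cycle=False`; treat it as an illustration, not the authoritative implementation:

```python
# Hedged sketch of the PolynomialDecay schedule configured above
# (cycle=False); see the Paddle docs for the authoritative formula.
def poly_lr(step, base_lr=0.01, decay_steps=1000, power=0.9, end_lr=0.0001):
    step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

assert abs(poly_lr(0) - 0.01) < 1e-12  # starts at the base rate
assert poly_lr(1000) == 0.0001         # reaches end_lr at decay_steps
assert poly_lr(2000) == 0.0001         # then stays clamped
```

The learning rate decays from 0.01 toward `end_lr=0.0001` over `decay_steps=1000` steps and stays there.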
#### Runtime configuration
`Trainer` drives the fine-tuning process and accepts the following parameters:
* `model`: the model to be optimized;
* `optimizer`: the optimizer to use;
* `use_gpu`: whether to use the GPU, default False;
* `use_vdl`: whether to visualize the training process with VisualDL;
* `checkpoint_dir`: the directory where model parameters are saved;
* `compare_metrics`: the metric used to select the best model.
`trainer.train` controls the training loop itself and accepts the following parameters:
* `train_dataset`: the dataset used for training;
* `epochs`: the number of training epochs;
* `batch_size`: the training batch size; when using a GPU, adjust it according to the available memory;
* `num_workers`: the number of data-loading workers, default 0;
* `eval_dataset`: the validation dataset;
* `log_interval`: the logging interval, measured in training steps;
* `save_interval`: the checkpoint-saving interval, measured in epochs.
## Model Prediction
After fine-tuning, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning.
We use this model for prediction. The predict.py script looks like this:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fastscnn_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
Once the parameters are configured, run `python predict.py`.
**Args**
* `images`: image paths or BGR-format images;
* `visualization`: whether to visualize the results, default True;
* `save_path`: the directory where results are saved, default 'seg_result'.
**NOTE:** at prediction time, the module, checkpoint_dir, and dataset must match those used for fine-tuning.
## Serving Deployment
PaddleHub Serving can deploy an online image segmentation service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m fastscnn_cityscapes
```
This completes the deployment of an image segmentation service API, listening on the default port 8866.
**NOTE:** to predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.
### Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fastscnn_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
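The `cv2_to_base64`/`base64_to_cv2` helpers above wrap an OpenCV encode/decode around a base64 transport. The base64 leg on its own round-trips any byte buffer, which can be sanity-checked without OpenCV (a standalone sketch, not part of the serving API):

```python
import base64

def bytes_to_base64(data: bytes) -> str:
    # Same transport encoding as cv2_to_base64, minus the JPEG step.
    return base64.b64encode(data).decode('utf8')

def base64_to_bytes(b64str: str) -> bytes:
    return base64.b64decode(b64str.encode('utf8'))

payload = b'\x89 fake image bytes'
assert base64_to_bytes(bytes_to_base64(payload)) == payload
```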
### Source Code
https://github.com/PaddlePaddle/PaddleSeg
### Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu' or os.environ.get('PADDLESEG_EXPORT_STAGE'):
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self._conv = nn.Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvReLUPool(nn.Layer):
"""Basic conv bn pool layer."""
def __init__(self, in_channels: int, out_channels: int):
super().__init__()
self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv(x)
x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
return x
class SeparableConvBNReLU(nn.Layer):
"""Basic separable conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        # Note: the 'piontwise' spelling is kept as-is for checkpoint key compatibility.
        self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
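A rough parameter-count sketch of why `SeparableConvBNReLU` is cheaper than a standard convolution (bias terms ignored; the 48 -> 64 figures below are just an illustrative channel configuration):

```python
def standard_conv_params(c_in, c_out, k):
    # One k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k * k   # groups == c_in: one k x k filter per channel
    pointwise = c_in * c_out   # 1x1 projection to c_out channels
    return depthwise + pointwise

# Illustrative 3x3, 48 -> 64 configuration:
assert standard_conv_params(48, 64, 3) == 27648
assert separable_conv_params(48, 64, 3) == 3504
```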
class DepthwiseConvBN(nn.Layer):
"""Basic depthwise conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs):
super().__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
return x
class AuxLayer(nn.Layer):
"""
The auxiliary layer implementation for auxiliary loss.
Args:
in_channels (int): The number of input channels.
inter_channels (int): The intermediate channels.
out_channels (int): The number of output channels, and usually it is num_classes.
dropout_prob (float, optional): The drop rate. Default: 0.1.
"""
def __init__(self, in_channels: int, inter_channels: int, out_channels: int, dropout_prob: float = 0.1):
super().__init__()
self.conv_bn_relu = ConvBNReLU(in_channels=in_channels, out_channels=inter_channels, kernel_size=3, padding=1)
self.dropout = nn.Dropout(p=dropout_prob)
self.conv = nn.Conv2D(in_channels=inter_channels, out_channels=out_channels, kernel_size=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dropout(x)
x = self.conv(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("nn.layer.activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class PPModule(nn.Layer):
"""
Pyramid pooling module originally in PSPNet.
Args:
        in_channels (int): The number of input channels to the pyramid pooling module.
out_channels (int): The number of output channels after pyramid pooling module.
bin_sizes (tuple, optional): The out size of pooled feature maps. Default: (1, 2, 3, 6).
dim_reduction (bool, optional): A bool value represents if reducing dimension after pooling. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, in_channels: int, out_channels: int, bin_sizes: Tuple, dim_reduction: bool, align_corners: bool):
super().__init__()
self.bin_sizes = bin_sizes
inter_channels = in_channels
if dim_reduction:
inter_channels = in_channels // len(bin_sizes)
        # We use the dimension reduction after pooling mentioned in the original implementation.
self.stages = nn.LayerList([self._make_stage(in_channels, inter_channels, size) for size in bin_sizes])
self.conv_bn_relu2 = ConvBNReLU(
in_channels=in_channels + inter_channels * len(bin_sizes),
out_channels=out_channels,
kernel_size=3,
padding=1)
self.align_corners = align_corners
def _make_stage(self, in_channels: int, out_channels: int, size: int):
"""
Create one pooling layer.
        In our implementation, we adopt the same dimension reduction as the original paper, which might be
        slightly different from other implementations.
        After pooling, the channels are reduced to 1/len(bin_sizes) immediately, while some other
        implementations keep the number of channels unchanged.
Args:
            in_channels (int): The number of input channels to the pyramid pooling module.
            out_channels (int): The number of output channels of the pooling branch.
size (int): The out size of the pooled layer.
        Returns:
            nn.Sequential: One pooling branch, an adaptive average pool followed by a 1x1 ConvBNReLU.
"""
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = ConvBNReLU(in_channels=in_channels, out_channels=out_channels, kernel_size=1)
return nn.Sequential(prior, conv)
def forward(self, input: paddle.Tensor) -> paddle.Tensor:
cat_layers = []
for stage in self.stages:
x = stage(input)
x = F.interpolate(x, paddle.shape(input)[2:], mode='bilinear', align_corners=self.align_corners)
cat_layers.append(x)
cat_layers = [input] + cat_layers[::-1]
cat = paddle.concat(cat_layers, axis=1)
out = self.conv_bn_relu2(cat)
return out
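Channel bookkeeping for `PPModule` with `dim_reduction=True`: each pooled branch carries `in_channels // len(bin_sizes)` channels, and `conv_bn_relu2` sees the input concatenated with all branches. A small sketch with an illustrative configuration (128 channels, the default bin sizes):

```python
# Illustrative configuration: 128 input channels, default bin sizes.
in_channels, bin_sizes = 128, (1, 2, 3, 6)
inter_channels = in_channels // len(bin_sizes)          # dim_reduction=True
concat_channels = in_channels + inter_channels * len(bin_sizes)

assert inter_channels == 32    # each pooled branch carries 32 channels
assert concat_channels == 256  # what conv_bn_relu2 receives
```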
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Callable, Union, Tuple
import paddle.nn as nn
import paddle.nn.functional as F
import paddle
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
import fastscnn_cityscapes.layers as layers
@moduleinfo(
name="fastscnn_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="fastscnn_cityscapes is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class FastSCNN(nn.Layer):
"""
The FastSCNN implementation based on PaddlePaddle.
As mentioned in the original paper, FastSCNN is a real-time segmentation algorithm (123.5fps)
even for high resolution images (1024x2048).
The original article refers to
Poudel, Rudra PK, et al. "Fast-scnn: Fast semantic segmentation network"
(https://arxiv.org/pdf/1902.04502.pdf).
Args:
num_classes (int): The unique number of target classes, default is 19.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self, num_classes: int = 19, align_corners: bool = False, pretrained: str = None):
super(FastSCNN, self).__init__()
self.learning_to_downsample = LearningToDownsample(32, 48, 64)
self.global_feature_extractor = GlobalFeatureExtractor(
in_channels=64,
block_channels=[64, 96, 128],
out_channels=128,
expansion=6,
num_blocks=[3, 3, 3],
align_corners=True)
self.feature_fusion = FeatureFusionModule(64, 128, 128, align_corners)
self.classifier = Classifier(128, num_classes)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'fastscnn_model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
logit_list = []
input_size = paddle.shape(x)[2:]
higher_res_features = self.learning_to_downsample(x)
x = self.global_feature_extractor(higher_res_features)
x = self.feature_fusion(higher_res_features, x)
logit = self.classifier(x)
logit = F.interpolate(logit, input_size, mode='bilinear', align_corners=self.align_corners)
logit_list.append(logit)
return logit_list
class LearningToDownsample(nn.Layer):
"""
Learning to downsample module.
This module consists of three downsampling blocks (one conv and two separable conv)
Args:
dw_channels1 (int, optional): The input channels of the first sep conv. Default: 32.
dw_channels2 (int, optional): The input channels of the second sep conv. Default: 48.
out_channels (int, optional): The output channels of LearningToDownsample module. Default: 64.
"""
def __init__(self, dw_channels1: int = 32, dw_channels2: int = 48, out_channels: int = 64):
super(LearningToDownsample, self).__init__()
self.conv_bn_relu = layers.ConvBNReLU(in_channels=3, out_channels=dw_channels1, kernel_size=3, stride=2)
self.dsconv_bn_relu1 = layers.SeparableConvBNReLU(
in_channels=dw_channels1, out_channels=dw_channels2, kernel_size=3, stride=2, padding=1)
self.dsconv_bn_relu2 = layers.SeparableConvBNReLU(
in_channels=dw_channels2, out_channels=out_channels, kernel_size=3, stride=2, padding=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv_bn_relu(x)
x = self.dsconv_bn_relu1(x)
x = self.dsconv_bn_relu2(x)
return x
class GlobalFeatureExtractor(nn.Layer):
"""
Global feature extractor module.
This module consists of three InvertedBottleneck blocks (like inverted residual introduced by MobileNetV2) and
a PPModule (introduced by PSPNet).
Args:
in_channels (int): The number of input channels to the module.
block_channels (tuple): A tuple represents output channels of each bottleneck block.
        out_channels (int): The number of output channels of the module.
expansion (int): The expansion factor in bottleneck.
num_blocks (tuple): It indicates the repeat time of each bottleneck.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
    def __init__(self, in_channels: int, block_channels: Tuple[int], out_channels: int, expansion: int,
                 num_blocks: Tuple[int], align_corners: bool):
super(GlobalFeatureExtractor, self).__init__()
self.bottleneck1 = self._make_layer(InvertedBottleneck, in_channels, block_channels[0], num_blocks[0],
expansion, 2)
self.bottleneck2 = self._make_layer(InvertedBottleneck, block_channels[0], block_channels[1], num_blocks[1],
expansion, 2)
self.bottleneck3 = self._make_layer(InvertedBottleneck, block_channels[1], block_channels[2], num_blocks[2],
expansion, 1)
self.ppm = layers.PPModule(
block_channels[2], out_channels, bin_sizes=(1, 2, 3, 6), dim_reduction=True, align_corners=align_corners)
def _make_layer(self,
block: Callable,
in_channels: int,
out_channels: int,
blocks: int,
expansion: int = 6,
stride: int = 1):
layers = []
layers.append(block(in_channels, out_channels, expansion, stride))
for _ in range(1, blocks):
layers.append(block(out_channels, out_channels, expansion, 1))
return nn.Sequential(*layers)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.bottleneck1(x)
x = self.bottleneck2(x)
x = self.bottleneck3(x)
x = self.ppm(x)
return x
class InvertedBottleneck(nn.Layer):
"""
Single Inverted bottleneck implementation.
Args:
in_channels (int): The number of input channels to bottleneck block.
out_channels (int): The number of output channels of bottleneck block.
        expansion (int, optional): The expansion factor in bottleneck. Default: 6.
        stride (int, optional): The stride used in depth-wise conv. Default: 2.
"""
def __init__(self, in_channels: int, out_channels: int, expansion: int = 6, stride: int = 2):
super().__init__()
self.use_shortcut = stride == 1 and in_channels == out_channels
expand_channels = in_channels * expansion
self.block = nn.Sequential(
# pw
layers.ConvBNReLU(in_channels=in_channels, out_channels=expand_channels, kernel_size=1, bias_attr=False),
# dw
layers.ConvBNReLU(
in_channels=expand_channels,
out_channels=expand_channels,
kernel_size=3,
stride=stride,
padding=1,
groups=expand_channels,
bias_attr=False),
# pw-linear
layers.ConvBN(in_channels=expand_channels, out_channels=out_channels, kernel_size=1, bias_attr=False))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out = self.block(x)
if self.use_shortcut:
out = x + out
return out
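The shortcut rule in `InvertedBottleneck` only permits the residual connection when shapes match (stride 1 and equal channel counts), and the hidden width is `in_channels * expansion`. A small sketch of that bookkeeping (illustration only):

```python
def bottleneck_config(in_channels, out_channels, expansion=6, stride=2):
    # Mirrors the bookkeeping in InvertedBottleneck.__init__ above.
    return {
        "expand_channels": in_channels * expansion,
        "use_shortcut": stride == 1 and in_channels == out_channels,
    }

assert bottleneck_config(64, 64, stride=1)["use_shortcut"] is True
assert bottleneck_config(64, 96)["expand_channels"] == 384
assert bottleneck_config(64, 96)["use_shortcut"] is False
```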
class FeatureFusionModule(nn.Layer):
"""
Feature Fusion Module Implementation.
This module fuses high-resolution feature and low-resolution feature.
Args:
high_in_channels (int): The channels of high-resolution feature (output of LearningToDownsample).
low_in_channels (int): The channels of low-resolution feature (output of GlobalFeatureExtractor).
out_channels (int): The output channels of this module.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
"""
def __init__(self, high_in_channels: int, low_in_channels: int, out_channels: int, align_corners: bool):
super().__init__()
# Only depth-wise conv
self.dwconv = layers.ConvBNReLU(
in_channels=low_in_channels,
out_channels=out_channels,
kernel_size=3,
padding=1,
            groups=low_in_channels,  # depth-wise: one filter per input channel
bias_attr=False)
self.conv_low_res = layers.ConvBN(out_channels, out_channels, 1)
self.conv_high_res = layers.ConvBN(high_in_channels, out_channels, 1)
self.align_corners = align_corners
    def forward(self, high_res_input: paddle.Tensor, low_res_input: paddle.Tensor) -> paddle.Tensor:
low_res_input = F.interpolate(
low_res_input, paddle.shape(high_res_input)[2:], mode='bilinear', align_corners=self.align_corners)
low_res_input = self.dwconv(low_res_input)
low_res_input = self.conv_low_res(low_res_input)
high_res_input = self.conv_high_res(high_res_input)
x = high_res_input + low_res_input
return F.relu(x)
class Classifier(nn.Layer):
"""
The Classifier module implementation.
This module consists of two depth-wise conv and one conv.
Args:
input_channels (int): The input channels to this module.
num_classes (int): The unique number of target classes.
"""
def __init__(self, input_channels: int, num_classes: int):
super().__init__()
self.dsconv1 = layers.SeparableConvBNReLU(
in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
self.dsconv2 = layers.SeparableConvBNReLU(
in_channels=input_channels, out_channels=input_channels, kernel_size=3, padding=1)
self.conv = nn.Conv2D(in_channels=input_channels, out_channels=num_classes, kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # dropout_prob
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.dsconv1(x)
x = self.dsconv2(x)
x = self.dropout(x)
x = self.conv(x)
return x
# PaddleHub Image Segmentation
## Model Prediction
To run prediction with the pretrained model we provide, use the following script:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw18_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
## Getting Started with Fine-tuning
After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the fcn_hrnetw18_cityscapes model on datasets such as OpticDiscSeg.
## Code Walkthrough
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step 1: Define the data preprocessing pipeline
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
The `segmentation_transforms` module provides a rich set of preprocessing and augmentation operations for image segmentation data; swap in whichever operations your task needs.
### Step 2: Download and load the dataset
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform=transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: the dataset split to load; one of `train`, `test`, `val`. Default: `train`.
The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user directory.
### Step 3: Load the pretrained model
```python
model = hub.Module(name='fcn_hrnetw18_cityscapes', num_classes=2, pretrained=None)
```
* `name`: 选择预训练模型的名字。
* `num_classes`: 分割模型的类别数目。
* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
### Step4: 选择优化策略和运行配置
```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### 优化策略
Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`:
* `learning_rate`: 全局学习率。
* `parameters`: 待优化模型参数。
#### 运行配置
`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数:
* `model`: 被优化模型;
* `optimizer`: 优化器选择;
* `use_gpu`: 是否使用gpu,默认为False;
* `use_vdl`: 是否使用vdl可视化训练过程;
* `checkpoint_dir`: 保存模型参数的地址;
* `compare_metrics`: 保存最优模型的衡量指标;
`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数:
* `train_dataset`: 训练时所用的数据集;
* `epochs`: 训练轮数;
* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size;
* `num_workers`: worker的数量,默认为0;
* `eval_dataset`: 验证集;
* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。
* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。
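上述参数中 `log_interval` 与 `save_interval` 的计量单位不同(分别为批次数和轮数),下面用一段纯 Python 示意代码演示二者的换算关系。其中数据集大小等数值均为假设的示例值,并非真实配置:

```python
import math

# 以下数值均为假设的示例值,仅用于演示 log_interval / save_interval 的计量单位
num_samples = 267    # 训练集样本数(假设值)
batch_size = 4
epochs = 10
log_interval = 10    # 每执行 10 次批训练打印一次日志
save_interval = 1    # 每训练 1 轮保存一次模型

steps_per_epoch = math.ceil(num_samples / batch_size)  # 每轮的批训练次数
logs_per_epoch = steps_per_epoch // log_interval       # 每轮打印日志的次数
num_saves = epochs // save_interval                    # 整个训练过程中保存模型的次数
print(steps_per_epoch, logs_per_epoch, num_saves)      # 67 6 10

# 对应的训练调用示意(train_reader 来自 Step2):
# trainer.train(train_reader, epochs=epochs, batch_size=batch_size,
#               log_interval=log_interval, save_interval=save_interval)
```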
## 模型预测
当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
我们使用该模型来进行预测。predict.py脚本如下:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw18_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
参数配置正确后,请执行脚本`python predict.py`
**Args**
* `images`:原始图像路径或BGR格式图片;
* `visualization`: 是否可视化,默认为True;
* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
**NOTE:** 进行预测时,所选择的module、checkpoint_dir、dataset必须和Fine-tune所用的一样。
## 服务部署
PaddleHub Serving可以部署一个在线图像分割服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m fcn_hrnetw18_cityscapes
```
这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
### Step2: 发送预测请求
配置好服务端后,以下数行代码即可发送预测请求,获取预测结果。
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# 发送HTTP请求
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
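上面的 `cv2_to_base64` / `base64_to_cv2` 依赖 cv2 的 JPEG 编解码,其核心只是字节串与 base64 字符串之间的往返转换。下面是一段不依赖 cv2 的示意代码,用原始字节代替 JPEG 数据验证这一往返的一致性(`np.frombuffer` / `tobytes` 是 `np.fromstring` / `tostring` 的非废弃替代):

```python
import base64
import numpy as np

# 仅演示 base64 往返:用原始字节代替 cv2.imencode 产生的 JPEG 字节(示意)
def bytes_to_base64(data: bytes) -> str:
    return base64.b64encode(data).decode('utf8')

def base64_to_bytes(b64str: str) -> bytes:
    return base64.b64decode(b64str.encode('utf8'))

raw = np.arange(12, dtype=np.uint8)
restored = np.frombuffer(base64_to_bytes(bytes_to_base64(raw.tobytes())), dtype=np.uint8)
print(restored.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```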
### 查看代码
https://github.com/PaddlePaddle/PaddleSeg
### 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exist_one = Activation("not_exist_one")
# KeyError: "not_exist_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = getattr(nn.layer.activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: Tuple[int],
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from fcn_hrnetw18_cityscapes.hrnet import HRNet_W18
import fcn_hrnetw18_cityscapes.layers as layers
@moduleinfo(
name="fcn_hrnetw18_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="Fcn_hrnetw18 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class FCN(nn.Layer):
"""
A simple implementation for FCN based on PaddlePaddle.
The original article refers to
Evan Shelhamer, et al. "Fully Convolutional Networks for Semantic Segmentation"
(https://arxiv.org/abs/1411.4038).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (-1, ),
channels: int = None,
align_corners: bool = False,
pretrained: str = None):
super(FCN, self).__init__()
self.backbone = HRNet_W18()
backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
for logit in logit_list
]
class FCNHead(nn.Layer):
"""
A simple implementation for FCNHead based on PaddlePaddle
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
backbone_channels (tuple): The values of backbone channels.
Default: (270, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
pretrained (str, optional): The path of pretrained model. Default: None
"""
def __init__(self,
num_classes: int,
backbone_indices: Tuple[int] = (-1, ),
backbone_channels: Tuple[int] = (270, ),
channels: int = None):
super(FCNHead, self).__init__()
self.num_classes = num_classes
self.backbone_indices = backbone_indices
if channels is None:
channels = backbone_channels[0]
self.conv_1 = layers.ConvBNReLU(
in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
def forward(self, feat_list: List[paddle.Tensor]) -> List[paddle.Tensor]:
logit_list = []
x = feat_list[self.backbone_indices[0]]
x = self.conv_1(x)
logit = self.cls(x)
logit_list.append(logit)
return logit_list
# PaddleHub 图像分割
## 模型预测
若想使用我们提供的预训练模型进行预测,可使用如下脚本:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw18_voc')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
## 如何开始Fine-tune
本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用fcn_hrnetw18_voc模型对OpticDiscSeg等数据集进行Fine-tune。
## 代码步骤
使用PaddleHub Fine-tune API进行Fine-tune可以分为4个步骤。
### Step1: 定义数据预处理方式
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
`segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
### Step2: 下载数据集并使用
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: 数据预处理方式。
* `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
### Step3: 加载预训练模型
```python
model = hub.Module(name='fcn_hrnetw18_voc', num_classes=2, pretrained=None)
```
* `name`: 选择预训练模型的名字。
* `num_classes`: 分割模型的类别数目。
* `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
### Step4: 选择优化策略和运行配置
```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### 优化策略
Paddle2.0提供了多种优化器选择,如`SGD`, `Adam`, `Adamax`等,其中`Adam`:
* `learning_rate`: 全局学习率。
* `parameters`: 待优化模型参数。
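`PolynomialDecay` 调度器会使学习率按多项式规律从初始值衰减到 `end_lr`。下面用纯 Python 按该公式示意计算几个步数对应的学习率(对应 Paddle 文档中 `cycle=False` 的默认行为,数值仅作演示):

```python
# 按 PolynomialDecay(cycle=False)的公式示意计算学习率,参数与上文配置一致
def poly_decay(step, learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001):
    step = min(step, decay_steps)  # 超过 decay_steps 后保持 end_lr
    return (learning_rate - end_lr) * (1 - step / decay_steps) ** power + end_lr

print(round(poly_decay(0), 6))     # 0.01
print(round(poly_decay(1000), 6))  # 0.0001
```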
#### 运行配置
`Trainer` 主要控制Fine-tune的训练,包含以下可控制的参数:
* `model`: 被优化模型;
* `optimizer`: 优化器选择;
* `use_gpu`: 是否使用gpu,默认为False;
* `use_vdl`: 是否使用vdl可视化训练过程;
* `checkpoint_dir`: 保存模型参数的地址;
* `compare_metrics`: 保存最优模型的衡量指标;
`trainer.train` 主要控制具体的训练过程,包含以下可控制的参数:
* `train_dataset`: 训练时所用的数据集;
* `epochs`: 训练轮数;
* `batch_size`: 训练的批大小,如果使用GPU,请根据实际情况调整batch_size;
* `num_workers`: worker的数量,默认为0;
* `eval_dataset`: 验证集;
* `log_interval`: 打印日志的间隔, 单位为执行批训练的次数。
* `save_interval`: 保存模型的间隔频次,单位为执行训练的轮数。
## 模型预测
当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。
我们使用该模型来进行预测。predict.py脚本如下:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw18_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
参数配置正确后,请执行脚本`python predict.py`
**Args**
* `images`:原始图像路径或BGR格式图片;
* `visualization`: 是否可视化,默认为True;
* `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
**NOTE:** 进行预测时,所选择的module、checkpoint_dir、dataset必须和Fine-tune所用的一样。
## 服务部署
PaddleHub Serving可以部署一个在线图像分割服务。
### Step1: 启动PaddleHub Serving
运行启动命令:
```shell
$ hub serving start -m fcn_hrnetw18_voc
```
这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
**NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。
### Step2: 发送预测请求
配置好服务端后,以下数行代码即可发送预测请求,获取预测结果。
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# 发送HTTP请求
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fcn_hrnetw18_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
### 查看代码
https://github.com/PaddlePaddle/PaddleSeg
### 依赖
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exist_one = Activation("not_exist_one")
# KeyError: "not_exist_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = getattr(nn.layer.activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: Tuple[int],
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from fcn_hrnetw18_voc.hrnet import HRNet_W18
import fcn_hrnetw18_voc.layers as layers
@moduleinfo(
name="fcn_hrnetw18_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="Fcn_hrnetw18 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class FCN(nn.Layer):
"""
A simple implementation for FCN based on PaddlePaddle.
The original article refers to
Evan Shelhamer, et al. "Fully Convolutional Networks for Semantic Segmentation"
(https://arxiv.org/abs/1411.4038).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int] = (-1, ),
channels: int = None,
align_corners: bool = False,
pretrained: str = None):
super(FCN, self).__init__()
self.backbone = HRNet_W18()
backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
for logit in logit_list
]
class FCNHead(nn.Layer):
"""
A simple implementation for FCNHead based on PaddlePaddle
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
backbone_channels (tuple): The values of backbone channels.
Default: (270, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
pretrained (str, optional): The path of pretrained model. Default: None
"""
def __init__(self,
num_classes: int,
backbone_indices: Tuple[int] = (-1, ),
backbone_channels: Tuple[int] = (270, ),
channels: int = None):
super(FCNHead, self).__init__()
self.num_classes = num_classes
self.backbone_indices = backbone_indices
if channels is None:
channels = backbone_channels[0]
self.conv_1 = layers.ConvBNReLU(
in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]:
logit_list = []
x = feat_list[self.backbone_indices[0]]
x = self.conv_1(x)
logit = self.cls(x)
logit_list.append(logit)
return logit_list
# PaddleHub Image Segmentation
## Model Prediction
To run prediction with the provided pretrained model, use the following script:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw48_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
## How to Start Fine-tuning
After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the fcn_hrnetw48_cityscapes model on datasets such as OpticDiscSeg.
## Code Walkthrough
Fine-tuning with the PaddleHub Fine-tune API takes four steps.
### Step 1: Define the Data Preprocessing Pipeline
```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
The `segmentation_transforms` module provides a rich set of preprocessing operations for image segmentation data; replace them with whatever transforms your task requires.
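Conceptually, `Compose` just chains a list of callables, feeding each transform's output into the next. A minimal pure-Python sketch of that pattern (the `scale` and `shift` transforms here are hypothetical stand-ins, not PaddleHub APIs):

```python
class Compose:
    """Chain transforms: each callable's output feeds the next one."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for t in self.transforms:
            data = t(data)
        return data

# Hypothetical stand-in transforms for illustration only.
scale = lambda x: x * 2   # e.g. a resize/normalize step
shift = lambda x: x + 1   # e.g. a mean-subtraction step

pipeline = Compose([scale, shift])
print(pipeline(3))  # -> 7: (3 * 2) + 1
```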
### Step 2: Download and Load the Dataset
```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transform`: the data preprocessing pipeline.
* `mode`: which dataset split to load; one of `train`, `test`, or `val`. Default: `train`.
The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` in the user's home directory.
### Step 3: Load the Pretrained Model
```python
model = hub.Module(name='fcn_hrnetw48_cityscapes', num_classes=2, pretrained=None)
```
* `name`: name of the pretrained model.
* `num_classes`: number of segmentation classes.
* `pretrained`: path to your own trained weights; if None, the default pretrained parameters shipped with the model are loaded.
### Step 4: Choose the Optimization Strategy and Runtime Configuration
```python
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_ocr', use_gpu=True)
```
#### Optimization Strategy
Paddle 2.0 offers a variety of optimizers, such as `SGD`, `Adam`, and `Adamax`. For `Adam`:
* `learning_rate`: global learning rate.
* `parameters`: model parameters to optimize.
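With `cycle=False` (the default), the `PolynomialDecay` schedule configured above follows, to a close approximation, the curve sketched below; the helper function name is ours, not a Paddle API:

```python
def polynomial_decay(step, learning_rate=0.01, decay_steps=1000,
                     power=0.9, end_lr=0.0001):
    """Sketch of the polynomial decay schedule used above (cycle=False)."""
    # The decay fraction is clamped so the LR bottoms out at end_lr.
    frac = min(step, decay_steps) / decay_steps
    return (learning_rate - end_lr) * (1 - frac) ** power + end_lr

print(polynomial_decay(0))     # 0.01: starts at the base learning rate
print(polynomial_decay(1000))  # 0.0001: settles at end_lr after decay_steps
```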
#### Runtime Configuration
`Trainer` controls the fine-tuning run and accepts the following parameters:
* `model`: the model to be optimized;
* `optimizer`: the optimizer to use;
* `use_gpu`: whether to use the GPU. Default: False;
* `use_vdl`: whether to use VisualDL to visualize the training process;
* `checkpoint_dir`: directory in which model parameters are saved;
* `compare_metrics`: metric used to select the best model;
`trainer.train` controls the actual training loop and accepts the following parameters:
* `train_dataset`: dataset used for training;
* `epochs`: number of training epochs;
* `batch_size`: training batch size; when using a GPU, adjust it to the available memory;
* `num_workers`: number of data-loading workers. Default: 0;
* `eval_dataset`: validation dataset;
* `log_interval`: interval between log messages, in training steps (batches);
* `save_interval`: interval between model checkpoints, in epochs.
## Model Prediction
When fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen during fine-tuning.
We use this model for prediction. The predict.py script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='fcn_hrnetw48_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
After the parameters are configured correctly, run `python predict.py`.
**Args**
* `images`: original image paths or images in BGR format;
* `visualization`: whether to visualize the results. Default: True;
* `save_path`: path for saving results. Default: 'seg_result'.
**NOTE:** At prediction time, the module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## Serving Deployment
PaddleHub Serving can deploy an online image segmentation service.
### Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m fcn_hrnetw48_cityscapes
```
This deploys an image segmentation API service; the default port is 8866.
**NOTE:** To run prediction on a GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
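For example, assuming the service should use GPU 0 (the device index here is an illustrative assumption; adjust to your machine), the variable can be exported before launching:

```shell
# Expose only GPU 0 to the serving process (pick the index for your machine).
export CUDA_VISIBLE_DEVICES=0
hub serving start -m fcn_hrnetw48_cityscapes
```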
### Step 2: Send a Prediction Request
With the server configured, the few lines of code below send a prediction request and retrieve the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Send the HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/fcn_hrnetw48_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
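The `cv2_to_base64` / `base64_to_cv2` helpers above just shuttle the encoded image bytes through base64 so they can travel inside a JSON payload. The round trip itself can be sanity-checked with numpy alone; the uint8 buffer below is a stand-in for real JPEG bytes:

```python
import base64
import numpy as np

# Stand-in for JPEG-encoded image bytes: any uint8 buffer round-trips the same way.
raw = np.arange(16, dtype=np.uint8)

# Encode: bytes -> base64 text, safe to embed in a JSON request body.
b64 = base64.b64encode(raw.tobytes()).decode('utf8')

# Decode: base64 text -> bytes -> uint8 array, mirroring base64_to_cv2.
restored = np.frombuffer(base64.b64decode(b64.encode('utf8')), np.uint8)

assert np.array_equal(raw, restored)
```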
### View the Code
https://github.com/PaddlePaddle/PaddleSeg
### Dependencies
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels, out_channels=out_channels, kernel_size=1, act='relu', name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels, out_channels=out_channels * 4, kernel_size=1, act=None, name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.pointwise_conv = ConvBNReLU(in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: str = 'same', **kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = nn.layer.activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("nn.layer.activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: Tuple[int],
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(in_channels=out_channels * out_size, out_channels=out_channels, kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(y, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(img_avg, x.shape[2:], mode='bilinear', align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from fcn_hrnetw48_cityscapes.hrnet import HRNet_W48
import fcn_hrnetw48_cityscapes.layers as layers
@moduleinfo(
name="fcn_hrnetw48_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="Fcn_hrnetw48 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class FCN(nn.Layer):
"""
A simple implementation for FCN based on PaddlePaddle.
The original article refers to
Evan Shelhamer, et, al. "Fully Convolutional Networks for Semantic Segmentation"
(https://arxiv.org/abs/1411.4038).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
pretrained (str, optional): The path or url of pretrained model. Default: None
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int] = (-1, ),
channels: int = None,
align_corners: bool = False,
pretrained: str = None):
super(FCN, self).__init__()
self.backbone = HRNet_W48()
backbone_channels = [self.backbone.feat_channels[i] for i in backbone_indices]
self.head = FCNHead(num_classes, backbone_indices, backbone_channels, channels)
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
logit_list = self.head(feat_list)
return [
F.interpolate(logit, paddle.shape(x)[2:], mode='bilinear', align_corners=self.align_corners)
for logit in logit_list
]
class FCNHead(nn.Layer):
"""
A simple implementation for FCNHead based on PaddlePaddle
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): The values in the tuple indicate the indices of output of backbone.
Default: (-1, ).
backbone_channels (tuple): The values of backbone channels.
Default: (270, ).
channels (int, optional): The channels between conv layer and the last layer of FCNHead.
If None, it will be the number of channels of input features. Default: None.
pretrained (str, optional): The path of pretrained model. Default: None
"""
def __init__(self,
num_classes: int,
backbone_indices: Tuple[int] = (-1, ),
backbone_channels: Tuple[int] = (270, ),
channels: int = None):
super(FCNHead, self).__init__()
self.num_classes = num_classes
self.backbone_indices = backbone_indices
if channels is None:
channels = backbone_channels[0]
self.conv_1 = layers.ConvBNReLU(
in_channels=backbone_channels[0], out_channels=channels, kernel_size=1, padding='same', stride=1)
self.cls = nn.Conv2D(in_channels=channels, out_channels=self.num_classes, kernel_size=1, stride=1, padding=0)
def forward(self, feat_list: nn.Layer) -> List[paddle.Tensor]:
logit_list = []
x = feat_list[self.backbone_indices[0]]
x = self.conv_1(x)
logit = self.cls(x)
logit_list.append(logit)
return logit_list