Unverified commit 6c019346, authored by KP, committed by GitHub

Merge branch 'develop' into add_lapstyle_stars

# photopen
|Module Name|photopen|
| :--- | :---: |
|Category|Image - Image Generation|
|Network|SPADEGenerator|
|Dataset|coco_stuff|
|Fine-tuning supported|No|
|Module Size|74MB|
|Latest update date|2021-12-14|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://camo.githubusercontent.com/22e94b0c7278af08da8c475a3d968ba2f3cd565fcb2ad6b9a165c8a65f2d12f8/68747470733a2f2f61692d73747564696f2d7374617469632d6f6e6c696e652e63646e2e626365626f732e636f6d2f39343733313032336561623934623162393762396361383062643362333038333063393138636631363264303436626438383534306464613435303239356133" width = "90%" hspace='10'/>
<br />
</p>
- ### Module Introduction
  - This module uses Pix2PixHD, a pixel-level style transfer network that generates photo-realistic images from semantic segmentation label maps. To counter the loss of label semantics caused by the generator's normalization layers, a SPADE (Spatially-Adaptive Normalization) module is added to the Pix2PixHD generator: two convolution layers preserve the spatial dimensions of the scale and bias parameters learned during normalization, which improves the quality of the generated images. Semantic label images can be obtained from the [coco_stuff dataset](https://github.com/nightrome/cocostuff), or you can generate custom label images to experiment with via [this project in the PaddleGAN repo](https://github.com/PaddlePaddle/PaddleGAN/blob/87537ad9d4eeda17eaa5916c6a585534ab989ea8/docs/zh_CN/tutorials/photopen.md).
## II. Installation
- ### 1. Environment Dependencies
  - ppgan
- ### 2. Installation
- ```shell
$ hub install photopen
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run photopen --input_path "/PATH/TO/IMAGE"
```
- This runs the image generation module from the command line. For more usage, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="photopen")
input_path = ["/PATH/TO/IMAGE"]
# Read from a file
module.photo_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
```
- ### 3. API
- ```python
photo_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Image generation API.
- **Parameters**
  - images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\] (see the sketch after this list);<br/>
  - paths (list\[str\]): paths to the images;<br/>
  - output\_dir (str): directory where the results are saved;<br/>
  - use\_gpu (bool): whether to use the GPU;<br/>
  - visualization (bool): whether to save the results to a local folder
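- Either in-memory arrays or file paths may be supplied; a minimal sketch of the `images` form (the image path is a placeholder):
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="photopen")
# photo_transfer expects BGR arrays, as returned by cv2.imread
img = cv2.imread("/PATH/TO/IMAGE")
results = module.photo_transfer(images=[img], output_dir='./transfer_result/')
```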
## IV. Server Deployment
- PaddleHub Serving can deploy an online image generation service.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m photopen
```
- This deploys an online image generation service API, which listens on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server side configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/photopen"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
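- The service returns each generated image as a nested list (the module's `serving_method` calls `result.tolist()`); a minimal sketch for turning one back into an image file, assuming the default photopen serving behavior:
- ```python
import numpy as np
# r is the response object from the request above
result = np.array(r.json()["results"][0], dtype=np.uint8)  # H x W x 3, RGB
cv2.imwrite("serving_result.png", result[:, :, ::-1])      # RGB -> BGR for OpenCV
```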
## V. Release Note
* 1.0.0
  First release
- ```shell
$ hub install photopen==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import paddle
from PIL import Image
from PIL import ImageOps
from ppgan.models.generators import SPADEGenerator
from ppgan.utils.filesystem import load
from ppgan.utils.photopen import data_onehot_pro
class PhotoPenPredictor:
def __init__(self, weight_path, gen_cfg):
        # build the generator, load its weights, and switch to eval mode
gen = SPADEGenerator(
gen_cfg.ngf,
gen_cfg.num_upsampling_layers,
gen_cfg.crop_size,
gen_cfg.aspect_ratio,
gen_cfg.norm_G,
gen_cfg.semantic_nc,
gen_cfg.use_vae,
gen_cfg.nef,
)
gen.eval()
para = load(weight_path)
if 'net_gen' in para:
gen.set_state_dict(para['net_gen'])
else:
gen.set_state_dict(para)
self.gen = gen
self.gen_cfg = gen_cfg
    def run(self, image):
        # convert the label map to a single channel and resize to the crop size
        sem = Image.fromarray(image).convert('L')
        sem = sem.resize((self.gen_cfg.crop_size, self.gen_cfg.crop_size), Image.NEAREST)
        sem = np.array(sem).astype('float32')
        sem = paddle.to_tensor(sem)
        sem = sem.reshape([1, 1, self.gen_cfg.crop_size, self.gen_cfg.crop_size])
        # expand the label map into a one-hot semantic tensor and run the generator
        one_hot = data_onehot_pro(sem, self.gen_cfg)
        predicted = self.gen(one_hot)
        # map the generator output from [-1, 1] to a uint8 HWC image
        pic = predicted.numpy()[0].reshape((3, 256, 256)).transpose((1, 2, 0))
        pic = ((pic + 1.) / 2. * 255).astype('uint8')
        return pic
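if __name__ == '__main__':
    # Illustrative standalone usage (a sketch, not part of the module): the
    # .yaml and .pdparams paths are placeholders for the files shipped with
    # the photopen module; module.py below wires things up the same way.
    import cv2
    from ppgan.utils.config import get_config
    cfg = get_config('photopen.yaml')
    predictor = PhotoPenPredictor(weight_path='photopen.pdparams', gen_cfg=cfg.predict)
    label = cv2.imread('/PATH/TO/IMAGE')[:, :, ::-1]  # BGR -> RGB, as module.py does
    out = predictor.run(label)                        # 256 x 256 x 3 uint8 RGB
    cv2.imwrite('photopen_result.png', out[:, :, ::-1])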
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from ppgan.utils.config import get_config
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import PhotoPenPredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(
name="photopen", type="CV/style_transfer", author="paddlepaddle", author_email="", summary="", version="1.0.0")
class Photopen:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "photopen.pdparams")
cfg = get_config(os.path.join(self.directory, "photopen.yaml"))
self.network = PhotoPenPredictor(weight_path=self.pretrained_model, gen_cfg=cfg.predict)
def photo_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
visualization: bool = True):
'''
        images (list[numpy.ndarray]): data of images, the shape of each is [H, W, C]; color space must be BGR (as read by cv2).
paths (list[str]): paths to images
output_dir (str): the dir to save the results
use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
visualization (bool): if True, save results in output_dir.
'''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
            print('No image provided. Please input an image or an image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
out = self.network.run(image)
results.append(out)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
out = self.network.run(image)
results.append(out)
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
if out is not None:
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.photo_transfer(
paths=[self.args.input_path],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.photo_transfer(images=images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
total_iters: 1
output_dir: output_dir
checkpoints_dir: checkpoints

model:
  name: PhotoPenModel
  generator:
    name: SPADEGenerator
    ngf: 24
    num_upsampling_layers: normal
    crop_size: 256
    aspect_ratio: 1.0
    norm_G: spectralspadebatch3x3
    semantic_nc: 14
    use_vae: False
    nef: 16
  discriminator:
    name: MultiscaleDiscriminator
    ndf: 128
    num_D: 4
    crop_size: 256
    label_nc: 12
    output_nc: 3
    contain_dontcare_label: True
    no_instance: False
    n_layers_D: 6
  criterion:
    name: PhotoPenPerceptualLoss
    crop_size: 224
    lambda_vgg: 1.6
  label_nc: 12
  contain_dontcare_label: True
  batchSize: 1
  crop_size: 256
  lambda_feat: 10.0

dataset:
  train:
    name: PhotoPenDataset
    content_root: test/coco_stuff
    load_size: 286
    crop_size: 256
    num_workers: 0
    batch_size: 1
  test:
    name: PhotoPenDataset_test
    content_root: test/coco_stuff
    load_size: 286
    crop_size: 256
    num_workers: 0
    batch_size: 1

lr_scheduler: # abandoned
  name: LinearDecay
  learning_rate: 0.0001
  start_epoch: 99999
  decay_epochs: 99999
  # will get from real dataset
  iters_per_epoch: 1

optimizer:
  lr: 0.0001
  optimG:
    name: Adam
    net_names:
      - net_gen
    beta1: 0.9
    beta2: 0.999
  optimD:
    name: Adam
    net_names:
      - net_des
    beta1: 0.9
    beta2: 0.999

log_config:
  interval: 1
  visiual_interval: 1

snapshot_config:
  interval: 1

predict:
  name: SPADEGenerator
  ngf: 24
  num_upsampling_layers: normal
  crop_size: 256
  aspect_ratio: 1.0
  norm_G: spectralspadebatch3x3
  semantic_nc: 14
  use_vae: False
  nef: 16
  contain_dontcare_label: True
  label_nc: 12
  batchSize: 1
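# (Illustrative note) The `predict` block above carries the SPADEGenerator
# constructor arguments read by PhotoPenPredictor in model.py (ngf,
# num_upsampling_layers, crop_size, aspect_ratio, norm_G, semantic_nc,
# use_vae, nef), plus the label settings consumed by data_onehot_pro.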
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
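if __name__ == '__main__':
    # Round-trip sanity check (a sketch): base64_to_cv2 inverts the
    # cv2_to_base64 helper shown in the README's request example, up to JPEG
    # compression loss. The image path is a placeholder.
    img = cv2.imread('/PATH/TO/IMAGE')
    b64 = base64.b64encode(cv2.imencode('.jpg', img)[1].tobytes()).decode('utf8')
    restored = base64_to_cv2(b64)
    print(img.shape == restored.shape)  # True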
paddlepaddle>=1.8.4
paddlehub>=1.8.0
# face_parse
|Module Name|face_parse|
| :--- | :---: |
|Category|Image - Face Parsing|
|Network|BiSeNet|
|Dataset|COCO-Stuff|
|Fine-tuning supported|No|
|Module Size|77MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/157190651-595b6964-97c5-4b0b-ac0a-c30c8520a972.png" width = "40%" hspace='10'/>
<br />
Input image
<br />
<img src="https://user-images.githubusercontent.com/22424850/157192693-b3f737ed-1a24-4ef9-8454-bfd9d51755af.png" width = "40%" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
  - Face parsing is a special case of semantic image segmentation: it computes a pixel-level label map over the semantic components of a face image (such as hair, lips, nose, and eyes). Given an input face image, face parsing assigns a pixel-level label to every semantic component.
## II. Installation
- ### 1. Environment Dependencies
  - ppgan
  - dlib
- ### 2. Installation
- ```shell
$ hub install face_parse
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run face_parse --input_path "/PATH/TO/IMAGE"
```
- This runs the face parsing module from the command line. For more usage, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="face_parse")
input_path = ["/PATH/TO/IMAGE"]
# Read from a file
module.style_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
```
- ### 3. API
- ```python
style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Face parsing API.
- **Parameters**
  - images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\];<br/>
  - paths (list\[str\]): paths to the images;<br/>
  - output\_dir (str): directory where the results are saved;<br/>
  - use\_gpu (bool): whether to use the GPU;<br/>
  - visualization (bool): whether to save the results to a local folder
## IV. Server Deployment
- PaddleHub Serving can deploy an online face parsing service.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m face_parse
```
- This deploys an online face parsing service API, which listens on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server side configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/face_parse"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
  First release
- ```shell
$ hub install face_parse==1.0.0
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import argparse
from PIL import Image
import numpy as np
import cv2
import ppgan.faceutils as futils
from ppgan.utils.preprocess import *
from ppgan.utils.visual import mask2image
class FaceParsePredictor:
def __init__(self):
self.input_size = (512, 512)
self.up_ratio = 0.6 / 0.85
self.down_ratio = 0.2 / 0.85
self.width_ratio = 0.2 / 0.85
self.face_parser = futils.mask.FaceParser()
    def run(self, image):
        image = Image.fromarray(image)
        # detect a face; return None if none is found
        face = futils.dlib.detect(image)
        if not face:
            return
        face_on_image = face[0]
        # crop around the detected face using the configured margins
        image, face, crop_face = futils.dlib.crop(image, face_on_image, self.up_ratio, self.down_ratio,
                                                  self.width_ratio)
        np_image = np.array(image)
        # parse the cropped face and render the label map as a color mask
        mask = self.face_parser.parse(np.float32(cv2.resize(np_image, self.input_size)))
        mask = cv2.resize(mask.numpy(), (256, 256))
        mask = mask.astype(np.uint8)
        mask = mask2image(mask)
        return mask
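if __name__ == '__main__':
    # Illustrative standalone usage (a sketch): module.py below feeds this
    # predictor RGB arrays, so the BGR image from cv2 is flipped first; the
    # image path is a placeholder. run() returns None when no face is found.
    predictor = FaceParsePredictor()
    image = cv2.imread('/PATH/TO/IMAGE')[:, :, ::-1]
    mask = predictor.run(image)
    if mask is None:
        print('no face detected')
    else:
        cv2.imwrite('face_mask.png', mask[:, :, ::-1])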
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import FaceParsePredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(
name="face_parse", type="CV/style_transfer", author="paddlepaddle", author_email="", summary="", version="1.0.0")
class Face_parse:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "bisenet.pdparams")
self.network = FaceParsePredictor()
def style_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
visualization: bool = True):
'''
        images (list[numpy.ndarray]): data of images, the shape of each is [H, W, C]; color space must be BGR (as read by cv2).
paths (list[str]): paths to images
output_dir (str): the dir to save the results
use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
visualization (bool): if True, save results in output_dir.
'''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
            print('No image provided. Please input an image or an image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
out = self.network.run(image)
results.append(out)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
out = self.network.run(image)
results.append(out)
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
if out is not None:
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.style_transfer(
paths=[self.args.input_path],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.style_transfer(images=images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# lapstyle_circuit
|Module Name|lapstyle_circuit|
| :--- | :---: |
|Category|Image - Style Transfer|
|Network|LapStyle|
|Dataset|COCO|
|Fine-tuning supported|No|
|Module Size|121MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/144995283-77ddba45-9efe-4f72-914c-1bff734372ed.png" width = "50%" hspace='10'/>
<br />
Input content image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144997574-8b4028ad-d871-4caf-87d1-191582bba805.jpg" width = "50%" hspace='10'/>
<br />
Input style image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144997589-407a12b9-95bf-44e7-b558-b1026ef3cd5a.png" width = "50%" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
  - LapStyle, a Laplacian pyramid style transfer network, is a fast feed-forward stylization network that produces high-quality stylized images. It builds up complex texture transfer effects progressively and runs at 100 fps at 512-pixel resolution. It supports fast transfer of many different artistic styles and is widely used in artistic image generation, filters, and similar applications.
  - For more details, see [Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer](https://arxiv.org/pdf/2104.05376.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - ppgan
- ### 2. Installation
- ```shell
$ hub install lapstyle_circuit
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run lapstyle_circuit --content "/PATH/TO/IMAGE" --style "/PATH/TO/IMAGE1"
```
- This runs the style transfer module from the command line. For more usage, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="lapstyle_circuit")
content = cv2.imread("/PATH/TO/IMAGE")
style = cv2.imread("/PATH/TO/IMAGE1")
results = module.style_transfer(images=[{'content':content, 'style':style}], output_dir='./transfer_result', use_gpu=True)
```
- ### 3. API
- ```python
style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Style transfer API.
- **Parameters**
  - images (list\[dict\]): image data, where each element is a dict with keys content and style:
    - content (numpy.ndarray): the image to be transformed, with shape \[H, W, C\], in BGR format;<br/>
    - style (numpy.ndarray): the style image, with shape \[H, W, C\], in BGR format;<br/>
  - paths (list\[dict\]): paths to the images, where each element is a dict with keys content and style (see the sketch after this list):
    - content (str): path to the image to be transformed;<br/>
    - style (str): path to the style image;<br/>
  - output\_dir (str): directory where the results are saved;<br/>
  - use\_gpu (bool): whether to use the GPU;<br/>
  - visualization (bool): whether to save the results to a local folder
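- Either `images` or `paths` may be supplied; a minimal sketch of the path-based call (file locations are placeholders):
- ```python
import paddlehub as hub
module = hub.Module(name="lapstyle_circuit")
results = module.style_transfer(paths=[{'content': '/PATH/TO/IMAGE', 'style': '/PATH/TO/IMAGE1'}])
```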
## IV. Server Deployment
- PaddleHub Serving can deploy an online image style transfer service.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m lapstyle_circuit
```
- This deploys an online image style transfer service API, which listens on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server side configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[{'content': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE")), 'style': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1"))}]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/lapstyle_circuit"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
  First release
- ```shell
$ hub install lapstyle_circuit==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import urllib.request
import cv2 as cv
import numpy as np
import paddle
import paddle.nn.functional as F
from paddle.vision.transforms import functional
from PIL import Image
from ppgan.models.generators import DecoderNet
from ppgan.models.generators import Encoder
from ppgan.models.generators import RevisionNet
from ppgan.utils.visual import tensor2img
def img(img):
    # some images have 4 channels; keep only the first three (drop alpha).
    # The HWC -> CHW conversion happens later, in functional.to_tensor.
    if img.shape[2] > 3:
        img = img[:, :, :3]
    return img
def img_totensor(content_img, style_img):
if content_img.ndim == 2:
content_img = cv.cvtColor(content_img, cv.COLOR_GRAY2RGB)
else:
content_img = cv.cvtColor(content_img, cv.COLOR_BGR2RGB)
h, w, c = content_img.shape
content_img = Image.fromarray(content_img)
content_img = content_img.resize((512, 512), Image.BILINEAR)
content_img = np.array(content_img)
content_img = img(content_img)
content_img = functional.to_tensor(content_img)
style_img = cv.cvtColor(style_img, cv.COLOR_BGR2RGB)
style_img = Image.fromarray(style_img)
style_img = style_img.resize((512, 512), Image.BILINEAR)
style_img = np.array(style_img)
style_img = img(style_img)
style_img = functional.to_tensor(style_img)
content_img = paddle.unsqueeze(content_img, axis=0)
style_img = paddle.unsqueeze(style_img, axis=0)
return content_img, style_img, h, w
def tensor_resample(tensor, dst_size, mode='bilinear'):
return F.interpolate(tensor, dst_size, mode=mode, align_corners=False)
def laplacian(x):
"""
Laplacian
return:
x - upsample(downsample(x))
"""
return x - tensor_resample(tensor_resample(x, [x.shape[2] // 2, x.shape[3] // 2]), [x.shape[2], x.shape[3]])
def make_laplace_pyramid(x, levels):
"""
Make Laplacian Pyramid
"""
pyramid = []
current = x
for i in range(levels):
pyramid.append(laplacian(current))
current = tensor_resample(current, (max(current.shape[2] // 2, 1), max(current.shape[3] // 2, 1)))
pyramid.append(current)
return pyramid
def fold_laplace_pyramid(pyramid):
"""
Fold Laplacian Pyramid
"""
current = pyramid[-1]
for i in range(len(pyramid) - 2, -1, -1): # iterate from len-2 to 0
up_h, up_w = pyramid[i].shape[2], pyramid[i].shape[3]
current = pyramid[i] + tensor_resample(current, (up_h, up_w))
return current
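# Sanity note (illustrative): each pyramid level stores x_i - up(down(x_i)),
# and the fold adds the same up(...) term back, so
# fold_laplace_pyramid(make_laplace_pyramid(x, n)) reconstructs x exactly;
# e.g. for x = paddle.rand([1, 3, 64, 64]),
# (x - fold_laplace_pyramid(make_laplace_pyramid(x, 2))).abs().max() is ~0.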
class LapStylePredictor:
def __init__(self, weight_path=None):
self.net_enc = Encoder()
self.net_dec = DecoderNet()
self.net_rev = RevisionNet()
self.net_rev_2 = RevisionNet()
self.net_enc.set_dict(paddle.load(weight_path)['net_enc'])
self.net_enc.eval()
self.net_dec.set_dict(paddle.load(weight_path)['net_dec'])
self.net_dec.eval()
self.net_rev.set_dict(paddle.load(weight_path)['net_rev'])
self.net_rev.eval()
self.net_rev_2.set_dict(paddle.load(weight_path)['net_rev_2'])
self.net_rev_2.eval()
def run(self, content_img, style_image):
content_img, style_img, h, w = img_totensor(content_img, style_image)
pyr_ci = make_laplace_pyramid(content_img, 2)
pyr_si = make_laplace_pyramid(style_img, 2)
pyr_ci.append(content_img)
pyr_si.append(style_img)
cF = self.net_enc(pyr_ci[2])
sF = self.net_enc(pyr_si[2])
stylized_small = self.net_dec(cF, sF)
stylized_up = F.interpolate(stylized_small, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[1], stylized_up], axis=1)
stylized_rev_lap = self.net_rev(revnet_input)
stylized_rev = fold_laplace_pyramid([stylized_rev_lap, stylized_small])
stylized_up = F.interpolate(stylized_rev, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[0], stylized_up], axis=1)
stylized_rev_lap_second = self.net_rev_2(revnet_input)
stylized_rev_second = fold_laplace_pyramid([stylized_rev_lap_second, stylized_rev_lap, stylized_small])
stylized = stylized_rev_second
stylized_visual = tensor2img(stylized, min_max=(0., 1.))
return stylized_visual
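if __name__ == '__main__':
    # Illustrative standalone usage (a sketch): the weight path is a
    # placeholder for the lapstyle_circuit.pdparams shipped with the module;
    # module.py below wires it up the same way.
    predictor = LapStylePredictor(weight_path='lapstyle_circuit.pdparams')
    content = cv.imread('/PATH/TO/IMAGE')  # BGR; img_totensor converts to RGB
    style = cv.imread('/PATH/TO/IMAGE1')
    out = predictor.run(content, style)    # uint8 RGB
    cv.imwrite('stylized.png', out[:, :, ::-1])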
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import LapStylePredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(
name="lapstyle_circuit",
type="CV/style_transfer",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class Lapstyle_circuit:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "lapstyle_circuit.pdparams")
self.network = LapStylePredictor(weight_path=self.pretrained_model)
def style_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
visualization: bool = True):
        '''
        Transfer an image to circuit style.
        images (list[dict]): data of images, each element is a dict:
          - content (numpy.ndarray): input image, shape is [H, W, C], BGR format;
          - style (numpy.ndarray): style image, shape is [H, W, C], BGR format;
        paths (list[dict]): paths to images, each element is a dict:
          - content (str): path to input image;
          - style (str): path to style image;
        output_dir (str): the dir to save the results
        use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
        visualization (bool): if True, save results in output_dir.
        '''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
            print('No image provided. Please input an image or an image path.')
return
if images != None:
for image_dict in images:
content_img = image_dict['content']
style_img = image_dict['style']
results.append(self.network.run(content_img, style_img))
if paths != None:
for path_dict in paths:
content_img = cv2.imread(path_dict['content'])
style_img = cv2.imread(path_dict['style'])
results.append(self.network.run(content_img, style_img))
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.style_transfer(
paths=[{
'content': self.args.content,
'style': self.args.style
}],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = copy.deepcopy(images)
for image in images_decode:
image['content'] = base64_to_cv2(image['content'])
image['style'] = base64_to_cv2(image['style'])
results = self.style_transfer(images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--content', type=str, help="path to content image.")
self.arg_input_group.add_argument('--style', type=str, help="path to style image.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# lapstyle_ocean
|Module Name|lapstyle_ocean|
| :--- | :---: |
|Category|Image - Style Transfer|
|Network|LapStyle|
|Dataset|COCO|
|Fine-tuning supported|No|
|Module Size|121MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/144995283-77ddba45-9efe-4f72-914c-1bff734372ed.png" width = "50%" hspace='10'/>
<br />
Input content image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144997958-9162c304-dff4-4048-a197-607882ded00c.png" width = "50%" hspace='10'/>
<br />
Input style image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144997967-43d7579c-cc73-452e-a920-5759eb5a5d67.png" width = "50%" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
  - LapStyle, a Laplacian pyramid style transfer network, is a fast feed-forward stylization network that produces high-quality stylized images. It builds up complex texture transfer effects progressively and runs at 100 fps at 512-pixel resolution. It supports fast transfer of many different artistic styles and is widely used in artistic image generation, filters, and similar applications.
  - For more details, see [Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer](https://arxiv.org/pdf/2104.05376.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - ppgan
- ### 2. Installation
- ```shell
$ hub install lapstyle_ocean
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run lapstyle_ocean --content "/PATH/TO/IMAGE" --style "/PATH/TO/IMAGE1"
```
- This runs the style transfer module from the command line. For more usage, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="lapstyle_ocean")
content = cv2.imread("/PATH/TO/IMAGE")
style = cv2.imread("/PATH/TO/IMAGE1")
results = module.style_transfer(images=[{'content':content, 'style':style}], output_dir='./transfer_result', use_gpu=True)
```
- ### 3. API
- ```python
style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Style transfer API.
- **Parameters**
  - images (list\[dict\]): image data, where each element is a dict with keys content and style:
    - content (numpy.ndarray): the image to be transformed, with shape \[H, W, C\], in BGR format;<br/>
    - style (numpy.ndarray): the style image, with shape \[H, W, C\], in BGR format;<br/>
  - paths (list\[dict\]): paths to the images, where each element is a dict with keys content and style:
    - content (str): path to the image to be transformed;<br/>
    - style (str): path to the style image;<br/>
  - output\_dir (str): directory where the results are saved;<br/>
  - use\_gpu (bool): whether to use the GPU;<br/>
  - visualization (bool): whether to save the results to a local folder
## IV. Server Deployment
- PaddleHub Serving can deploy an online image style transfer service.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m lapstyle_ocean
```
- This deploys an online image style transfer service API, which listens on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server side configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[{'content': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE")), 'style': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1"))}]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/lapstyle_ocean"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
  First release
- ```shell
$ hub install lapstyle_ocean==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import urllib.request
import cv2 as cv
import numpy as np
import paddle
import paddle.nn.functional as F
from paddle.vision.transforms import functional
from PIL import Image
from ppgan.models.generators import DecoderNet
from ppgan.models.generators import Encoder
from ppgan.models.generators import RevisionNet
from ppgan.utils.visual import tensor2img
def img(img):
    # some images have 4 channels; keep only the first three (drop alpha).
    # The HWC -> CHW conversion happens later, in functional.to_tensor.
    if img.shape[2] > 3:
        img = img[:, :, :3]
    return img
def img_totensor(content_img, style_img):
if content_img.ndim == 2:
content_img = cv.cvtColor(content_img, cv.COLOR_GRAY2RGB)
else:
content_img = cv.cvtColor(content_img, cv.COLOR_BGR2RGB)
h, w, c = content_img.shape
content_img = Image.fromarray(content_img)
content_img = content_img.resize((512, 512), Image.BILINEAR)
content_img = np.array(content_img)
content_img = img(content_img)
content_img = functional.to_tensor(content_img)
style_img = cv.cvtColor(style_img, cv.COLOR_BGR2RGB)
style_img = Image.fromarray(style_img)
style_img = style_img.resize((512, 512), Image.BILINEAR)
style_img = np.array(style_img)
style_img = img(style_img)
style_img = functional.to_tensor(style_img)
content_img = paddle.unsqueeze(content_img, axis=0)
style_img = paddle.unsqueeze(style_img, axis=0)
return content_img, style_img, h, w
def tensor_resample(tensor, dst_size, mode='bilinear'):
return F.interpolate(tensor, dst_size, mode=mode, align_corners=False)
def laplacian(x):
"""
Laplacian
return:
x - upsample(downsample(x))
"""
return x - tensor_resample(tensor_resample(x, [x.shape[2] // 2, x.shape[3] // 2]), [x.shape[2], x.shape[3]])
def make_laplace_pyramid(x, levels):
"""
Make Laplacian Pyramid
"""
pyramid = []
current = x
for i in range(levels):
pyramid.append(laplacian(current))
current = tensor_resample(current, (max(current.shape[2] // 2, 1), max(current.shape[3] // 2, 1)))
pyramid.append(current)
return pyramid
def fold_laplace_pyramid(pyramid):
"""
Fold Laplacian Pyramid
"""
current = pyramid[-1]
for i in range(len(pyramid) - 2, -1, -1): # iterate from len-2 to 0
up_h, up_w = pyramid[i].shape[2], pyramid[i].shape[3]
current = pyramid[i] + tensor_resample(current, (up_h, up_w))
return current
class LapStylePredictor:
def __init__(self, weight_path=None):
self.net_enc = Encoder()
self.net_dec = DecoderNet()
self.net_rev = RevisionNet()
self.net_rev_2 = RevisionNet()
self.net_enc.set_dict(paddle.load(weight_path)['net_enc'])
self.net_enc.eval()
self.net_dec.set_dict(paddle.load(weight_path)['net_dec'])
self.net_dec.eval()
self.net_rev.set_dict(paddle.load(weight_path)['net_rev'])
self.net_rev.eval()
self.net_rev_2.set_dict(paddle.load(weight_path)['net_rev_2'])
self.net_rev_2.eval()
def run(self, content_img, style_image):
content_img, style_img, h, w = img_totensor(content_img, style_image)
pyr_ci = make_laplace_pyramid(content_img, 2)
pyr_si = make_laplace_pyramid(style_img, 2)
pyr_ci.append(content_img)
pyr_si.append(style_img)
cF = self.net_enc(pyr_ci[2])
sF = self.net_enc(pyr_si[2])
stylized_small = self.net_dec(cF, sF)
stylized_up = F.interpolate(stylized_small, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[1], stylized_up], axis=1)
stylized_rev_lap = self.net_rev(revnet_input)
stylized_rev = fold_laplace_pyramid([stylized_rev_lap, stylized_small])
stylized_up = F.interpolate(stylized_rev, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[0], stylized_up], axis=1)
stylized_rev_lap_second = self.net_rev_2(revnet_input)
stylized_rev_second = fold_laplace_pyramid([stylized_rev_lap_second, stylized_rev_lap, stylized_small])
stylized = stylized_rev_second
stylized_visual = tensor2img(stylized, min_max=(0., 1.))
return stylized_visual
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import LapStylePredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(
name="lapstyle_ocean",
type="CV/style_transfer",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class Lapstyle_ocean:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "lapstyle_ocean.pdparams")
self.network = LapStylePredictor(weight_path=self.pretrained_model)
def style_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
visualization: bool = True):
        '''
        Transfer an image to ocean style.
        images (list[dict]): data of images, each element is a dict:
          - content (numpy.ndarray): input image, shape is [H, W, C], BGR format;
          - style (numpy.ndarray): style image, shape is [H, W, C], BGR format;
        paths (list[dict]): paths to images, each element is a dict:
          - content (str): path to input image;
          - style (str): path to style image;
        output_dir (str): the dir to save the results
        use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
        visualization (bool): if True, save results in output_dir.
        '''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
            print('No image provided. Please input an image or an image path.')
return
if images != None:
for image_dict in images:
content_img = image_dict['content']
style_img = image_dict['style']
results.append(self.network.run(content_img, style_img))
if paths != None:
for path_dict in paths:
content_img = cv2.imread(path_dict['content'])
style_img = cv2.imread(path_dict['style'])
results.append(self.network.run(content_img, style_img))
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.style_transfer(
paths=[{
'content': self.args.content,
'style': self.args.style
}],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = copy.deepcopy(images)
for image in images_decode:
image['content'] = base64_to_cv2(image['content'])
image['style'] = base64_to_cv2(image['style'])
results = self.style_transfer(images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--content', type=str, help="path to content image.")
self.arg_input_group.add_argument('--style', type=str, help="path to style image.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# lapstyle_starrynew
|Module Name|lapstyle_starrynew|
| :--- | :---: |
|Category|Image - Style Transfer|
|Network|LapStyle|
|Dataset|COCO|
|Fine-tuning supported|No|
|Module Size|121MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/144995283-77ddba45-9efe-4f72-914c-1bff734372ed.png" width = "50%" hspace='10'/>
<br />
Input content image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144995349-59651a1d-7be4-479f-ad58-063b4fc6dded.png" width = "50%" hspace='10'/>
<br />
Input style image
<br />
<img src="https://user-images.githubusercontent.com/22424850/144995779-bb87c39e-643c-4c75-be49-7de5f8b52a17.png" width = "50%" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
  - LapStyle, a Laplacian pyramid style transfer network, is a fast feed-forward stylization network that produces high-quality stylized images. It builds up complex texture transfer effects progressively and runs at 100 fps at 512-pixel resolution. It supports fast transfer of many different artistic styles and is widely used in artistic image generation, filters, and similar applications.
  - For more details, see [Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer](https://arxiv.org/pdf/2104.05376.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - ppgan
- ### 2. Installation
- ```shell
$ hub install lapstyle_starrynew
```
- If you encounter problems during installation, please refer to: [Windows quickstart](../../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run lapstyle_starrynew --content "/PATH/TO/IMAGE" --style "/PATH/TO/IMAGE1"
```
- This runs the style transfer module from the command line. For more usage, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="lapstyle_starrynew")
content = cv2.imread("/PATH/TO/IMAGE")
style = cv2.imread("/PATH/TO/IMAGE1")
results = module.style_transfer(images=[{'content':content, 'style':style}], output_dir='./transfer_result', use_gpu=True)
```
- ### 3. API
- ```python
style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Style transfer API.
- **Parameters**
  - images (list\[dict\]): image data, where each element is a dict with keys content and style:
    - content (numpy.ndarray): the image to be transformed, with shape \[H, W, C\], in BGR format;<br/>
    - style (numpy.ndarray): the style image, with shape \[H, W, C\], in BGR format;<br/>
  - paths (list\[dict\]): paths to the images, where each element is a dict with keys content and style:
    - content (str): path to the image to be transformed;<br/>
    - style (str): path to the style image;<br/>
  - output\_dir (str): directory where the results are saved;<br/>
  - use\_gpu (bool): whether to use the GPU;<br/>
  - visualization (bool): whether to save the results to a local folder
## IV. Server Deployment
- PaddleHub Serving can deploy an online image style transfer service.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m lapstyle_starrynew
```
- This deploys an online image style transfer service API, which listens on port 8866 by default.
- **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server side configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[{'content': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE")), 'style': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1"))}]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/lapstyle_starrynew"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
  First release
- ```shell
$ hub install lapstyle_starrynew==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import urllib.request
import cv2 as cv
import numpy as np
import paddle
import paddle.nn.functional as F
from paddle.vision.transforms import functional
from PIL import Image
from ppgan.models.generators import DecoderNet
from ppgan.models.generators import Encoder
from ppgan.models.generators import RevisionNet
from ppgan.utils.visual import tensor2img
def img(img):
    # some images have 4 channels; keep only the first three (drop alpha).
    # The HWC -> CHW conversion happens later, in functional.to_tensor.
    if img.shape[2] > 3:
        img = img[:, :, :3]
    return img
def img_totensor(content_img, style_img):
if content_img.ndim == 2:
content_img = cv.cvtColor(content_img, cv.COLOR_GRAY2RGB)
else:
content_img = cv.cvtColor(content_img, cv.COLOR_BGR2RGB)
h, w, c = content_img.shape
content_img = Image.fromarray(content_img)
content_img = content_img.resize((512, 512), Image.BILINEAR)
content_img = np.array(content_img)
content_img = img(content_img)
content_img = functional.to_tensor(content_img)
style_img = cv.cvtColor(style_img, cv.COLOR_BGR2RGB)
style_img = Image.fromarray(style_img)
style_img = style_img.resize((512, 512), Image.BILINEAR)
style_img = np.array(style_img)
style_img = img(style_img)
style_img = functional.to_tensor(style_img)
content_img = paddle.unsqueeze(content_img, axis=0)
style_img = paddle.unsqueeze(style_img, axis=0)
return content_img, style_img, h, w
def tensor_resample(tensor, dst_size, mode='bilinear'):
return F.interpolate(tensor, dst_size, mode=mode, align_corners=False)
def laplacian(x):
"""
Laplacian
return:
x - upsample(downsample(x))
"""
return x - tensor_resample(tensor_resample(x, [x.shape[2] // 2, x.shape[3] // 2]), [x.shape[2], x.shape[3]])
def make_laplace_pyramid(x, levels):
"""
Make Laplacian Pyramid
"""
pyramid = []
current = x
for i in range(levels):
pyramid.append(laplacian(current))
current = tensor_resample(current, (max(current.shape[2] // 2, 1), max(current.shape[3] // 2, 1)))
pyramid.append(current)
return pyramid
def fold_laplace_pyramid(pyramid):
"""
Fold Laplacian Pyramid
"""
current = pyramid[-1]
for i in range(len(pyramid) - 2, -1, -1): # iterate from len-2 to 0
up_h, up_w = pyramid[i].shape[2], pyramid[i].shape[3]
current = pyramid[i] + tensor_resample(current, (up_h, up_w))
return current
class LapStylePredictor:
def __init__(self, weight_path=None):
self.net_enc = Encoder()
self.net_dec = DecoderNet()
self.net_rev = RevisionNet()
self.net_rev_2 = RevisionNet()
self.net_enc.set_dict(paddle.load(weight_path)['net_enc'])
self.net_enc.eval()
self.net_dec.set_dict(paddle.load(weight_path)['net_dec'])
self.net_dec.eval()
self.net_rev.set_dict(paddle.load(weight_path)['net_rev'])
self.net_rev.eval()
self.net_rev_2.set_dict(paddle.load(weight_path)['net_rev_2'])
self.net_rev_2.eval()
def run(self, content_img, style_image):
content_img, style_img, h, w = img_totensor(content_img, style_image)
pyr_ci = make_laplace_pyramid(content_img, 2)
pyr_si = make_laplace_pyramid(style_img, 2)
pyr_ci.append(content_img)
pyr_si.append(style_img)
cF = self.net_enc(pyr_ci[2])
sF = self.net_enc(pyr_si[2])
stylized_small = self.net_dec(cF, sF)
stylized_up = F.interpolate(stylized_small, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[1], stylized_up], axis=1)
stylized_rev_lap = self.net_rev(revnet_input)
stylized_rev = fold_laplace_pyramid([stylized_rev_lap, stylized_small])
stylized_up = F.interpolate(stylized_rev, scale_factor=2)
revnet_input = paddle.concat(x=[pyr_ci[0], stylized_up], axis=1)
stylized_rev_lap_second = self.net_rev_2(revnet_input)
stylized_rev_second = fold_laplace_pyramid([stylized_rev_lap_second, stylized_rev_lap, stylized_small])
stylized = stylized_rev_second
stylized_visual = tensor2img(stylized, min_max=(0., 1.))
return stylized_visual
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import LapStylePredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(
name="lapstyle_starrynew",
type="CV/style_transfer",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class Lapstyle_starrynew:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "lapstyle_starrynew.pdparams")
self.network = LapStylePredictor(weight_path=self.pretrained_model)
def style_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
visualization: bool = True):
        '''
        Transfer an image to starrynew style.
        images (list[dict]): data of images, each element is a dict:
          - content (numpy.ndarray): input image, shape is [H, W, C], BGR format;
          - style (numpy.ndarray): style image, shape is [H, W, C], BGR format;
        paths (list[dict]): paths to images, each element is a dict:
          - content (str): path to input image;
          - style (str): path to style image;
        output_dir (str): the dir to save the results
        use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
        visualization (bool): if True, save results in output_dir.
        '''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
            print('No image provided. Please input an image or an image path.')
return
if images != None:
for image_dict in images:
content_img = image_dict['content']
style_img = image_dict['style']
results.append(self.network.run(content_img, style_img))
if paths != None:
for path_dict in paths:
content_img = cv2.imread(path_dict['content'])
style_img = cv2.imread(path_dict['style'])
results.append(self.network.run(content_img, style_img))
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.style_transfer(
paths=[{
'content': self.args.content,
'style': self.args.style
}],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = copy.deepcopy(images)
for image in images_decode:
image['content'] = base64_to_cv2(image['content'])
image['style'] = base64_to_cv2(image['style'])
results = self.style_transfer(images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--content', type=str, help="path to content image.")
self.arg_input_group.add_argument('--style', type=str, help="path to style image.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# paint_transformer
|Module Name|paint_transformer|
| :--- | :---: |
|Category|Image - Style Transfer|
|Network|Paint Transformer|
|Dataset|Baidu self-built dataset|
|Fine-tuning supported|No|
|Module Size|77MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## 一、模型基本信息
- ### 应用效果展示
- 样例结果示例:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/145002878-ffdeea71-8ff4-48cc-88d0-fba1aa1dce4b.jpg" width = "40%" hspace='10'/>
<br />
输入图像
<br />
<img src="https://user-images.githubusercontent.com/22424850/145002301-97c45887-cb2e-4a06-9d00-07b74080effa.png" width = "40%" hspace='10'/>
<br />
输出图像
<br />
</p>
- ### 模型介绍
- 该模型可以实现图像油画风格的转换。
- 更多详情参考:[Paint Transformer: Feed Forward Neural Painting with Stroke Prediction](https://github.com/wzmsltw/PaintTransformer)
## II. Installation
- ### 1. Environment Dependencies
- ppgan
- ### 2. Installation
- ```shell
$ hub install paint_transformer
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a file
$ hub run paint_transformer --input_path "/PATH/TO/IMAGE"
```
- This invokes the style transfer module from the command line; for more information, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="paint_transformer")
input_path = ["/PATH/TO/IMAGE"]
# Read from a file
module.style_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
```
- ### 3. API
- ```python
style_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, need_animation=False, visualization=True):
```
- Oil-painting style transfer API.
- **Parameters**
- images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\];<br/>
- paths (list\[str\]): paths to the images;<br/>
- output\_dir (str): directory in which to save the results; <br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- need_animation(bool): whether to save the intermediate results as animation frames
- visualization(bool): whether to save the results to a local folder (a usage sketch follows this list)
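- A minimal usage sketch for these parameters (illustrative only; the image path is a placeholder), passing an in-memory image and keeping the intermediate animation frames:
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="paint_transformer")
img = cv2.imread("/PATH/TO/IMAGE")  # BGR ndarray of shape [H, W, C]
results = module.style_transfer(images=[img],
                                output_dir='./transfer_result/',
                                need_animation=True,  # also saves every intermediate frame
                                visualization=True)
# results[i] is the list of rendered frames for the i-th input image
```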
## IV. Server Deployment
- PaddleHub Serving can deploy an online oil-painting style transfer service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m paint_transformer
```
- This deploys an online oil-painting style transfer API service; the default port is 8866.
- **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/paint_transformer"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
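- The returned "results" hold one list of frames per input image (serialized through numpy's tolist()); a sketch of restoring the final frame, assuming the server runs the module above:
- ```python
import numpy as np
frames = r.json()["results"][0]
last_frame = np.array(frames[-1], dtype=np.uint8)  # frames are BGR, ready for cv2.imwrite
cv2.imwrite('serving_output.png', last_frame)
```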
## V. Release Note
* 1.0.0
First release
- ```shell
$ hub install paint_transformer==1.0.0
```
import numpy as np
from PIL import Image
import network
import os
import math
import render_utils
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import cv2
import render_parallel
import render_serial
def main(input_path, model_path, output_dir, need_animation=False, resize_h=None, resize_w=None, serial=False):
if not os.path.exists(output_dir):
os.mkdir(output_dir)
input_name = os.path.basename(input_path)
output_path = os.path.join(output_dir, input_name)
frame_dir = None
if need_animation:
if not serial:
print('Serial mode is required when animation results are needed, so the serial flag has been set to True!')
serial = True
frame_dir = os.path.join(output_dir, input_name[:input_name.find('.')])
if not os.path.exists(frame_dir):
os.mkdir(frame_dir)
stroke_num = 8
#* ----- load model ----- *#
paddle.set_device('gpu')
net_g = network.Painter(5, stroke_num, 256, 8, 3, 3)
net_g.set_state_dict(paddle.load(model_path))
net_g.eval()
for param in net_g.parameters():
param.stop_gradient = True
#* ----- load brush ----- *#
brush_large_vertical = render_utils.read_img('brush/brush_large_vertical.png', 'L')
brush_large_horizontal = render_utils.read_img('brush/brush_large_horizontal.png', 'L')
meta_brushes = paddle.concat([brush_large_vertical, brush_large_horizontal], axis=0)
import time
t0 = time.time()
original_img = render_utils.read_img(input_path, 'RGB', resize_h, resize_w)
if serial:
final_result_list = render_serial.render_serial(original_img, net_g, meta_brushes)
if need_animation:
print("total frame:", len(final_result_list))
for idx, frame in enumerate(final_result_list):
cv2.imwrite(os.path.join(frame_dir, '%03d.png' % idx), frame)
else:
cv2.imwrite(output_path, final_result_list[-1])
else:
final_result = render_parallel.render_parallel(original_img, net_g, meta_brushes)
cv2.imwrite(output_path, final_result)
print("total infer time:", time.time() - t0)
if __name__ == '__main__':
main(
input_path='input/chicago.jpg',
model_path='paint_best.pdparams',
output_dir='output/',
need_animation=True, # whether need intermediate results for animation.
resize_h=512, # resize original input to this size. None means do not resize.
resize_w=512, # resize original input to this size. None means do not resize.
serial=True) # if need animation, serial must be True.
import paddle
import paddle.nn as nn
import math
class Painter(nn.Layer):
"""
network architecture written in paddle.
"""
def __init__(self, param_per_stroke, total_strokes, hidden_dim, n_heads=8, n_enc_layers=3, n_dec_layers=3):
super().__init__()
self.enc_img = nn.Sequential(
nn.Pad2D([1, 1, 1, 1], 'reflect'),
nn.Conv2D(3, 32, 3, 1),
nn.BatchNorm2D(32),
nn.ReLU(), # maybe replace with the inplace version
nn.Pad2D([1, 1, 1, 1], 'reflect'),
nn.Conv2D(32, 64, 3, 2),
nn.BatchNorm2D(64),
nn.ReLU(),
nn.Pad2D([1, 1, 1, 1], 'reflect'),
nn.Conv2D(64, 128, 3, 2),
nn.BatchNorm2D(128),
nn.ReLU())
self.enc_canvas = nn.Sequential(
nn.Pad2D([1, 1, 1, 1], 'reflect'), nn.Conv2D(3, 32, 3, 1), nn.BatchNorm2D(32), nn.ReLU(),
nn.Pad2D([1, 1, 1, 1], 'reflect'), nn.Conv2D(32, 64, 3, 2), nn.BatchNorm2D(64), nn.ReLU(),
nn.Pad2D([1, 1, 1, 1], 'reflect'), nn.Conv2D(64, 128, 3, 2), nn.BatchNorm2D(128), nn.ReLU())
self.conv = nn.Conv2D(128 * 2, hidden_dim, 1)
self.transformer = nn.Transformer(hidden_dim, n_heads, n_enc_layers, n_dec_layers)
self.linear_param = nn.Sequential(
nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
nn.Linear(hidden_dim, param_per_stroke))
self.linear_decider = nn.Linear(hidden_dim, 1)
self.query_pos = paddle.static.create_parameter([total_strokes, hidden_dim],
dtype='float32',
default_initializer=nn.initializer.Uniform(0, 1))
self.row_embed = paddle.static.create_parameter([8, hidden_dim // 2],
dtype='float32',
default_initializer=nn.initializer.Uniform(0, 1))
self.col_embed = paddle.static.create_parameter([8, hidden_dim // 2],
dtype='float32',
default_initializer=nn.initializer.Uniform(0, 1))
def forward(self, img, canvas):
"""
prediction
"""
b, _, H, W = img.shape
img_feat = self.enc_img(img)
canvas_feat = self.enc_canvas(canvas)
h, w = img_feat.shape[-2:]
feat = paddle.concat([img_feat, canvas_feat], axis=1)
feat_conv = self.conv(feat)
pos_embed = paddle.concat([
self.col_embed[:w].unsqueeze(0).tile([h, 1, 1]),
self.row_embed[:h].unsqueeze(1).tile([1, w, 1]),
],
axis=-1).flatten(0, 1).unsqueeze(1)
hidden_state = self.transformer((pos_embed + feat_conv.flatten(2).transpose([2, 0, 1])).transpose([1, 0, 2]),
self.query_pos.unsqueeze(1).tile([1, b, 1]).transpose([1, 0, 2]))
param = self.linear_param(hidden_state)
decision = self.linear_decider(hidden_state)
return param, decision
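# --- Illustrative smoke test (not part of the original file) ---
# Shapes follow the module code below, which builds Painter(5, 8, 256, 8, 3, 3) and
# feeds 32x32 patches; the two stride-2 convs reduce 32x32 features to 8x8, matching
# the 8-entry row/col position embeddings.
if __name__ == '__main__':
    net = Painter(5, 8, 256, 8, 3, 3)
    net.eval()
    img = paddle.rand([1, 3, 32, 32])
    canvas = paddle.zeros([1, 3, 32, 32])
    param, decision = net(img, canvas)
    print(param.shape, decision.shape)  # [1, 8, 5] stroke params, [1, 8, 1] decision logits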
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import copy
import paddle
import paddlehub as hub
from paddlehub.module.module import moduleinfo, runnable, serving
import numpy as np
import cv2
from skimage.io import imread
from skimage.transform import rescale, resize
from .model import Painter
from .render_utils import totensor, read_img
from .render_serial import render_serial
from .util import base64_to_cv2
@moduleinfo(
name="paint_transformer",
type="CV/style_transfer",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class paint_transformer:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "paint_best.pdparams")
self.network = Painter(5, 8, 256, 8, 3, 3)
self.network.set_state_dict(paddle.load(self.pretrained_model))
self.network.eval()
for param in self.network.parameters():
param.stop_gradient = True
#* ----- load brush ----- *#
brush_large_vertical = read_img(os.path.join(self.directory, 'brush/brush_large_vertical.png'), 'L')
brush_large_horizontal = read_img(os.path.join(self.directory, 'brush/brush_large_horizontal.png'), 'L')
self.meta_brushes = paddle.concat([brush_large_vertical, brush_large_horizontal], axis=0)
def style_transfer(self,
images: list = None,
paths: list = None,
output_dir: str = './transfer_result/',
use_gpu: bool = False,
need_animation: bool = False,
visualization: bool = True):
'''
images (list[numpy.ndarray]): data of images; the shape of each is [H, W, C] and the color space must be BGR (as read by cv2).
paths (list[str]): paths to images
output_dir (str): the dir to save the results
use_gpu (bool): if True, use gpu to perform the computation, otherwise cpu.
need_animation (bool): if True, save every frame to show the process of painting.
visualization (bool): if True, save results in output_dir.
'''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images is None and paths is None:
print('No image provided. Please input an image or an image path.')
return
if images is not None:
for image in images:
image = image[:, :, ::-1]
image = totensor(image)
final_result_list = render_serial(image, self.network, self.meta_brushes)
results.append(final_result_list)
if paths is not None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
image = totensor(image)
final_result_list = render_serial(image, self.network, self.meta_brushes)
results.append(final_result_list)
if visualization:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
if out:
if need_animation:
curoutputdir = os.path.join(output_dir, 'output_{}'.format(i))
if not os.path.exists(curoutputdir):
os.makedirs(curoutputdir, exist_ok=True)
for j, outimg in enumerate(out):
cv2.imwrite(os.path.join(curoutputdir, 'frame_{}.png'.format(j)), outimg)
else:
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.style_transfer(
paths=[self.args.input_path],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
need_animation=self.args.need_animation,
visualization=self.args.visualization)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.style_transfer(images=images_decode, **kwargs)
# each result is a list of frame arrays, so serialize frame by frame
tolist = [[frame.tolist() for frame in result] for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
# store_true flags are used because argparse's type=bool treats any non-empty string as True
self.arg_config_group.add_argument('--visualization', action='store_true', help='save results or not.')
self.arg_config_group.add_argument(
'--need_animation', action='store_true', help='save intermediate results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
import render_utils
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import numpy as np
import math
def crop(img, h, w):
H, W = img.shape[-2:]
pad_h = (H - h) // 2
pad_w = (W - w) // 2
remainder_h = (H - h) % 2
remainder_w = (W - w) % 2
img = img[:, :, pad_h:H - pad_h - remainder_h, pad_w:W - pad_w - remainder_w]
return img
def stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num, patch_num):
"""
stroke_net_predict
"""
img_patch = img_patch.transpose([0, 2, 1]).reshape([-1, 3, patch_size, patch_size])
result_patch = result_patch.transpose([0, 2, 1]).reshape([-1, 3, patch_size, patch_size])
#*----- Stroke Predictor -----*#
shape_param, stroke_decision = net_g(img_patch, result_patch)
stroke_decision = (stroke_decision > 0).astype('float32')
#*----- sampling color -----*#
grid = shape_param[:, :, :2].reshape([img_patch.shape[0] * stroke_num, 1, 1, 2])
img_temp = img_patch.unsqueeze(1).tile([1, stroke_num, 1, 1,
1]).reshape([img_patch.shape[0] * stroke_num, 3, patch_size, patch_size])
color = nn.functional.grid_sample(
img_temp, 2 * grid - 1, align_corners=False).reshape([img_patch.shape[0], stroke_num, 3])
param = paddle.concat([shape_param, color], axis=-1)
param = param.reshape([-1, 8])
param[:, :2] = param[:, :2] / 2 + 0.25
param[:, 2:4] = param[:, 2:4] / 2
param = param.reshape([1, patch_num, patch_num, stroke_num, 8])
decision = stroke_decision.reshape([1, patch_num, patch_num, stroke_num]) #.astype('bool')
return param, decision
def param2img_parallel(param, decision, meta_brushes, cur_canvas, stroke_num=8):
"""
Input stroke parameters and decisions for each patch, meta brushes, current canvas, frame directory,
and whether there is a border (if intermediate painting results are required).
Output the painting results of adding the corresponding strokes on the current canvas.
Args:
param: a tensor with shape batch size x patch along height dimension x patch along width dimension
x n_stroke_per_patch x n_param_per_stroke
decision: a 01 tensor with shape batch size x patch along height dimension x patch along width dimension
x n_stroke_per_patch
meta_brushes: a tensor with shape 2 x 3 x meta_brush_height x meta_brush_width.
The first slice on the batch dimension denotes vertical brush and the second one denotes horizontal brush.
cur_canvas: a tensor with shape batch size x 3 x H x W,
where H and W denote height and width of padded results of original images.
Returns:
cur_canvas: a tensor with shape batch size x 3 x H x W, denoting painting results.
"""
# param: b, h, w, stroke_per_patch, param_per_stroke
# decision: b, h, w, stroke_per_patch
b, h, w, s, p = param.shape
h, w = int(h), int(w)
param = param.reshape([-1, 8])
decision = decision.reshape([-1, 8])
H, W = cur_canvas.shape[-2:]
is_odd_y = h % 2 == 1
is_odd_x = w % 2 == 1
render_size_y = 2 * H // h
render_size_x = 2 * W // w
even_idx_y = paddle.arange(0, h, 2)
even_idx_x = paddle.arange(0, w, 2)
if h > 1:
odd_idx_y = paddle.arange(1, h, 2)
if w > 1:
odd_idx_x = paddle.arange(1, w, 2)
cur_canvas = F.pad(cur_canvas, [render_size_x // 4, render_size_x // 4, render_size_y // 4, render_size_y // 4])
valid_foregrounds = render_utils.param2stroke(param, render_size_y, render_size_x, meta_brushes)
#* ----- load dilation/erosion ---- *#
dilation = render_utils.Dilation2d(m=1)
erosion = render_utils.Erosion2d(m=1)
#* ----- generate alphas ----- *#
valid_alphas = (valid_foregrounds > 0).astype('float32')
valid_foregrounds = valid_foregrounds.reshape([-1, stroke_num, 1, render_size_y, render_size_x])
valid_alphas = valid_alphas.reshape([-1, stroke_num, 1, render_size_y, render_size_x])
temp = [dilation(valid_foregrounds[:, i, :, :, :]) for i in range(stroke_num)]
valid_foregrounds = paddle.stack(temp, axis=1)
valid_foregrounds = valid_foregrounds.reshape([-1, 1, render_size_y, render_size_x])
temp = [erosion(valid_alphas[:, i, :, :, :]) for i in range(stroke_num)]
valid_alphas = paddle.stack(temp, axis=1)
valid_alphas = valid_alphas.reshape([-1, 1, render_size_y, render_size_x])
foregrounds = valid_foregrounds.reshape([-1, h, w, stroke_num, 1, render_size_y, render_size_x])
alphas = valid_alphas.reshape([-1, h, w, stroke_num, 1, render_size_y, render_size_x])
decision = decision.reshape([-1, h, w, stroke_num, 1, 1, 1])
param = param.reshape([-1, h, w, stroke_num, 8])
def partial_render(this_canvas, patch_coord_y, patch_coord_x):
canvas_patch = F.unfold(
this_canvas, [render_size_y, render_size_x], strides=[render_size_y // 2, render_size_x // 2])
# canvas_patch: b, 3 * py * px, h * w
canvas_patch = canvas_patch.reshape([b, 3, render_size_y, render_size_x, h, w])
canvas_patch = canvas_patch.transpose([0, 4, 5, 1, 2, 3])
selected_canvas_patch = paddle.gather(canvas_patch, patch_coord_y, 1)
selected_canvas_patch = paddle.gather(selected_canvas_patch, patch_coord_x, 2)
selected_canvas_patch = selected_canvas_patch.reshape([0, 0, 0, 1, 3, render_size_y, render_size_x])
selected_foregrounds = paddle.gather(foregrounds, patch_coord_y, 1)
selected_foregrounds = paddle.gather(selected_foregrounds, patch_coord_x, 2)
selected_alphas = paddle.gather(alphas, patch_coord_y, 1)
selected_alphas = paddle.gather(selected_alphas, patch_coord_x, 2)
selected_decisions = paddle.gather(decision, patch_coord_y, 1)
selected_decisions = paddle.gather(selected_decisions, patch_coord_x, 2)
selected_color = paddle.gather(param, patch_coord_y, 1)
selected_color = paddle.gather(selected_color, patch_coord_x, 2)
selected_color = paddle.gather(selected_color, paddle.to_tensor([5, 6, 7]), 4)
selected_color = selected_color.reshape([0, 0, 0, stroke_num, 3, 1, 1])
for i in range(stroke_num):
i = paddle.to_tensor(i)
cur_foreground = paddle.gather(selected_foregrounds, i, 3)
cur_alpha = paddle.gather(selected_alphas, i, 3)
cur_decision = paddle.gather(selected_decisions, i, 3)
cur_color = paddle.gather(selected_color, i, 3)
cur_foreground = cur_foreground * cur_color
selected_canvas_patch = cur_foreground * cur_alpha * cur_decision + selected_canvas_patch * (
1 - cur_alpha * cur_decision)
selected_canvas_patch = selected_canvas_patch.reshape([0, 0, 0, 3, render_size_y, render_size_x])
this_canvas = selected_canvas_patch.transpose([0, 3, 1, 4, 2, 5])
# this_canvas: b, 3, h_half, py, w_half, px
h_half = this_canvas.shape[2]
w_half = this_canvas.shape[4]
this_canvas = this_canvas.reshape([b, 3, h_half * render_size_y, w_half * render_size_x])
# this_canvas: b, 3, h_half * py, w_half * px
return this_canvas
# even - even area
# 1 | 0
# 0 | 0
canvas = partial_render(cur_canvas, even_idx_y, even_idx_x)
if not is_odd_y:
canvas = paddle.concat([canvas, cur_canvas[:, :, -render_size_y // 2:, :canvas.shape[3]]], axis=2)
if not is_odd_x:
canvas = paddle.concat([canvas, cur_canvas[:, :, :canvas.shape[2], -render_size_x // 2:]], axis=3)
cur_canvas = canvas
# odd - odd area
# 0 | 0
# 0 | 1
if h > 1 and w > 1:
canvas = partial_render(cur_canvas, odd_idx_y, odd_idx_x)
canvas = paddle.concat([cur_canvas[:, :, :render_size_y // 2, -canvas.shape[3]:], canvas], axis=2)
canvas = paddle.concat([cur_canvas[:, :, -canvas.shape[2]:, :render_size_x // 2], canvas], axis=3)
if is_odd_y:
canvas = paddle.concat([canvas, cur_canvas[:, :, -render_size_y // 2:, :canvas.shape[3]]], axis=2)
if is_odd_x:
canvas = paddle.concat([canvas, cur_canvas[:, :, :canvas.shape[2], -render_size_x // 2:]], axis=3)
cur_canvas = canvas
# odd - even area
# 0 | 0
# 1 | 0
if h > 1:
canvas = partial_render(cur_canvas, odd_idx_y, even_idx_x)
canvas = paddle.concat([cur_canvas[:, :, :render_size_y // 2, :canvas.shape[3]], canvas], axis=2)
if is_odd_y:
canvas = paddle.concat([canvas, cur_canvas[:, :, -render_size_y // 2:, :canvas.shape[3]]], axis=2)
if not is_odd_x:
canvas = paddle.concat([canvas, cur_canvas[:, :, :canvas.shape[2], -render_size_x // 2:]], axis=3)
cur_canvas = canvas
# even - odd area
# 0 | 1
# 0 | 0
if w > 1:
canvas = partial_render(cur_canvas, even_idx_y, odd_idx_x)
canvas = paddle.concat([cur_canvas[:, :, :canvas.shape[2], :render_size_x // 2], canvas], axis=3)
if not is_odd_y:
canvas = paddle.concat([canvas, cur_canvas[:, :, -render_size_y // 2:, -canvas.shape[3]:]], axis=2)
if is_odd_x:
canvas = paddle.concat([canvas, cur_canvas[:, :, :canvas.shape[2], -render_size_x // 2:]], axis=3)
cur_canvas = canvas
cur_canvas = cur_canvas[:, :, render_size_y // 4:-render_size_y // 4, render_size_x // 4:-render_size_x // 4]
return cur_canvas
def render_parallel(original_img, net_g, meta_brushes):
patch_size = 32
stroke_num = 8
with paddle.no_grad():
original_h, original_w = original_img.shape[-2:]
K = max(math.ceil(math.log2(max(original_h, original_w) / patch_size)), 0)
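# K + 1 coarse-to-fine layers; each layer renders at resolution patch_size * 2**layer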
original_img_pad_size = patch_size * (2**K)
original_img_pad = render_utils.pad(original_img, original_img_pad_size, original_img_pad_size)
final_result = paddle.zeros_like(original_img)
for layer in range(0, K + 1):
layer_size = patch_size * (2**layer)
img = F.interpolate(original_img_pad, (layer_size, layer_size))
result = F.interpolate(final_result, (layer_size, layer_size))
img_patch = F.unfold(img, [patch_size, patch_size], strides=[patch_size, patch_size])
result_patch = F.unfold(result, [patch_size, patch_size], strides=[patch_size, patch_size])
# There are patch_num * patch_num patches in total
patch_num = (layer_size - patch_size) // patch_size + 1
param, decision = stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num, patch_num)
#print(param.shape, decision.shape)
final_result = param2img_parallel(param, decision, meta_brushes, final_result)
# paint another time for last layer
border_size = original_img_pad_size // (2 * patch_num)
img = F.interpolate(original_img_pad, (layer_size, layer_size))
result = F.interpolate(final_result, (layer_size, layer_size))
img = F.pad(img, [patch_size // 2, patch_size // 2, patch_size // 2, patch_size // 2])
result = F.pad(result, [patch_size // 2, patch_size // 2, patch_size // 2, patch_size // 2])
img_patch = F.unfold(img, [patch_size, patch_size], strides=[patch_size, patch_size])
result_patch = F.unfold(result, [patch_size, patch_size], strides=[patch_size, patch_size])
final_result = F.pad(final_result, [border_size, border_size, border_size, border_size])
patch_num = (img.shape[2] - patch_size) // patch_size + 1
#w = (img.shape[3] - patch_size) // patch_size + 1
param, decision = stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num, patch_num)
final_result = param2img_parallel(param, decision, meta_brushes, final_result)
final_result = final_result[:, :, border_size:-border_size, border_size:-border_size]
final_result = (final_result.numpy().squeeze().transpose([1, 2, 0])[:, :, ::-1] * 255).astype(np.uint8)
return final_result
#!/usr/bin/env python3
"""
code for oil-painting style transfer.
"""
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import numpy as np
from PIL import Image
import math
import cv2
import time
from .render_utils import param2stroke, Dilation2d, Erosion2d
def get_single_layer_lists(param, decision, ori_img, render_size_x, render_size_y, h, w, meta_brushes, dilation,
erosion, stroke_num):
"""
get_single_layer_lists
"""
valid_foregrounds = param2stroke(param[:, :], render_size_y, render_size_x, meta_brushes)
valid_alphas = (valid_foregrounds > 0).astype('float32')
valid_foregrounds = valid_foregrounds.reshape([-1, stroke_num, 1, render_size_y, render_size_x])
valid_alphas = valid_alphas.reshape([-1, stroke_num, 1, render_size_y, render_size_x])
temp = [dilation(valid_foregrounds[:, i, :, :, :]) for i in range(stroke_num)]
valid_foregrounds = paddle.stack(temp, axis=1)
valid_foregrounds = valid_foregrounds.reshape([-1, 1, render_size_y, render_size_x])
temp = [erosion(valid_alphas[:, i, :, :, :]) for i in range(stroke_num)]
valid_alphas = paddle.stack(temp, axis=1)
valid_alphas = valid_alphas.reshape([-1, 1, render_size_y, render_size_x])
patch_y = 4 * render_size_y // 5
patch_x = 4 * render_size_x // 5
img_patch = ori_img.reshape([1, 3, h, ori_img.shape[2] // h, w, ori_img.shape[3] // w])
img_patch = img_patch.transpose([0, 2, 4, 1, 3, 5])[0]
xid_list = []
yid_list = []
error_list = []
for flag_idx, flag in enumerate(decision.cpu().numpy()):
if flag:
flag_idx = flag_idx // stroke_num
x_id = flag_idx % w
flag_idx = flag_idx // w
y_id = flag_idx % h
xid_list.append(x_id)
yid_list.append(y_id)
inner_fores = valid_foregrounds[:, :, render_size_y // 10:9 * render_size_y // 10, render_size_x // 10:9 *
render_size_x // 10]
inner_alpha = valid_alphas[:, :, render_size_y // 10:9 * render_size_y // 10, render_size_x // 10:9 *
render_size_x // 10]
inner_fores = inner_fores.reshape([h * w, stroke_num, 1, patch_y, patch_x])
inner_alpha = inner_alpha.reshape([h * w, stroke_num, 1, patch_y, patch_x])
inner_real = img_patch.reshape([h * w, 3, patch_y, patch_x]).unsqueeze(1)
R = param[:, 5]
G = param[:, 6]
B = param[:, 7] #, G, B = param[5:]
R = R.reshape([-1, stroke_num]).unsqueeze(-1).unsqueeze(-1).unsqueeze(-1)
G = G.reshape([-1, stroke_num]).unsqueeze(-1).unsqueeze(-1).unsqueeze(-1)
B = B.reshape([-1, stroke_num]).unsqueeze(-1).unsqueeze(-1).unsqueeze(-1)
error_R = R * inner_fores - inner_real[:, :, 0:1, :, :]
error_G = G * inner_fores - inner_real[:, :, 1:2, :, :]
error_B = B * inner_fores - inner_real[:, :, 2:3, :, :]
error = paddle.abs(error_R) + paddle.abs(error_G) + paddle.abs(error_B)
error = error * inner_alpha
error = paddle.sum(error, axis=(2, 3, 4)) / paddle.sum(inner_alpha, axis=(2, 3, 4))
error_list = error.reshape([-1]).numpy()[decision.numpy()]
error_list = list(error_list)
valid_foregrounds = paddle.to_tensor(valid_foregrounds.numpy()[decision.numpy()])
valid_alphas = paddle.to_tensor(valid_alphas.numpy()[decision.numpy()])
selected_param = paddle.to_tensor(param.numpy()[decision.numpy()])
return xid_list, yid_list, valid_foregrounds, valid_alphas, error_list, selected_param
def get_single_stroke_on_full_image_A(x_id, y_id, valid_foregrounds, valid_alphas, param, original_img, render_size_x,
render_size_y, patch_x, patch_y):
"""
get_single_stroke_on_full_image_A
"""
tmp_foreground = paddle.zeros_like(original_img)
patch_y_num = original_img.shape[2] // patch_y
patch_x_num = original_img.shape[3] // patch_x
brush = valid_foregrounds.unsqueeze(0)
color_map = param[5:]
brush = brush.tile([1, 3, 1, 1])
color_map = color_map.unsqueeze(-1).unsqueeze(-1).unsqueeze(0) #.repeat(1, 1, H, W)
brush = brush * color_map
pad_l = x_id * patch_x
pad_r = (patch_x_num - x_id - 1) * patch_x
pad_t = y_id * patch_y
pad_b = (patch_y_num - y_id - 1) * patch_y
tmp_foreground = nn.functional.pad(brush, [pad_l, pad_r, pad_t, pad_b])
tmp_foreground = tmp_foreground[:, :, render_size_y // 10:-render_size_y // 10, render_size_x //
10:-render_size_x // 10]
tmp_alpha = nn.functional.pad(valid_alphas.unsqueeze(0), [pad_l, pad_r, pad_t, pad_b])
tmp_alpha = tmp_alpha[:, :, render_size_y // 10:-render_size_y // 10, render_size_x // 10:-render_size_x // 10]
return tmp_foreground, tmp_alpha
def get_single_stroke_on_full_image_B(x_id, y_id, valid_foregrounds, valid_alphas, param, original_img, render_size_x,
render_size_y, patch_x, patch_y):
"""
get_single_stroke_on_full_image_B
"""
x_expand = patch_x // 2 + render_size_x // 10
y_expand = patch_y // 2 + render_size_y // 10
pad_l = x_id * patch_x
pad_r = original_img.shape[3] + 2 * x_expand - (x_id * patch_x + render_size_x)
pad_t = y_id * patch_y
pad_b = original_img.shape[2] + 2 * y_expand - (y_id * patch_y + render_size_y)
brush = valid_foregrounds.unsqueeze(0)
color_map = param[5:]
brush = brush.tile([1, 3, 1, 1])
color_map = color_map.unsqueeze(-1).unsqueeze(-1).unsqueeze(0) #.repeat(1, 1, H, W)
brush = brush * color_map
tmp_foreground = nn.functional.pad(brush, [pad_l, pad_r, pad_t, pad_b])
tmp_foreground = tmp_foreground[:, :, y_expand:-y_expand, x_expand:-x_expand]
tmp_alpha = nn.functional.pad(valid_alphas.unsqueeze(0), [pad_l, pad_r, pad_t, pad_b])
tmp_alpha = tmp_alpha[:, :, y_expand:-y_expand, x_expand:-x_expand]
return tmp_foreground, tmp_alpha
def stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num):
"""
stroke_net_predict
"""
img_patch = img_patch.transpose([0, 2, 1]).reshape([-1, 3, patch_size, patch_size])
result_patch = result_patch.transpose([0, 2, 1]).reshape([-1, 3, patch_size, patch_size])
#*----- Stroke Predictor -----*#
shape_param, stroke_decision = net_g(img_patch, result_patch)
stroke_decision = (stroke_decision > 0).astype('float32')
#*----- sampling color -----*#
grid = shape_param[:, :, :2].reshape([img_patch.shape[0] * stroke_num, 1, 1, 2])
img_temp = img_patch.unsqueeze(1).tile([1, stroke_num, 1, 1,
1]).reshape([img_patch.shape[0] * stroke_num, 3, patch_size, patch_size])
color = nn.functional.grid_sample(
img_temp, 2 * grid - 1, align_corners=False).reshape([img_patch.shape[0], stroke_num, 3])
stroke_param = paddle.concat([shape_param, color], axis=-1)
param = stroke_param.reshape([-1, 8])
decision = stroke_decision.reshape([-1]).astype('bool')
param[:, :2] = param[:, :2] / 1.25 + 0.1
param[:, 2:4] = param[:, 2:4] / 1.25
return param, decision
def sort_strokes(params, decision, scores):
"""
sort_strokes
"""
sorted_scores, sorted_index = paddle.sort(scores, axis=1, descending=False)
sorted_params = []
for idx in range(8):
tmp_pick_params = paddle.gather(params[:, :, idx], axis=1, index=sorted_index)
sorted_params.append(tmp_pick_params)
sorted_params = paddle.stack(sorted_params, axis=2)
sorted_decision = paddle.gather(decision.squeeze(2), axis=1, index=sorted_index)
return sorted_params, sorted_decision
def render_serial(original_img, net_g, meta_brushes):
patch_size = 32
stroke_num = 8
H, W = original_img.shape[-2:]
K = max(math.ceil(math.log2(max(H, W) / patch_size)), 0)
dilation = Dilation2d(m=1)
erosion = Erosion2d(m=1)
frames_per_layer = [20, 20, 30, 40, 60]
final_frame_list = []
with paddle.no_grad():
#* ----- read in image and init canvas ----- *#
final_result = paddle.zeros_like(original_img)
for layer in range(0, K + 1):
t0 = time.time()
layer_size = patch_size * (2**layer)
img = nn.functional.interpolate(original_img, (layer_size, layer_size))
result = nn.functional.interpolate(final_result, (layer_size, layer_size))
img_patch = nn.functional.unfold(img, [patch_size, patch_size], strides=[patch_size, patch_size])
result_patch = nn.functional.unfold(result, [patch_size, patch_size], strides=[patch_size, patch_size])
h = (img.shape[2] - patch_size) // patch_size + 1
w = (img.shape[3] - patch_size) // patch_size + 1
render_size_y = int(1.25 * H // h)
render_size_x = int(1.25 * W // w)
#* -------------------------------------------------------------*#
#* -------------generate strokes on window type A---------------*#
#* -------------------------------------------------------------*#
param, decision = stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num)
expand_img = original_img
wA_xid_list, wA_yid_list, wA_fore_list, wA_alpha_list, wA_error_list, wA_params = \
get_single_layer_lists(param, decision, original_img, render_size_x, render_size_y, h, w,
meta_brushes, dilation, erosion, stroke_num)
#* -------------------------------------------------------------*#
#* -------------generate strokes on window type B---------------*#
#* -------------------------------------------------------------*#
#*----- generate input canvas and target patches -----*#
wB_error_list = []
img = nn.functional.pad(img, [patch_size // 2, patch_size // 2, patch_size // 2, patch_size // 2])
result = nn.functional.pad(result, [patch_size // 2, patch_size // 2, patch_size // 2, patch_size // 2])
img_patch = nn.functional.unfold(img, [patch_size, patch_size], strides=[patch_size, patch_size])
result_patch = nn.functional.unfold(result, [patch_size, patch_size], strides=[patch_size, patch_size])
h += 1
w += 1
param, decision = stroke_net_predict(img_patch, result_patch, patch_size, net_g, stroke_num)
patch_y = 4 * render_size_y // 5
patch_x = 4 * render_size_x // 5
expand_img = nn.functional.pad(original_img, [patch_x // 2, patch_x // 2, patch_y // 2, patch_y // 2])
wB_xid_list, wB_yid_list, wB_fore_list, wB_alpha_list, wB_error_list, wB_params = \
get_single_layer_lists(param, decision, expand_img, render_size_x, render_size_y, h, w,
meta_brushes, dilation, erosion, stroke_num)
#* -------------------------------------------------------------*#
#* -------------rank strokes and plot stroke one by one---------*#
#* -------------------------------------------------------------*#
numA = len(wA_error_list)
numB = len(wB_error_list)
total_error_list = wA_error_list + wB_error_list
sort_list = list(np.argsort(total_error_list))
sample = 0
samples = np.linspace(0, len(sort_list) - 2, frames_per_layer[layer]).astype(int)
for ii in sort_list:
ii = int(ii)
if ii < numA:
x_id = wA_xid_list[ii]
y_id = wA_yid_list[ii]
valid_foregrounds = wA_fore_list[ii]
valid_alphas = wA_alpha_list[ii]
sparam = wA_params[ii]
tmp_foreground, tmp_alpha = get_single_stroke_on_full_image_A(
x_id, y_id, valid_foregrounds, valid_alphas, sparam, original_img, render_size_x, render_size_y,
patch_x, patch_y)
else:
x_id = wB_xid_list[ii - numA]
y_id = wB_yid_list[ii - numA]
valid_foregrounds = wB_fore_list[ii - numA]
valid_alphas = wB_alpha_list[ii - numA]
sparam = wB_params[ii - numA]
tmp_foreground, tmp_alpha = get_single_stroke_on_full_image_B(
x_id, y_id, valid_foregrounds, valid_alphas, sparam, original_img, render_size_x, render_size_y,
patch_x, patch_y)
final_result = tmp_foreground * tmp_alpha + (1 - tmp_alpha) * final_result
if sample in samples:
saveframe = (final_result.numpy().squeeze().transpose([1, 2, 0])[:, :, ::-1] * 255).astype(np.uint8)
final_frame_list.append(saveframe)
#saveframe = cv2.resize(saveframe, (ow, oh))
sample += 1
print("layer %d cost: %.02f" % (layer, time.time() - t0))
saveframe = (final_result.numpy().squeeze().transpose([1, 2, 0])[:, :, ::-1] * 255).astype(np.uint8)
final_frame_list.append(saveframe)
return final_frame_list
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import cv2
import numpy as np
from PIL import Image
import math
class Erosion2d(nn.Layer):
"""
Erosion2d
"""
def __init__(self, m=1):
super(Erosion2d, self).__init__()
self.m = m
self.pad = [m, m, m, m]
def forward(self, x):
batch_size, c, h, w = x.shape
x_pad = F.pad(x, pad=self.pad, mode='constant', value=1e9)
channel = nn.functional.unfold(x_pad, 2 * self.m + 1, strides=1, paddings=0).reshape([batch_size, c, -1, h, w])
result = paddle.min(channel, axis=2)
return result
class Dilation2d(nn.Layer):
"""
Dilation2d
"""
def __init__(self, m=1):
super(Dilation2d, self).__init__()
self.m = m
self.pad = [m, m, m, m]
def forward(self, x):
batch_size, c, h, w = x.shape
x_pad = F.pad(x, pad=self.pad, mode='constant', value=-1e9)
channel = nn.functional.unfold(x_pad, 2 * self.m + 1, strides=1, paddings=0).reshape([batch_size, c, -1, h, w])
result = paddle.max(channel, axis=2)
return result
def param2stroke(param, H, W, meta_brushes):
"""
param2stroke
"""
b = param.shape[0]
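# each row of param holds 8 values per stroke: [x0, y0, w, h, theta, R, G, B];
# only the first five (position, size, rotation) are used to warp the brush here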
param_list = paddle.split(param, 8, axis=1)
x0, y0, w, h, theta = [item.squeeze(-1) for item in param_list[:5]]
sin_theta = paddle.sin(math.pi * theta)
cos_theta = paddle.cos(math.pi * theta)
index = paddle.full((b, ), -1, dtype='int64').numpy()
index[(h > w).numpy()] = 0
index[(h <= w).numpy()] = 1
meta_brushes_resize = F.interpolate(meta_brushes, (H, W)).numpy()
brush = paddle.to_tensor(meta_brushes_resize[index])
warp_00 = cos_theta / w
warp_01 = sin_theta * H / (W * w)
warp_02 = (1 - 2 * x0) * cos_theta / w + (1 - 2 * y0) * sin_theta * H / (W * w)
warp_10 = -sin_theta * W / (H * h)
warp_11 = cos_theta / h
warp_12 = (1 - 2 * y0) * cos_theta / h - (1 - 2 * x0) * sin_theta * W / (H * h)
warp_0 = paddle.stack([warp_00, warp_01, warp_02], axis=1)
warp_1 = paddle.stack([warp_10, warp_11, warp_12], axis=1)
warp = paddle.stack([warp_0, warp_1], axis=1)
grid = nn.functional.affine_grid(warp, [b, 3, H, W])  # note: paddle's default here is the opposite of torch's
brush = nn.functional.grid_sample(brush, grid)
return brush
def read_img(img_path, img_type='RGB', h=None, w=None):
"""
read img
"""
img = Image.open(img_path).convert(img_type)
if h is not None and w is not None:
img = img.resize((w, h), resample=Image.NEAREST)
img = np.array(img)
if img.ndim == 2:
img = np.expand_dims(img, axis=-1)
img = img.transpose((2, 0, 1))
img = paddle.to_tensor(img).unsqueeze(0).astype('float32') / 255.
return img
def preprocess(img, w=512, h=512):
image = cv2.resize(img, (w, h), interpolation=cv2.INTER_NEAREST)  # pass the flag by keyword; the third positional arg of cv2.resize is dst
image = image.transpose((2, 0, 1))
image = paddle.to_tensor(image).unsqueeze(0).astype('float32') / 255.
return image
def totensor(img):
image = img.transpose((2, 0, 1))
image = paddle.to_tensor(image).unsqueeze(0).astype('float32') / 255.
return image
def pad(img, H, W):
b, c, h, w = img.shape
pad_h = (H - h) // 2
pad_w = (W - w) // 2
remainder_h = (H - h) % 2
remainder_w = (W - w) % 2
expand_img = nn.functional.pad(img, [pad_w, pad_w + remainder_w, pad_h, pad_h + remainder_h])
return expand_img
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated for binary data
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# psgan
|Module Name|psgan|
| :--- | :---: |
|Category|Image - makeup transfer|
|Network|PSGAN|
|Dataset|-|
|Fine-tuning supported|No|
|Module Size|121MB|
|Latest update date|2021-12-07|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/157190651-595b6964-97c5-4b0b-ac0a-c30c8520a972.png" width = "30%" hspace='10'/>
<br />
Input content image
<br />
<img src="https://user-images.githubusercontent.com/22424850/145003966-c5c2e6ad-d306-4eaf-89a2-965a3dbf3675.jpg" width = "30%" hspace='10'/>
<br />
Input makeup image
<br />
<img src="https://user-images.githubusercontent.com/22424850/157190800-b1dd79d4-0eca-4b36-b091-6fcd2c00dcf6.png" width = "30%" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
- PSGAN performs makeup transfer: it transfers the makeup from an arbitrary reference image onto a bare-faced source image, a capability many portrait-beautification applications require.
- For more details, see: [PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer](https://arxiv.org/pdf/1909.06956.pdf)
## II. Installation
- ### 1. Environment Dependencies
- ppgan
- dlib
- ### 2. Installation
- ```shell
$ hub install psgan
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from files
$ hub run psgan --content "/PATH/TO/IMAGE" --style "/PATH/TO/IMAGE1"
```
- This invokes the makeup transfer module from the command line; for more information, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import cv2
import paddlehub as hub
module = hub.Module(name="psgan")
content = cv2.imread("/PATH/TO/IMAGE")
style = cv2.imread("/PATH/TO/IMAGE1")
results = module.makeup_transfer(images=[{'content':content, 'style':style}], output_dir='./transfer_result', use_gpu=True)
```
- ### 3. API
- ```python
makeup_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True)
```
- Makeup style transfer API.
- **Parameters**
- images (list[dict]): data of images, each element is a dict with the keys content and style, whose values are:
- content (numpy.ndarray): image to be transferred, shape \[H, W, C\], BGR format;<br/>
- style (numpy.ndarray): makeup reference image, shape \[H, W, C\], BGR format;<br/>
- paths (list[dict]): paths to images, each element is a dict with the keys content and style, whose values are:
- content (str): path to the image to be transferred;<br/>
- style (str): path to the makeup reference image;<br/>
- output\_dir (str): directory in which to save the results; <br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- visualization(bool): whether to save the results to a local folder (a usage sketch follows this list)
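- A minimal sketch of calling the same API with file paths instead of in-memory arrays (the paths are placeholders):
- ```python
import paddlehub as hub
module = hub.Module(name="psgan")
results = module.makeup_transfer(paths=[{'content': '/PATH/TO/IMAGE', 'style': '/PATH/TO/IMAGE1'}],
                                 output_dir='./transfer_result/',
                                 visualization=True)
# results[0] is the transferred face as an RGB ndarray
```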
## IV. Server Deployment
- PaddleHub Serving can deploy an online makeup style transfer service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m psgan
```
- This deploys an online makeup style transfer API service; the default port is 8866.
- **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# send the HTTP request
data = {'images':[{'content': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE")), 'style': cv2_to_base64(cv2.imread("/PATH/TO/IMAGE1"))}]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/psgan"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
First release
- ```shell
$ hub install psgan==1.0.0
```
epochs: 100
output_dir: tmp
checkpoints_dir: checkpoints
find_unused_parameters: True
model:
name: MakeupModel
generator:
name: GeneratorPSGANAttention
conv_dim: 64
repeat_num: 6
discriminator:
name: NLayerDiscriminator
ndf: 64
n_layers: 3
input_nc: 3
norm_type: spectral
cycle_criterion:
name: L1Loss
idt_criterion:
name: L1Loss
loss_weight: 0.5
l1_criterion:
name: L1Loss
l2_criterion:
name: MSELoss
gan_criterion:
name: GANLoss
gan_mode: lsgan
dataset:
train:
name: MakeupDataset
trans_size: 256
dataroot: data/MT-Dataset
cls_list: [non-makeup, makeup]
phase: train
test:
name: MakeupDataset
trans_size: 256
dataroot: data/MT-Dataset
cls_list: [non-makeup, makeup]
phase: test
lr_scheduler:
name: LinearDecay
learning_rate: 0.0002
start_epoch: 100
decay_epochs: 100
# will get from real dataset
iters_per_epoch: 1
optimizer:
optimizer_G:
name: Adam
net_names:
- netG
beta1: 0.5
optimizer_DA:
name: Adam
net_names:
- netD_A
beta1: 0.5
optimizer_DB:
name: Adam
net_names:
- netD_B
beta1: 0.5
log_config:
interval: 10
visiual_interval: 500
snapshot_config:
interval: 5
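# (Illustrative note, not part of the original files.) At inference time, the module
# below loads this config with ppgan's get_config and builds the predictor from its
# `model` section, roughly:
#
#     from ppgan.utils.config import get_config
#     cfg = get_config('makeup.yaml')   # path is a placeholder for illustration
#     print(cfg.model.generator.name)   # -> GeneratorPSGANAttention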
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import sys
from pathlib import Path
import numpy as np
import paddle
import paddle.vision.transforms as T
import ppgan.faceutils as futils
from paddle.utils.download import get_weights_path_from_url
from PIL import Image
from ppgan.models.builder import build_model
from ppgan.utils.config import get_config
from ppgan.utils.filesystem import load
from ppgan.utils.options import parse_args
from ppgan.utils.preprocess import *
def toImage(net_output):
img = net_output.squeeze(0).transpose((1, 2, 0)).numpy() # [1,c,h,w]->[h,w,c]
img = (img * 255.0).clip(0, 255)
img = np.uint8(img)
img = Image.fromarray(img, mode='RGB')
return img
PS_WEIGHT_URL = "https://paddlegan.bj.bcebos.com/models/psgan_weight.pdparams"
class PreProcess:
def __init__(self, config, need_parser=True):
self.img_size = 256
self.transform = transform = T.Compose([
T.Resize(size=256),
T.ToTensor(),
])
self.norm = T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
if need_parser:
self.face_parser = futils.mask.FaceParser()
self.up_ratio = 0.6 / 0.85
self.down_ratio = 0.2 / 0.85
self.width_ratio = 0.2 / 0.85
def __call__(self, image):
face = futils.dlib.detect(image)
if not face:
return
face_on_image = face[0]
image, face, crop_face = futils.dlib.crop(image, face_on_image, self.up_ratio, self.down_ratio,
self.width_ratio)
np_image = np.array(image)
image_trans = self.transform(np_image)
mask = self.face_parser.parse(np.float32(cv2.resize(np_image, (512, 512))))
mask = cv2.resize(mask.numpy(), (self.img_size, self.img_size), interpolation=cv2.INTER_NEAREST)
mask = mask.astype(np.uint8)
mask_tensor = paddle.to_tensor(mask)
lms = futils.dlib.landmarks(image, face) / image_trans.shape[:2] * self.img_size
lms = lms.round()
P_np = generate_P_from_lmks(lms, self.img_size, self.img_size, self.img_size)
mask_aug = generate_mask_aug(mask, lms)
return [self.norm(image_trans).unsqueeze(0),
np.float32(mask_aug),
np.float32(P_np),
np.float32(mask)], face_on_image, crop_face
class PostProcess:
def __init__(self, config):
self.denoise = True
self.img_size = 256
def __call__(self, source: Image, result: Image):
# TODO: refactor -> name, resize
source = np.array(source)
result = np.array(result)
height, width = source.shape[:2]
small_source = cv2.resize(source, (self.img_size, self.img_size))
laplacian_diff = source.astype(np.float64) - cv2.resize(small_source, (width, height)).astype(np.float64)  # np.float is removed in recent numpy
result = (cv2.resize(result, (width, height)) + laplacian_diff).round().clip(0, 255).astype(np.uint8)
if self.denoise:
result = cv2.fastNlMeansDenoisingColored(result)
result = Image.fromarray(result).convert('RGB')
return result
class Inference:
def __init__(self, config, model_path=''):
self.model = build_model(config.model)
self.preprocess = PreProcess(config)
self.model_path = model_path
def transfer(self, source, reference, with_face=False):
source_input, face, crop_face = self.preprocess(source)
reference_input, face, crop_face = self.preprocess(reference)
consis_mask = np.float32(calculate_consis_mask(source_input[1], reference_input[1]))
consis_mask = paddle.to_tensor(np.expand_dims(consis_mask, 0))
if not (source_input and reference_input):
if with_face:
return None, None
return
for i in range(1, len(source_input) - 1):
source_input[i] = paddle.to_tensor(np.expand_dims(source_input[i], 0))
for i in range(1, len(reference_input) - 1):
reference_input[i] = paddle.to_tensor(np.expand_dims(reference_input[i], 0))
input_data = {
'image_A': source_input[0],
'image_B': reference_input[0],
'mask_A_aug': source_input[1],
'mask_B_aug': reference_input[1],
'P_A': source_input[2],
'P_B': reference_input[2],
'consis_mask': consis_mask
}
state_dicts = load(self.model_path)
for net_name, net in self.model.nets.items():
net.set_state_dict(state_dicts[net_name])
result, _ = self.model.test(input_data)
min_, max_ = result.min(), result.max()
result += -min_
result = paddle.divide(result, max_ - min_ + 1e-5)
img = toImage(result)
if with_face:
return img, crop_face
return img
class PSGANPredictor:
def __init__(self, cfg, weight_path):
self.cfg = cfg
self.weight_path = weight_path
def run(self, source, reference):
source = Image.fromarray(source)
reference = Image.fromarray(reference)
inference = Inference(self.cfg, self.weight_path)
postprocess = PostProcess(self.cfg)
# Transfer the makeup from the reference image to the source image.
image, face = inference.transfer(source, reference, with_face=True)
source_crop = source.crop((face.left(), face.top(), face.right(), face.bottom()))
image = postprocess(source_crop, image)
image = np.array(image)
return image
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from ppgan.utils.config import get_config
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import PSGANPredictor
from .util import base64_to_cv2
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(name="psgan", type="CV/gan", author="paddlepaddle", author_email="", summary="", version="1.0.0")
class psgan:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "psgan_weight.pdparams")
cfg = get_config(os.path.join(self.directory, 'makeup.yaml'))
self.network = PSGANPredictor(cfg, self.pretrained_model)
def makeup_transfer(self,
images=None,
paths=None,
output_dir='./transfer_result/',
use_gpu=False,
visualization=True):
'''
Transfer the makeup style from a reference image onto a source image.
images (list[dict]): data of images, each element is a dict with two keys:
- content (numpy.ndarray): image to be transferred, shape [H, W, C], BGR format.
- style (numpy.ndarray): makeup reference image, shape [H, W, C], BGR format.
paths (list[dict]): paths to images, each element is a dict with two keys:
- content (str): path to the image to be transferred.
- style (str): path to the makeup reference image.
output_dir: the directory in which to save the results.
use_gpu: if True, use the GPU to perform the computation, otherwise the CPU.
visualization: if True, save the results in output_dir.
'''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images is None and paths is None:
print('No image provided. Please input an image or an image path.')
return
if images is not None:
for image_dict in images:
content_img = image_dict['content'][:, :, ::-1]
style_img = image_dict['style'][:, :, ::-1]
results.append(self.network.run(content_img, style_img))
if paths is not None:
for path_dict in paths:
content_img = cv2.imread(path_dict['content'])[:, :, ::-1]
style_img = cv2.imread(path_dict['style'])[:, :, ::-1]
results.append(self.network.run(content_img, style_img))
if visualization:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.makeup_transfer(
paths=[{
'content': self.args.content,
'style': self.args.style
}],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = copy.deepcopy(images)
for image in images_decode:
image['content'] = base64_to_cv2(image['content'])
image['style'] = base64_to_cv2(image['style'])
results = self.makeup_transfer(images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='transfer_result', help='output directory for saving result.')
# store_true flag: argparse's type=bool treats any non-empty string as True
self.arg_config_group.add_argument('--visualization', action='store_true', help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--content', type=str, help="path to content image.")
self.arg_input_group.add_argument('--style', type=str, help="path to style image.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated for binary data
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
paddlex == 1.3.7
# seeinthedark
|Module Name|seeinthedark|
| :--- | :---: |
|Category|Image - low-light enhancement|
|Network|ConvNet|
|Dataset|SID dataset|
|Fine-tuning supported|No|
|Module Size|120MB|
|Latest update date|2021-11-02|
|Data indicators|-|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/142962370-a957d7b3-8050-4f5a-8462-3d6e49facb33.png" width = "450" height = "300" hspace='10'/>
<br />
Input image
<br />
<img src="https://user-images.githubusercontent.com/22424850/142962460-4a1b31ef-0eec-423b-ab3d-8622f3e8261a.png" width = "450" height = "300" hspace='10'/>
<br />
Output image
<br />
</p>
- ### Module Introduction
- Trained on a large set of short-exposure/long-exposure image pairs captured in low light, with RAW images as input and RGB images as reference, this model processes a dark RAW image end to end into a visible RGB image.
- For more details, see: [Learning to See in the Dark](http://cchen156.github.io/paper/18CVPR_SID.pdf)
## II. Installation
- ### 1. Environment Dependencies
- rawpy
- ### 2. Installation
- ```shell
$ hub install seeinthedark
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
# Read from a raw (Sony, .ARW) file
$ hub run seeinthedark --input_path "/PATH/TO/IMAGE"
```
- This invokes the low-light enhancement module from the command line; for more information, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
denoiser = hub.Module(name="seeinthedark")
input_path = "/PATH/TO/IMAGE"
# Read from a raw file
denoiser.denoising(paths=[input_path], output_dir='./denoising_result/', use_gpu=True)
```
- ### 3. API
- ```python
def denoising(images=None, paths=None, output_dir='./denoising_result/', use_gpu=False, visualization=True)
```
- Low-light enhancement API: denoises a dark RAW image and produces an RGB image.
- **Parameters**
- images (list\[numpy.ndarray\]): input images, single-channel mosaic (Bayer) images; <br/>
- paths (list\[str\]): paths to the low-light image files, Sony RAW format;<br/>
- output\_dir (str): directory in which to save the results; <br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- visualization(bool): whether to save the results to a local folder (a usage sketch follows this list)
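- A minimal sketch of passing the raw mosaic directly as an ndarray (the path is a placeholder; the float cast avoids uint16 underflow during black-level subtraction):
- ```python
import rawpy
import paddlehub as hub
denoiser = hub.Module(name="seeinthedark")
raw = rawpy.imread("/PATH/TO/IMAGE").raw_image_visible.astype('float32')
results = denoiser.denoising(images=[raw], output_dir='./denoising_result/', visualization=True)
# results[0] is the enhanced image as a uint8 ndarray (RGB order)
```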
## IV. Server Deployment
- PaddleHub Serving can deploy an online low-light enhancement service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m seeinthedark
```
- This deploys an online low-light enhancement API service; the default port is 8866.
- **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result
- ```python
import requests
import json
import cv2
import rawpy
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# send the HTTP request
data = {'images':[cv2_to_base64(rawpy.imread("/PATH/TO/IMAGE").raw_image_visible)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/seeinthedark"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
## V. Release Note
* 1.0.0
First release
- ```shell
$ hub install seeinthedark==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import paddle
import paddlehub as hub
from paddlehub.module.module import moduleinfo, runnable, serving
import numpy as np
import rawpy
import cv2
from .util import base64_to_cv2
def pack_raw(raw):
# pack Bayer image to 4 channels
im = raw
if not isinstance(raw, np.ndarray):
im = raw.raw_image_visible.astype(np.float32)
im = np.maximum(im - 512, 0) / (16383 - 512) # subtract the black level
im = np.expand_dims(im, axis=2)
img_shape = im.shape
H = img_shape[0]
W = img_shape[1]
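# split each 2x2 Bayer block into four channels, halving H and W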
out = np.concatenate((im[0:H:2, 0:W:2, :], im[0:H:2, 1:W:2, :], im[1:H:2, 1:W:2, :], im[1:H:2, 0:W:2, :]), axis=2)
return out
@moduleinfo(
name="seeinthedark", type="CV/denoising", author="paddlepaddle", author_email="", summary="", version="1.0.0")
class LearningToSeeInDark:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "pd_model/inference_model")
self.cpu_have_loaded = False
self.gpu_have_loaded = False
def set_device(self, use_gpu=False):
if not use_gpu:
if not self.cpu_have_loaded:
exe = paddle.static.Executor(paddle.CPUPlace())
[prog, inputs, outputs] = paddle.static.load_inference_model(
path_prefix=self.pretrained_model,
executor=exe,
model_filename="model.pdmodel",
params_filename="model.pdiparams")
self.cpuexec, self.cpuprog, self.cpuinputs, self.cpuoutputs = exe, prog, inputs, outputs
self.cpu_have_loaded = True
return self.cpuexec, self.cpuprog, self.cpuinputs, self.cpuoutputs
else:
if not self.gpu_have_loaded:
exe = paddle.static.Executor(paddle.CUDAPlace(0))
[prog, inputs, outputs] = paddle.static.load_inference_model(
path_prefix=self.pretrained_model,
executor=exe,
model_filename="model.pdmodel",
params_filename="model.pdiparams")
self.gpuexec, self.gpuprog, self.gpuinputs, self.gpuoutputs = exe, prog, inputs, outputs
self.gpu_have_loaded = True
return self.gpuexec, self.gpuprog, self.gpuinputs, self.gpuoutputs
def denoising(self,
images: list = None,
paths: list = None,
                  output_dir: str = './denoising_result/',
use_gpu: bool = False,
visualization: bool = True):
'''
        Denoise a raw image captured in a low-light scene.
        images (list[numpy.ndarray]): image data, each of shape [H, W]; must be a single-channel Bayer image captured by the camera.
        paths (list[str]): paths to images.
        output_dir: the directory in which to save the results.
        use_gpu: if True, use the GPU to perform the computation, otherwise the CPU.
        visualization: if True, save results in output_dir.
'''
results = []
paddle.enable_static()
exe, prog, inputs, outputs = self.set_device(use_gpu)
        if images is not None:
for raw in images:
input_full = np.expand_dims(pack_raw(raw), axis=0) * 300
px = input_full.shape[1] // 512
py = input_full.shape[2] // 512
rx, ry = px * 512, py * 512
input_full = input_full[:, :rx, :ry, :]
                output = np.zeros((rx * 2, ry * 2, 3))  # the network maps each 512px Bayer patch to a 1024px RGB patch
input_full = np.minimum(input_full, 1.0)
for i in range(px):
for j in range(py):
input_patch = input_full[:, i * 512:i * 512 + 512, j * 512:j * 512 + 512, :]
result = exe.run(prog, feed={inputs[0]: input_patch}, fetch_list=outputs)
output[i * 512 * 2:i * 512 * 2 + 512 * 2, j * 512 * 2:j * 512 * 2 + 512 * 2, :] = result[0][0]
output = np.minimum(np.maximum(output, 0), 1)
output = output * 255
output = np.clip(output, 0, 255)
output = output.astype('uint8')
results.append(output)
        if paths is not None:
for path in paths:
raw = rawpy.imread(path)
input_full = np.expand_dims(pack_raw(raw), axis=0) * 300
px = input_full.shape[1] // 512
py = input_full.shape[2] // 512
rx, ry = px * 512, py * 512
input_full = input_full[:, :rx, :ry, :]
                output = np.zeros((rx * 2, ry * 2, 3))  # the network maps each 512px Bayer patch to a 1024px RGB patch
input_full = np.minimum(input_full, 1.0)
for i in range(px):
for j in range(py):
input_patch = input_full[:, i * 512:i * 512 + 512, j * 512:j * 512 + 512, :]
result = exe.run(prog, feed={inputs[0]: input_patch}, fetch_list=outputs)
output[i * 512 * 2:i * 512 * 2 + 512 * 2, j * 512 * 2:j * 512 * 2 + 512 * 2, :] = result[0][0]
output = np.minimum(np.maximum(output, 0), 1)
output = output * 255
output = np.clip(output, 0, 255)
output = output.astype('uint8')
results.append(output)
        if visualization:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
cv2.imwrite(os.path.join(output_dir, 'output_{}.png'.format(i)), out[:, :, ::-1])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.denoising(
paths=[self.args.input_path],
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.denoising(images=images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='denoising_result', help='output directory for saving result.')
        self.arg_config_group.add_argument(
            '--visualization', type=ast.literal_eval, default=False, help='save results or not, e.g. --visualization True.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument(
'--input_path', type=str, help="path to input raw image, should be raw file captured by camera.")
# dim_vgg16_matting
|Module Name|dim_vgg16_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|dim_vgg16|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|164MB|
|Data Indicators|SAD 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image; right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10' />
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency, and it is widely used in the film industry for background replacement, image composition, and visual effects. Every pixel in an image carries a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation. dim_vgg16_matting is a matting model that requires a trimap as input.
  - For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
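  - To make the alpha-matte idea concrete, the minimal sketch below composites a foreground onto a new background using a predicted matte; all file names are hypothetical:
  - ```python
    import cv2
    import numpy as np

    # alpha.png: single-channel matte predicted by the model, values in [0, 255]
    alpha = cv2.imread("alpha.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    fg = cv2.imread("foreground.jpg").astype(np.float32)
    bg = cv2.imread("background.jpg").astype(np.float32)
    bg = cv2.resize(bg, (fg.shape[1], fg.shape[0]))

    # classic compositing equation: I = alpha * F + (1 - alpha) * B
    out = alpha[:, :, None] * fg + (1.0 - alpha[:, :, None]) * bg
    cv2.imwrite("composite.png", out.astype(np.uint8))
    ```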
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install dim_vgg16_matting
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
```
  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="dim_vgg16_matting")
    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
  - Portrait matting prediction API, used to separate the people in an input image from the background.
  - **Parameters**
    - image_list (list(str | numpy.ndarray)): image paths or image data in BGR format.
    - trimap_list (list(str | numpy.ndarray)): trimap paths or single-channel grayscale images.
    - visualization (bool): whether to save the visualized results; default is False.
    - save_path (str): path for saving images when visualization is True; default is "dim_vgg16_matting_output".
  - **Return**
    - result (list(numpy.ndarray)): the model's matting results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online portrait matting service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m dim_vgg16_matting
    ```
  - This deploys the portrait matting online service API; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, it need not be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following lines of code send a prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
    # Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
        image_path = str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# dim_vgg16_matting
|Module Name|dim_vgg16_matting|
| :--- | :---: |
|Category|Matting|
|Network|dim_vgg16|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|164MB|
|Data Indicators|SAD 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency, and it is widely used in the film industry for background replacement, image composition, and visual effects. Every pixel in an image carries a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation. dim_vgg16_matting is a matting model that requires a trimap as input.
- For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install dim_vgg16_matting
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="dim_vgg16_matting")
    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
  - Prediction API for matting.
  - **Parameters**
    - image_list (list(str | numpy.ndarray)): Image paths or image data, ndarray.shape is in the format \[H, W, C\], BGR.
    - trimap_list (list(str | numpy.ndarray)): Trimap paths or trimap data, ndarray.shape is in the format \[H, W\], grayscale.
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "dim_vgg16_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)):The list of model results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m dim_vgg16_matting
```
  - The service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, it need not be set.
- ### Step 2: Send a prediction request
  - With the server configured, use the following lines of code to send a prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
image_path =str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
import ast
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo, runnable, serving
from paddleseg.models import layers
from dim_vgg16_matting.vgg import VGG16
import dim_vgg16_matting.processor as P
@moduleinfo(
name="dim_vgg16_matting",
type="CV/matting",
author="paddlepaddle",
summary="dim_vgg16_matting is a matting model",
version="1.0.0"
)
class DIMVGG16(nn.Layer):
"""
The DIM implementation based on PaddlePaddle.
The original article refers to
    Ning Xu, et al. "Deep Image Matting"
    (https://arxiv.org/abs/1703.03872).
    Args:
        stage (int, optional): The stage of the model. Default: 3.
        decoder_input_channels (int, optional): The channel count of the decoder input. Default: 512.
        pretrained (str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self,
stage: int = 3,
decoder_input_channels: int = 512,
pretrained: str = None):
super(DIMVGG16, self).__init__()
self.backbone = VGG16()
self.pretrained = pretrained
self.stage = stage
decoder_output_channels = [64, 128, 256, 512]
self.decoder = Decoder(
input_channels=decoder_input_channels,
output_channels=decoder_output_channels)
if self.stage == 2:
for param in self.backbone.parameters():
param.stop_gradient = True
for param in self.decoder.parameters():
param.stop_gradient = True
if self.stage >= 2:
self.refine = Refine()
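        # preprocessing pipeline: read the image and trimap, cap the long edge
        # at 3840 px, then normalize to [-1, 1] (mean 0.5, std 0.5)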
        self.transforms = P.Compose([P.LoadImages(), P.LimitLong(max_long=3840), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'dim-vgg16.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None) -> dict:
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
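            # unsqueeze((0, 1)) adds batch and channel dims: [H, W] -> [1, 1, H, W]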
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
input_shape = paddle.shape(inputs['img'])[-2:]
x = paddle.concat([inputs['img'], inputs['trimap'] / 255], axis=1)
fea_list = self.backbone(x)
# decoder stage
up_shape = []
for i in range(5):
up_shape.append(paddle.shape(fea_list[i])[-2:])
alpha_raw = self.decoder(fea_list, up_shape)
alpha_raw = F.interpolate(
alpha_raw, input_shape, mode='bilinear', align_corners=False)
logit_dict = {'alpha_raw': alpha_raw}
if self.stage < 2:
return logit_dict
if self.stage >= 2:
# refine stage
refine_input = paddle.concat([inputs['img'], alpha_raw], axis=1)
alpha_refine = self.refine(refine_input)
# finally alpha
alpha_pred = alpha_refine + alpha_raw
alpha_pred = F.interpolate(
alpha_pred, input_shape, mode='bilinear', align_corners=False)
if not self.training:
alpha_pred = paddle.clip(alpha_pred, min=0, max=1)
logit_dict['alpha_pred'] = alpha_pred
return alpha_pred
    def predict(self, image_list: list, trimap_list: list, visualization: bool = False, save_path: str = "dim_vgg16_matting_output") -> list:
        self.eval()
        result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
    def serving_method(self, images: list, trimaps: list, **kwargs) -> dict:
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list) -> list:
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="dim_vgg16_matting_output", help="The directory to save output images.")
        self.arg_config_group.add_argument(
            '--visualization', type=ast.literal_eval, default=True, help="whether to save output as images, e.g. --visualization False.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--trimap_path', type=str, help="path to trimap.")
class Up(nn.Layer):
def __init__(self, input_channels: int, output_channels: int):
super().__init__()
self.conv = layers.ConvBNReLU(
input_channels,
output_channels,
kernel_size=5,
padding=2,
bias_attr=False)
def forward(self, x: paddle.Tensor, skip: paddle.Tensor, output_shape: list) -> paddle.Tensor:
x = F.interpolate(
x, size=output_shape, mode='bilinear', align_corners=False)
x = x + skip
x = self.conv(x)
x = F.relu(x)
return x
class Decoder(nn.Layer):
def __init__(self, input_channels: int, output_channels: list = [64, 128, 256, 512]):
super().__init__()
self.deconv6 = nn.Conv2D(
input_channels, input_channels, kernel_size=1, bias_attr=False)
self.deconv5 = Up(input_channels, output_channels[-1])
self.deconv4 = Up(output_channels[-1], output_channels[-2])
self.deconv3 = Up(output_channels[-2], output_channels[-3])
self.deconv2 = Up(output_channels[-3], output_channels[-4])
self.deconv1 = Up(output_channels[-4], 64)
self.alpha_conv = nn.Conv2D(
64, 1, kernel_size=5, padding=2, bias_attr=False)
def forward(self, fea_list: list, shape_list: list) -> paddle.Tensor:
x = fea_list[-1]
x = self.deconv6(x)
x = self.deconv5(x, fea_list[4], shape_list[4])
x = self.deconv4(x, fea_list[3], shape_list[3])
x = self.deconv3(x, fea_list[2], shape_list[2])
x = self.deconv2(x, fea_list[1], shape_list[1])
x = self.deconv1(x, fea_list[0], shape_list[0])
alpha = self.alpha_conv(x)
alpha = F.sigmoid(alpha)
return alpha
class Refine(nn.Layer):
def __init__(self):
super().__init__()
self.conv1 = layers.ConvBNReLU(
4, 64, kernel_size=3, padding=1, bias_attr=False)
self.conv2 = layers.ConvBNReLU(
64, 64, kernel_size=3, padding=1, bias_attr=False)
self.conv3 = layers.ConvBNReLU(
64, 64, kernel_size=3, padding=1, bias_attr=False)
self.alpha_pred = layers.ConvBNReLU(
64, 1, kernel_size=3, padding=1, bias_attr=False)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
alpha = self.alpha_pred(x)
return alpha
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
def __init__(self, transforms: Callable, to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
class LoadImages:
"""
Read images from image path.
Args:
to_rgb (bool, optional): If converting image to RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
# if alpha and trimap has 3 channels, extract one.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class LimitLong:
"""
Limit the long edge of image.
If the long edge is larger than max_long, resize the long edge
to max_long, while scale the short edge proportionally.
If the long edge is smaller than min_long, resize the long edge
to min_long, while scale the short edge proportionally.
Args:
max_long (int, optional): If the long edge of image is larger than max_long,
it will be resize to max_long. Default: None.
min_long (int, optional): If the long edge of image is smaller than min_long,
it will be resize to min_long. Default: None.
"""
def __init__(self, max_long=None, min_long=None):
if max_long is not None:
if not isinstance(max_long, int):
raise TypeError(
"Type of `max_long` is invalid. It should be int, but it is {}"
.format(type(max_long)))
if min_long is not None:
if not isinstance(min_long, int):
raise TypeError(
"Type of `min_long` is invalid. It should be int, but it is {}"
.format(type(min_long)))
if (max_long is not None) and (min_long is not None):
if min_long > max_long:
raise ValueError(
'`max_long should not smaller than min_long, but they are {} and {}'
.format(max_long, min_long))
self.max_long = max_long
self.min_long = min_long
def __call__(self, data):
h, w = data['img'].shape[:2]
long_edge = max(h, w)
target = long_edge
if (self.max_long is not None) and (long_edge > self.max_long):
target = self.max_long
elif (self.min_long is not None) and (long_edge < self.min_long):
target = self.min_long
if target != long_edge:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_long(data['img'], target)
for key in data.get('gt_fields', []):
data[key] = functional.resize_long(data[key], target)
return data
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: List[Tuple]) -> paddle.Tensor:
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
"""
    The alpha values are in [0, 255] and the shape should be [h, w]; pixels the trimap marks as pure background or foreground are clamped to 0 or 255.
"""
if isinstance(trimap, str):
trimap = cv2.imread(trimap, 0)
alpha[trimap == 0] = 0
alpha[trimap == 255] = 255
    alpha = alpha.astype('uint8')
return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Tuple
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddleseg.utils import utils
class ConvBlock(nn.Layer):
def __init__(self, input_channels: int, output_channels: int, groups: int, name: str = None):
super(ConvBlock, self).__init__()
self.groups = groups
self._conv_1 = Conv2D(
in_channels=input_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "1_weights"),
bias_attr=False)
if groups == 2 or groups == 3 or groups == 4:
self._conv_2 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "2_weights"),
bias_attr=False)
if groups == 3 or groups == 4:
self._conv_3 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "3_weights"),
bias_attr=False)
if groups == 4:
self._conv_4 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "4_weights"),
bias_attr=False)
self._pool = MaxPool2D(
kernel_size=2, stride=2, padding=0, return_mask=True)
    def forward(self, inputs: paddle.Tensor) -> Tuple[paddle.Tensor]:
x = self._conv_1(inputs)
x = F.relu(x)
if self.groups == 2 or self.groups == 3 or self.groups == 4:
x = self._conv_2(x)
x = F.relu(x)
if self.groups == 3 or self.groups == 4:
x = self._conv_3(x)
x = F.relu(x)
if self.groups == 4:
x = self._conv_4(x)
x = F.relu(x)
skip = x
x, max_indices = self._pool(x)
return x, max_indices, skip
class VGGNet(nn.Layer):
def __init__(self, input_channels: int = 4, layers: int = 11, pretrained: str = None):
super(VGGNet, self).__init__()
self.pretrained = pretrained
self.layers = layers
self.vgg_configure = {
11: [1, 1, 2, 2, 2],
13: [2, 2, 2, 2, 2],
16: [2, 2, 3, 3, 3],
19: [2, 2, 4, 4, 4]
}
assert self.layers in self.vgg_configure.keys(), \
"supported layers are {} but input layer is {}".format(
self.vgg_configure.keys(), layers)
self.groups = self.vgg_configure[self.layers]
        # For matting, the first conv layer takes a 4-channel input; it is simply zero-initialized.
self._conv_block_1 = ConvBlock(
input_channels, 64, self.groups[0], name="conv1_")
self._conv_block_2 = ConvBlock(64, 128, self.groups[1], name="conv2_")
self._conv_block_3 = ConvBlock(128, 256, self.groups[2], name="conv3_")
self._conv_block_4 = ConvBlock(256, 512, self.groups[3], name="conv4_")
self._conv_block_5 = ConvBlock(512, 512, self.groups[4], name="conv5_")
        # This layer should be initialized from VGG's converted fc6 parameters; initialization can be skipped for now.
self._conv_6 = Conv2D(
512, 512, kernel_size=3, padding=1, bias_attr=False)
    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
fea_list = []
ids_list = []
x, ids, skip = self._conv_block_1(inputs)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_2(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_3(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_4(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_5(x)
fea_list.append(skip)
ids_list.append(ids)
x = F.relu(self._conv_6(x))
fea_list.append(x)
return fea_list
def VGG16(**args):
model = VGGNet(layers=16, **args)
return model
# gfm_resnet34_matting
|Module Name|gfm_resnet34_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|gfm_resnet34|
|Dataset|AM-2k|
|Support Fine-tuning|No|
|Module Size|562MB|
|Data Indicators|SAD 10.89|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image; right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145993777-9b69a85d-d31c-4743-8620-82b2a56ca1e7.jpg" width = "480" height = "350" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/145993809-b0fb4bae-2c64-4868-99fc-500f19343442.png" width = "480" height = "350" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency, and it is widely used in the film industry for background replacement, image composition, and visual effects. Every pixel in an image carries a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation. gfm_resnet34_matting produces the matte directly from the input image, without requiring a trimap.
  - For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
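  - Since gfm_resnet34_matting needs no trimap, a single call yields the matte. The minimal sketch below, with hypothetical output paths, turns the predicted matte into a transparent PNG cutout:
  - ```python
    import cv2
    import numpy as np
    import paddlehub as hub

    model = hub.Module(name="gfm_resnet34_matting")
    img = cv2.imread("/PATH/TO/IMAGE")
    matte = model.predict([img], visualization=False)[0]  # uint8 matte of shape [H, W]
    rgba = np.dstack([img, matte])  # use the matte as the alpha channel (BGRA)
    cv2.imwrite("cutout.png", rgba)
    ```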
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install gfm_resnet34_matting
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
```
  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="gfm_resnet34_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
visualization,
save_path):
```
  - Animal matting prediction API, used to separate the animal in an input image from the background.
  - **Parameters**
    - image_list (list(str | numpy.ndarray)): image paths or image data in BGR format.
    - visualization (bool): whether to save the visualized results; default is False.
    - save_path (str): path for saving images when visualization is True; default is "gfm_resnet34_matting_output".
  - **Return**
    - result (list(numpy.ndarray)): the model's matting results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online animal matting service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m gfm_resnet34_matting
    ```
  - This deploys the animal matting online service API; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, it need not be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following lines of code send a prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
    # Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
        image_path = str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# gfm_resnet34_matting
|Module Name|gfm_resnet34_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|gfm_resnet34|
|Dataset|AM-2k|
|Support Fine-tuning|No|
|Module Size|562MB|
|Data Indicators|SAD 10.89|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145993777-9b69a85d-d31c-4743-8620-82b2a56ca1e7.jpg" width = "480" height = "350" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/145993809-b0fb4bae-2c64-4868-99fc-500f19343442.png" width = "480" height = "350" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency, and it is widely used in the film industry for background replacement, image composition, and visual effects. Every pixel in an image carries a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation. gfm_resnet34_matting produces the matte directly from the input image, without requiring a trimap.
- For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install gfm_resnet34_matting
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="gfm_resnet34_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
visualization,
save_path):
```
  - Prediction API for matting.
  - **Parameters**
    - image_list (list(str | numpy.ndarray)): Image paths or image data, ndarray.shape is in the format \[H, W, C\], BGR.
    - visualization (bool): Whether to save the recognition results as picture files, default is False.
    - save_path (str): Save path of images, "gfm_resnet34_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)):The list of model results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m gfm_resnet34_matting
```
  - The service API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, it need not be set.
- ### Step 2: Send a prediction request
  - With the server configured, use the following lines of code to send a prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')
    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
        image_path = str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Callable, Union, List, Tuple
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from gfm_resnet34_matting.resnet import resnet34
def conv3x3(in_planes: int, out_planes: int, stride: int = 1) -> Callable:
"""3x3 convolution with padding"""
return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias_attr=False)
def conv_up_psp(in_channels: int, out_channels: int, up_sample: float) -> Callable:
    return nn.Sequential(
        nn.Conv2D(in_channels, out_channels, 3, padding=1),
        nn.BatchNorm2D(out_channels),
        nn.ReLU(),
        nn.Upsample(scale_factor=up_sample, mode='bilinear', align_corners=False))
def build_bb(in_channels: int, mid_channels: int, out_channels: int) -> Callable:
    return nn.Sequential(
        nn.Conv2D(in_channels, mid_channels, 3, dilation=2, padding=2),
        nn.BatchNorm2D(mid_channels),
        nn.ReLU(),
        nn.Conv2D(mid_channels, out_channels, 3, dilation=2, padding=2),
        nn.BatchNorm2D(out_channels),
        nn.ReLU(),
        nn.Conv2D(out_channels, out_channels, 3, dilation=2, padding=2),
        nn.BatchNorm2D(out_channels),
        nn.ReLU())
def build_decoder(in_channels: int, mid_channels_1: int, mid_channels_2: int, out_channels: int,
                  last_bnrelu: bool, upsample_flag: bool) -> Callable:
    layers = [
        nn.Conv2D(in_channels, mid_channels_1, 3, padding=1),
        nn.BatchNorm2D(mid_channels_1),
        nn.ReLU(),
        nn.Conv2D(mid_channels_1, mid_channels_2, 3, padding=1),
        nn.BatchNorm2D(mid_channels_2),
        nn.ReLU(),
        nn.Conv2D(mid_channels_2, out_channels, 3, padding=1)
    ]
    if last_bnrelu:
        layers += [nn.BatchNorm2D(out_channels), nn.ReLU()]
    if upsample_flag:
        layers += [nn.Upsample(scale_factor=2, mode='bilinear')]
    return nn.Sequential(*layers)
class BasicBlock(nn.Layer):
expansion = 1
def __init__(self, inplanes: int, planes: int, stride: int = 1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2D(planes)
self.relu = nn.ReLU()
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2D(planes)
self.downsample = downsample
self.stride = stride
    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class PSPModule(nn.Layer):
    def __init__(self, features: int, out_features: int = 1024, sizes: Tuple[int] = (1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.LayerList([self._make_stage(features, size) for size in sizes])
self.bottleneck = nn.Conv2D(features * (len(sizes) + 1),
out_features, kernel_size=1)
self.relu = nn.ReLU()
    def _make_stage(self, features: int, size: int) -> Callable:
prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
conv = nn.Conv2D(features, features, kernel_size=1, bias_attr=False)
return nn.Sequential(prior, conv)
def forward(self, feats: paddle.Tensor) -> paddle.Tensor:
h, w = feats.shape[2], feats.shape[3]
        priors = [F.interpolate(stage(feats), size=(h, w), mode='bilinear', align_corners=True) for stage in self.stages] + [feats]
bottle = self.bottleneck(paddle.concat(priors, 1))
return self.relu(bottle)
class SELayer(nn.Layer):
    def __init__(self, channel: int, reduction: int = 4):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias_attr=False),
            nn.ReLU(),
            nn.Linear(channel // reduction, channel, bias_attr=False),
            nn.Sigmoid())
    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
        # squeeze-and-excitation: global average pool, bottleneck MLP, channel-wise rescale
        b, c, _, _ = x.shape
        y = self.avg_pool(x).reshape([b, c])
        y = self.fc(y).reshape([b, c, 1, 1])
        return x * y.expand_as(x)
class GFM(nn.Layer):
"""
The GFM implementation based on PaddlePaddle.
The original article refers to:
Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
Main network file (GFM).
Copyright (c) 2021, Jizhizi Li (jili8515@uni.sydney.edu.au)
Licensed under the MIT License (see LICENSE for details)
Github repo: https://github.com/JizhiziLi/GFM
Paper link (Arxiv): https://arxiv.org/abs/2010.16188
"""
def __init__(self):
super().__init__()
self.backbone = 'r34_2b'
self.rosta = 'TT'
if self.rosta == 'TT':
self.gd_channel = 3
else:
self.gd_channel = 2
if self.backbone == 'r34_2b':
self.resnet = resnet34()
self.encoder0 = nn.Sequential(nn.Conv2D(3, 64, 3, padding=1),
nn.BatchNorm2D(64), nn.ReLU())
self.encoder1 = self.resnet.layer1
self.encoder2 = self.resnet.layer2
self.encoder3 = self.resnet.layer3
self.encoder4 = self.resnet.layer4
            self.encoder5 = nn.Sequential(
                nn.MaxPool2D(2, 2, ceil_mode=True),
                BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(512, 512))
            self.encoder6 = nn.Sequential(
                nn.MaxPool2D(2, 2, ceil_mode=True),
                BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(512, 512))
self.psp_module = PSPModule(512, 512, (1, 3, 5))
self.psp6 = conv_up_psp(512, 512, 2)
self.psp5 = conv_up_psp(512, 512, 4)
self.psp4 = conv_up_psp(512, 256, 8)
self.psp3 = conv_up_psp(512, 128, 16)
self.psp2 = conv_up_psp(512, 64, 32)
self.psp1 = conv_up_psp(512, 64, 32)
self.decoder6_g = build_decoder(1024, 512, 512, 512, True, True)
self.decoder5_g = build_decoder(1024, 512, 512, 512, True, True)
self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
self.decoder1_g = build_decoder(128, 64, 64, 64, True, False)
self.bridge_block = build_bb(512, 512, 512)
self.decoder6_f = build_decoder(1024, 512, 512, 512, True, True)
self.decoder5_f = build_decoder(1024, 512, 512, 512, True, True)
self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
self.decoder1_f = build_decoder(128, 64, 64, 64, True, False)
            if self.rosta == 'RIM':
                self.decoder0_g_tt = nn.Sequential(nn.Conv2D(64, 3, 3, padding=1))
                self.decoder0_g_ft = nn.Sequential(nn.Conv2D(64, 2, 3, padding=1))
                self.decoder0_g_bt = nn.Sequential(nn.Conv2D(64, 2, 3, padding=1))
                self.decoder0_f_tt = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
                self.decoder0_f_ft = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
                self.decoder0_f_bt = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
            else:
                self.decoder0_g = nn.Sequential(nn.Conv2D(64, self.gd_channel, 3, padding=1))
                self.decoder0_f = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
        if self.backbone == 'r34':
            self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
            self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.layer1)
self.encoder2 = self.resnet.layer2
self.encoder3 = self.resnet.layer3
self.encoder4 = self.resnet.layer4
self.psp_module = PSPModule(512, 512, (1, 3, 5))
self.psp4 = conv_up_psp(512, 256, 2)
self.psp3 = conv_up_psp(512, 128, 4)
self.psp2 = conv_up_psp(512, 64, 8)
self.psp1 = conv_up_psp(512, 64, 16)
self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
self.bridge_block = build_bb(512, 512, 512)
self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
self.decoder1_f = build_decoder(128, 64, 64, 64, True, True)
if self.rosta == 'RIM':
self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
else:
                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
elif self.backbone == 'r101':
            self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
            self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.layer1)
self.encoder2 = self.resnet.layer2
self.encoder3 = self.resnet.layer3
self.encoder4 = self.resnet.layer4
self.psp_module = PSPModule(2048, 2048, (1, 3, 5))
self.bridge_block = build_bb(2048, 2048, 2048)
self.psp4 = conv_up_psp(2048, 1024, 2)
self.psp3 = conv_up_psp(2048, 512, 4)
self.psp2 = conv_up_psp(2048, 256, 8)
self.psp1 = conv_up_psp(2048, 64, 16)
self.decoder4_g = build_decoder(4096, 2048, 1024, 1024, True, True)
self.decoder3_g = build_decoder(2048, 1024, 512, 512, True, True)
self.decoder2_g = build_decoder(1024, 512, 256, 256, True, True)
self.decoder1_g = build_decoder(512, 256, 128, 64, True, True)
self.decoder4_f = build_decoder(4096, 2048, 1024, 1024, True, True)
self.decoder3_f = build_decoder(2048, 1024, 512, 512, True, True)
self.decoder2_f = build_decoder(1024, 512, 256, 256, True, True)
self.decoder1_f = build_decoder(512, 256, 128, 64, True, True)
if self.rosta == 'RIM':
self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
else:
                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
        elif self.backbone == 'd121':
            self.encoder0 = nn.Sequential(self.densenet.features.conv0,
                                          self.densenet.features.norm0,
                                          self.densenet.features.relu0)
            self.encoder1 = nn.Sequential(self.densenet.features.denseblock1, self.densenet.features.transition1)
            self.encoder2 = nn.Sequential(self.densenet.features.denseblock2, self.densenet.features.transition2)
            self.encoder3 = nn.Sequential(self.densenet.features.denseblock3, self.densenet.features.transition3)
            self.encoder4 = nn.Sequential(self.densenet.features.denseblock4,
                                          nn.Conv2D(1024, 512, 3, padding=1),
                                          nn.BatchNorm2D(512), nn.ReLU(),
                                          nn.MaxPool2D(2, 2, ceil_mode=True))
self.psp_module = PSPModule(512, 512, (1, 3, 5))
self.psp4 = conv_up_psp(512, 256, 2)
self.psp3 = conv_up_psp(512, 128, 4)
self.psp2 = conv_up_psp(512, 64, 8)
self.psp1 = conv_up_psp(512, 64, 16)
self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
self.bridge_block = build_bb(512, 512, 512)
self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
self.decoder3_f = build_decoder(768, 256, 256, 128, True, True)
self.decoder2_f = build_decoder(384, 128, 128, 64, True, True)
self.decoder1_f = build_decoder(192, 64, 64, 64, True, True)
if self.rosta == 'RIM':
self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
else:
                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
        if self.rosta == 'RIM':
            self.rim = nn.Sequential(nn.Conv2D(3, 16, 1), SELayer(16), nn.Conv2D(16, 1, 1))
def forward(self, input: paddle.Tensor) -> List[paddle.Tensor]:
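        # GFM runs two decoders over a shared encoder: a "glance" decoder for
        # coarse semantics (a trimap-like map) and a "focus" decoder for detail
        # in the transition region; collaborative_matting fuses the two.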
glance_sigmoid = paddle.zeros(input.shape)
glance_sigmoid.stop_gradient = True
focus_sigmoid = paddle.zeros(input.shape)
focus_sigmoid.stop_gradient = True
fusion_sigmoid = paddle.zeros(input.shape)
fusion_sigmoid.stop_gradient = True
e0 = self.encoder0(input)
e1 = self.encoder1(e0)
e2 = self.encoder2(e1)
e3 = self.encoder3(e2)
e4 = self.encoder4(e3)
if self.backbone == 'r34_2b':
e5 = self.encoder5(e4)
e6 = self.encoder6(e5)
psp = self.psp_module(e6)
            d6_g = self.decoder6_g(paddle.concat((psp, e6), 1))
            d5_g = self.decoder5_g(paddle.concat((self.psp6(psp), d6_g), 1))
            d4_g = self.decoder4_g(paddle.concat((self.psp5(psp), d5_g), 1))
else:
psp = self.psp_module(e4)
d4_g = self.decoder4_g(paddle.concat((psp, e4), 1))
d3_g = self.decoder3_g(paddle.concat((self.psp4(psp), d4_g), 1))
d2_g = self.decoder2_g(paddle.concat((self.psp3(psp), d3_g), 1))
d1_g = self.decoder1_g(paddle.concat((self.psp2(psp), d2_g), 1))
if self.backbone == 'r34_2b':
if self.rosta == 'RIM':
d0_g_tt = self.decoder0_g_tt(d1_g)
d0_g_ft = self.decoder0_g_ft(d1_g)
d0_g_bt = self.decoder0_g_bt(d1_g)
else:
d0_g = self.decoder0_g(d1_g)
        elif self.rosta == 'RIM':
            d0_g_tt = self.decoder0_g_tt(paddle.concat((self.psp1(psp), d1_g), 1))
            d0_g_ft = self.decoder0_g_ft(paddle.concat((self.psp1(psp), d1_g), 1))
            d0_g_bt = self.decoder0_g_bt(paddle.concat((self.psp1(psp), d1_g), 1))
        else:
            d0_g = self.decoder0_g(paddle.concat((self.psp1(psp), d1_g), 1))
if self.rosta == 'RIM':
glance_sigmoid_tt = F.sigmoid(d0_g_tt)
glance_sigmoid_ft = F.sigmoid(d0_g_ft)
glance_sigmoid_bt = F.sigmoid(d0_g_bt)
else:
glance_sigmoid = F.sigmoid(d0_g)
if self.backbone == 'r34_2b':
bb = self.bridge_block(e6)
d6_f = self.decoder6_f(paddle.concat((bb, e6), 1))
d5_f = self.decoder5_f(paddle.concat((d6_f, e5), 1))
d4_f = self.decoder4_f(paddle.concat((d5_f, e4), 1))
else:
bb = self.bridge_block(e4)
d4_f = self.decoder4_f(paddle.concat((bb, e4), 1))
d3_f = self.decoder3_f(paddle.concat((d4_f, e3), 1))
d2_f = self.decoder2_f(paddle.concat((d3_f, e2), 1))
d1_f = self.decoder1_f(paddle.concat((d2_f, e1), 1))
if self.backbone == 'r34_2b':
if self.rosta == 'RIM':
d0_f_tt = self.decoder0_f_tt(d1_f)
d0_f_ft = self.decoder0_f_ft(d1_f)
d0_f_bt = self.decoder0_f_bt(d1_f)
else:
d0_f = self.decoder0_f(d1_f)
elif self.rosta == 'RIM':
d0_f_tt = self.decoder0_f_tt(paddle.concat((d1_f, e0), 1))
d0_f_ft = self.decoder0_f_ft(paddle.concat((d1_f, e0), 1))
d0_f_bt = self.decoder0_f_bt(paddle.concat((d1_f, e0), 1))
else:
d0_f = self.decoder0_f(paddle.concat((d1_f, e0), 1))
if self.rosta == 'RIM':
focus_sigmoid_tt = F.sigmoid(d0_f_tt)
focus_sigmoid_ft = F.sigmoid(d0_f_ft)
focus_sigmoid_bt = F.sigmoid(d0_f_bt)
else:
focus_sigmoid = F.sigmoid(d0_f)
        if self.rosta == 'RIM':
            fusion_sigmoid_tt = collaborative_matting('TT', glance_sigmoid_tt, focus_sigmoid_tt)
            fusion_sigmoid_ft = collaborative_matting('FT', glance_sigmoid_ft, focus_sigmoid_ft)
            fusion_sigmoid_bt = collaborative_matting('BT', glance_sigmoid_bt, focus_sigmoid_bt)
            fusion_sigmoid = paddle.concat((fusion_sigmoid_tt, fusion_sigmoid_ft, fusion_sigmoid_bt), 1)
            fusion_sigmoid = self.rim(fusion_sigmoid)
            return [[glance_sigmoid_tt, focus_sigmoid_tt, fusion_sigmoid_tt],
                    [glance_sigmoid_ft, focus_sigmoid_ft, fusion_sigmoid_ft],
                    [glance_sigmoid_bt, focus_sigmoid_bt, fusion_sigmoid_bt],
                    fusion_sigmoid]
        else:
            fusion_sigmoid = collaborative_matting(self.rosta, glance_sigmoid, focus_sigmoid)
            return glance_sigmoid, focus_sigmoid, fusion_sigmoid
def collaborative_matting(rosta, glance_sigmoid, focus_sigmoid):
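    # RoSTa ("representation of semantic and transition areas", see the GFM
    # paper): for 'TT' the glance branch predicts three classes
    # (0 = background, 1 = transition, 2 = foreground); the focus alpha is
    # kept inside the transition band and the foreground region is set to 1.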
    if rosta == 'TT':
        index = paddle.argmax(glance_sigmoid, axis=1)
        index = index[:, None, :, :].astype('float32')
        bg_mask = index.clone()
        bg_mask[bg_mask == 2] = 1
        bg_mask = 1 - bg_mask
        trimap_mask = index.clone()
        trimap_mask[trimap_mask == 2] = 0
        fg_mask = index.clone()
        fg_mask[fg_mask == 1] = 0
        fg_mask[fg_mask == 2] = 1
        focus_sigmoid = focus_sigmoid.cpu()
        trimap_mask = trimap_mask.cpu()
        fg_mask = fg_mask.cpu()
        fusion_sigmoid = focus_sigmoid * trimap_mask + fg_mask
    elif rosta == 'BT':
        index = paddle.argmax(glance_sigmoid, axis=1)
        index = index[:, None, :, :].astype('float32')
        fusion_sigmoid = index - focus_sigmoid
        fusion_sigmoid[fusion_sigmoid < 0] = 0
    else:
        index = paddle.argmax(glance_sigmoid, axis=1)
        index = index[:, None, :, :].astype('float32')
        fusion_sigmoid = index + focus_sigmoid
        fusion_sigmoid[fusion_sigmoid > 1] = 1
    return fusion_sigmoid
if __name__ == "__main__":
    model = GFM()
    x = paddle.ones([1, 3, 256, 256])
    result = model(x)
    print(result)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
from PIL import Image
import numpy as np
import cv2
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo, runnable, serving
from skimage.transform import resize
from gfm_resnet34_matting.gfm import GFM
import gfm_resnet34_matting.processor as P
@moduleinfo(
name="gfm_resnet34_matting",
type="CV/matting",
author="paddlepaddle",
author_email="",
summary="gfm_resnet34_matting is an animal matting model.",
version="1.0.0")
class GFMResNet34(nn.Layer):
"""
The GFM implementation based on PaddlePaddle.
The original article refers to:
Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
Main network file (GFM).
Github repo: https://github.com/JizhiziLi/GFM
Paper link (Arxiv): https://arxiv.org/abs/2010.16188
"""
def __init__(self, pretrained: str=None):
super(GFMResNet34, self).__init__()
self.model = GFM()
self.resize_by_short = P.ResizeByShort(1080)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.model.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.model.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray], h: int, w: int) -> paddle.Tensor:
if min(h, w) > 1080:
img = self.resize_by_short(img)
tensor_img = self.scale_image(img, h, w)
return tensor_img
    def scale_image(self, img: np.ndarray, h: int, w: int, ratio: float = 1 / 3):
        # Scale by `ratio`, then snap each side down to a multiple of 32
        # (the network stride) and cap it at 1600 pixels.
        resize_h = int(h * ratio)
        resize_w = int(w * ratio)
        new_h = min(1600, resize_h - (resize_h % 32))
        new_w = min(1600, resize_w - (resize_w % 32))
        scale_img = resize(img, (new_h, new_w)) * 255
        tensor_img = paddle.to_tensor(scale_img.astype(np.float32)[np.newaxis, :, :, :])
        tensor_img = tensor_img.transpose([0, 3, 1, 2])
        return tensor_img
def inference_img_scale(self, input: paddle.Tensor) -> List[paddle.Tensor]:
pred_global, pred_local, pred_fusion = self.model(input)
pred_global = P.gen_trimap_from_segmap_e2e(pred_global)
pred_local = pred_local.numpy()[0,0,:,:]
pred_fusion = pred_fusion.numpy()[0,0,:,:]
return pred_global, pred_local, pred_fusion
    def predict(self, image_list: list, visualization: bool = True, save_path: str = "gfm_resnet34_matting_output"):
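        """Predict alpha mattes for a list of images.
        Inference runs twice per image: a 1/3-scale pass whose glance output
        gives the global trimap, and a 1/2-scale pass whose focus output gives
        local detail; the two are fused back at the original resolution.
        """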
self.model.eval()
result = []
with paddle.no_grad():
            for i, img in enumerate(image_list):
                if isinstance(img, str):
                    img = np.array(Image.open(img))[:, :, :3]
                else:
                    # ndarray input is assumed BGR (the cv2 convention); flip to RGB.
                    img = img[:, :, ::-1]
                h, w, _ = img.shape
tensor_img = self.preprocess(img, h, w)
                # Glance pass at 1/3 scale for the global trimap.
                pred_glance_1, pred_focus_1, pred_fusion_1 = self.inference_img_scale(tensor_img)
                pred_glance_1 = resize(pred_glance_1, (h, w)) * 255.0
                # Focus pass at 1/2 scale for local detail, then fuse.
                tensor_img = self.scale_image(img, h, w, 1 / 2)
                pred_glance_2, pred_focus_2, pred_fusion_2 = self.inference_img_scale(tensor_img)
                pred_focus_2 = resize(pred_focus_2, (h, w))
                pred_fusion = P.get_masked_local_from_global_test(pred_glance_1, pred_focus_2)
                pred_fusion = (pred_fusion * 255).astype(np.uint8)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, pred_fusion)
result.append(pred_fusion)
return result
@serving
    def serving_method(self, images: list, **kwargs):
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
outputs = self.predict(image_list=images_decode, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="gfm_resnet34_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import cv2
import numpy as np
from paddleseg.transforms import functional
class ResizeByLong:
"""
Resize the long side of an image to given size, and then scale the other side proportionally.
Args:
long_size (int): The target size of long side.
"""
def __init__(self, long_size):
self.long_size = long_size
def __call__(self, data):
data = functional.resize_long(data, self.long_size)
return data
class ResizeByShort:
"""
Resize the short side of an image to given size, and then scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size):
self.short_size = short_size
def __call__(self, data):
data = functional.resize_short(data, self.short_size)
return data
def gen_trimap_from_segmap_e2e(segmap):
trimap = np.argmax(segmap, axis=1)[0]
trimap = trimap.astype(np.int64)
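    # Per-pixel argmax classes 0 / 1 / 2 map to trimap values 0 (background),
    # 128 (unknown) and 255 (foreground).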
    trimap[trimap == 1] = 128
    trimap[trimap == 2] = 255
return trimap.astype(np.uint8)
def get_masked_local_from_global_test(global_result, local_result):
weighted_global = np.ones(global_result.shape)
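    # weighted_global becomes 1 only in the unknown (128) region of the global
    # trimap, so the fusion below takes the local prediction there and the
    # normalized global trimap (0 or 1) everywhere else.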
    weighted_global[global_result == 255] = 0
    weighted_global[global_result == 0] = 0
    fusion_result = (global_result * (1. - weighted_global) / 255
                     + local_result * weighted_global)
return fusion_result
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from typing import Type, Any, Callable, Union, List, Optional
def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1,
            dilation: int = 1) -> paddle.nn.Conv2D:
    """3x3 convolution with padding"""
    return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, dilation=dilation,
                     bias_attr=False)
def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> paddle.nn.Conv2D:
    """1x1 convolution"""
    return nn.Conv2D(in_planes, out_planes, kernel_size=1, stride=stride,
                     bias_attr=False)
class BasicBlock(nn.Layer):
expansion: int = 1
    def __init__(self, inplanes: int, planes: int, stride: int = 1,
                 downsample: Optional[nn.Layer] = None, groups: int = 1,
                 base_width: int = 64, dilation: int = 1,
                 norm_layer: Optional[Callable[..., paddle.nn.Layer]] = None) -> None:
super(BasicBlock, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
if groups != 1 or base_width != 64:
raise ValueError(
'BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError(
'Dilation > 1 not supported in BasicBlock')
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = paddle.nn.ReLU()
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class Bottleneck(nn.Layer):
expansion: int = 4
    def __init__(self, inplanes: int, planes: int, stride: int = 1,
                 downsample: Optional[nn.Layer] = None, groups: int = 1,
                 base_width: int = 64, dilation: int = 1,
                 norm_layer: Optional[Callable[..., paddle.nn.Layer]] = None) -> None:
super(Bottleneck, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
width = int(planes * (base_width / 64.0)) * groups
self.conv1 = conv1x1(inplanes, width)
self.bn1 = norm_layer(width)
self.conv2 = conv3x3(width, width, stride, groups, dilation)
self.bn2 = norm_layer(width)
self.conv3 = conv1x1(width, planes * self.expansion)
self.bn3 = norm_layer(planes * self.expansion)
self.relu = paddle.nn.ReLU()
self.downsample = downsample
self.stride = stride
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet(nn.Layer):
    def __init__(self, block: Type[Union[BasicBlock, Bottleneck]],
                 layers: List[int], num_classes: int = 1000,
                 zero_init_residual: bool = False, groups: int = 1,
                 width_per_group: int = 64,
                 replace_stride_with_dilation: Optional[List[bool]] = None,
                 norm_layer: Optional[Callable[..., paddle.nn.Layer]] = None) -> None:
super(ResNet, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
self._norm_layer = norm_layer
self.inplanes = 64
self.dilation = 1
if replace_stride_with_dilation is None:
replace_stride_with_dilation = [False, False, False]
if len(replace_stride_with_dilation) != 3:
raise ValueError(
'replace_stride_with_dilation should be None or a 3-element tuple, got {}'
.format(replace_stride_with_dilation))
self.groups = groups
self.base_width = width_per_group
self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
padding=3, bias_attr=False)
self.bn1 = norm_layer(self.inplanes)
self.relu = paddle.nn.ReLU()
self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
dilate=replace_stride_with_dilation[2])
self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)
    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]],
                    planes: int, blocks: int, stride: int = 1,
                    dilate: bool = False) -> paddle.nn.Sequential:
norm_layer = self._norm_layer
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion))
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample,
                            self.groups, self.base_width, previous_dilation,
                            norm_layer))
self.inplanes = planes * block.expansion
for _ in range(1, blocks):
layers.append(block(self.inplanes, planes, groups=self.groups,
base_width=self.base_width, dilation=self.dilation,
norm_layer=norm_layer))
return nn.Sequential(*layers)
    def _forward_impl(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
        x = paddle.flatten(x, 1)
x = self.fc(x)
return x
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return self._forward_impl(x)
def _resnet(arch: str, block: Type[Union[BasicBlock, Bottleneck]],
            layers: List[int], pretrained: bool, progress: bool,
            **kwargs: Any) -> ResNet:
model = ResNet(block, layers, **kwargs)
return model
def resnet34(pretrained: bool = False, progress: bool = True,
             **kwargs: Any) -> ResNet:
"""ResNet-34 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained,
progress, **kwargs)
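if __name__ == '__main__':
    # Minimal smoke test (illustrative only): the backbone maps a
    # [N, 3, H, W] batch to [N, num_classes] logits.
    net = resnet34()
    logits = net(paddle.ones([1, 3, 224, 224]))
    print(logits.shape)  # expected: [1, 1000]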
# modnet_hrnet18_matting
|Module Name|modnet_hrnet18_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_hrnet18|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Data Indicators|SAD 77.96|
|Module Size|60MB|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144780857-13c63c21-5d12-4028-985b-378776f58220.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in an image carries a value representing its foreground transparency, called the alpha value; the set of all alpha values in an image is called the alpha matte. Taking out the part of the image covered by the matte completes the foreground separation; modnet_hrnet18_matting produces such matting results (see the compositing sketch below).
  - For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
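  - Conceptually, the predicted alpha matte combines a foreground F and a background B pixel-wise as I = alpha * F + (1 - alpha) * B. A minimal NumPy sketch of this compositing step, assuming a matte image saved by the module (all paths are placeholders):
  - ```python
    import cv2
    import numpy as np
    fg = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)  # original image
    alpha = cv2.imread("/PATH/TO/ALPHA.png", 0).astype(np.float32) / 255.0  # predicted matte
    bg = np.zeros_like(fg)  # new background (black here)
    comp = alpha[:, :, None] * fg + (1 - alpha[:, :, None]) * bg  # alpha compositing
    cv2.imwrite("composite.png", comp.astype(np.uint8))
    ```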
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.2.0
  - paddlehub >= 2.1.0
  - paddleseg >= 2.3.0
- ### 2、Installation
  - ```shell
    $ hub install modnet_hrnet18_matting
    ```
  - In case of any problems during installation, please refer to: [Windows Installation](../../../../docs/docs_ch/get_start/windows_quickstart.md)
  | [Linux Installation](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Installation](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
  - ```shell
    $ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
    ```
  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    model = hub.Module(name="modnet_hrnet18_matting")
    result = model.predict(["/PATH/TO/IMAGE"])
    print(result)
    ```
- ### 3、API
  - ```python
    def predict(self,
                image_list,
                trimap_list,
                visualization,
                save_path):
    ```
  - Prediction API for matting, used to extract the portrait foreground from the input image.
  - **Parameter**
    - image_list (list(str | numpy.ndarray)): Image paths or BGR image data.
    - trimap_list (list(str | numpy.ndarray)): Trimap paths or single-channel grayscale trimap data. Default is None.
    - visualization (bool): Whether to save the results as image files. Default is False.
    - save_path (str): Directory for saving results when visualization is True. Default is "modnet_hrnet18_matting_output".
  - **Return**
    - result (list(numpy.ndarray)): The list of model matting results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m modnet_hrnet18_matting
    ```
  - The serving API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
  ```python
  import requests
  import json
  import cv2
  import base64
  import time
  import numpy as np
  def cv2_to_base64(image):
      data = cv2.imencode('.jpg', image)[1]
      return base64.b64encode(data.tobytes()).decode('utf8')
  def base64_to_cv2(b64str):
      data = base64.b64decode(b64str.encode('utf8'))
      data = np.frombuffer(data, np.uint8)
      data = cv2.imdecode(data, cv2.IMREAD_COLOR)
      return data
  # Send an HTTP request
  data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
  headers = {"Content-type": "application/json"}
  url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
  r = requests.post(url=url, headers=headers, data=json.dumps(data))
  # Save the returned alpha mattes
  for image in r.json()["results"]['data']:
      data = base64_to_cv2(image)
      image_path = str(time.time()) + ".png"
      cv2.imwrite(image_path, data)
  ```
## V. Release Note
* 1.0.0
  First release
# modnet_hrnet18_matting
|Module Name|modnet_hrnet18_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_hrnet18|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|60MB|
|Data Indicators|SAD 77.96|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144780857-13c63c21-5d12-4028-985b-378776f58220.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by calculating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha. The set of all alpha values in an image is called the alpha matte. The part of the image covered by the matte can be extracted to complete foreground separation.
- For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install modnet_hrnet18_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_hrnet18_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for matting.
- **Parameter**
- image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
- trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], grayscale. Default is None (see the sketch after this list).
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "modnet_hrnet18_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): The list of model results.
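- A minimal sketch of passing trimaps alongside images (paths are placeholders; each trimap is a single-channel map with 0 = background, 255 = foreground, 128 = unknown):
- ```python
  import paddlehub as hub
  model = hub.Module(name="modnet_hrnet18_matting")
  result = model.predict(
      image_list=["/PATH/TO/IMAGE"],
      trimap_list=["/PATH/TO/TRIMAP"],
      visualization=True,
      save_path="modnet_hrnet18_matting_output")
  ```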
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_hrnet18_matting
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
      return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
      data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
      image_path = str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.cvlibs import manager, param_init
from paddleseg.models import layers
from paddleseg.utils import utils
__all__ = ["HRNet_W18"]
class HRNet(nn.Layer):
"""
The HRNet implementation based on PaddlePaddle.
The original article refers to
    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
(https://arxiv.org/pdf/1908.07919.pdf).
Args:
pretrained (str, optional): The path of pretrained model.
stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4).
stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64).
stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36).
stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72).
stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72, 144).
has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
"""
def __init__(self,
input_channels: int=3,
                 pretrained: str = None,
stage1_num_modules: int = 1,
stage1_num_blocks: list = (4, ),
stage1_num_channels: list = (64, ),
stage2_num_modules: int = 1,
stage2_num_blocks: list = (4, 4),
stage2_num_channels: list = (18, 36),
stage3_num_modules: int = 4,
stage3_num_blocks: list = (4, 4, 4),
stage3_num_channels: list = (18, 36, 72),
stage4_num_modules: int = 3,
stage4_num_blocks: list = (4, 4, 4, 4),
stage4_num_channels: list = (18, 36, 72, 144),
has_se: bool = False,
align_corners: bool = False,
padding_same: bool = True):
super(HRNet, self).__init__()
self.pretrained = pretrained
self.stage1_num_modules = stage1_num_modules
self.stage1_num_blocks = stage1_num_blocks
self.stage1_num_channels = stage1_num_channels
self.stage2_num_modules = stage2_num_modules
self.stage2_num_blocks = stage2_num_blocks
self.stage2_num_channels = stage2_num_channels
self.stage3_num_modules = stage3_num_modules
self.stage3_num_blocks = stage3_num_blocks
self.stage3_num_channels = stage3_num_channels
self.stage4_num_modules = stage4_num_modules
self.stage4_num_blocks = stage4_num_blocks
self.stage4_num_channels = stage4_num_channels
self.has_se = has_se
self.align_corners = align_corners
        self.feat_channels = [64] + list(stage4_num_channels)
self.conv_layer1_1 = layers.ConvBNReLU(
in_channels=input_channels,
out_channels=64,
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv_layer1_2 = layers.ConvBNReLU(
in_channels=64,
out_channels=64,
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.la1 = Layer1(
num_channels=64,
num_blocks=self.stage1_num_blocks[0],
num_filters=self.stage1_num_channels[0],
has_se=has_se,
name="layer2",
padding_same=padding_same)
self.tr1 = TransitionLayer(
in_channels=[self.stage1_num_channels[0] * 4],
out_channels=self.stage2_num_channels,
name="tr1",
padding_same=padding_same)
self.st2 = Stage(
num_channels=self.stage2_num_channels,
num_modules=self.stage2_num_modules,
num_blocks=self.stage2_num_blocks,
num_filters=self.stage2_num_channels,
has_se=self.has_se,
name="st2",
align_corners=align_corners,
padding_same=padding_same)
self.tr2 = TransitionLayer(
in_channels=self.stage2_num_channels,
out_channels=self.stage3_num_channels,
name="tr2",
padding_same=padding_same)
self.st3 = Stage(
num_channels=self.stage3_num_channels,
num_modules=self.stage3_num_modules,
num_blocks=self.stage3_num_blocks,
num_filters=self.stage3_num_channels,
has_se=self.has_se,
name="st3",
align_corners=align_corners,
padding_same=padding_same)
self.tr3 = TransitionLayer(
in_channels=self.stage3_num_channels,
out_channels=self.stage4_num_channels,
name="tr3",
padding_same=padding_same)
self.st4 = Stage(
num_channels=self.stage4_num_channels,
num_modules=self.stage4_num_modules,
num_blocks=self.stage4_num_blocks,
num_filters=self.stage4_num_channels,
has_se=self.has_se,
name="st4",
align_corners=align_corners,
padding_same=padding_same)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
feat_list = []
conv1 = self.conv_layer1_1(x)
feat_list.append(conv1)
conv2 = self.conv_layer1_2(conv1)
la1 = self.la1(conv2)
tr1 = self.tr1([la1])
st2 = self.st2(tr1)
tr2 = self.tr2(st2)
st3 = self.st3(tr2)
tr3 = self.tr3(st3)
st4 = self.st4(tr3)
feat_list = feat_list + st4
return feat_list
class Layer1(nn.Layer):
def __init__(self,
num_channels: int,
num_filters: int,
num_blocks: int,
has_se: bool = False,
name: str = None,
padding_same: bool = True):
super(Layer1, self).__init__()
self.bottleneck_block_list = []
for i in range(num_blocks):
bottleneck_block = self.add_sublayer(
"bb_{}_{}".format(name, i + 1),
BottleneckBlock(
num_channels=num_channels if i == 0 else num_filters * 4,
num_filters=num_filters,
has_se=has_se,
stride=1,
downsample=True if i == 0 else False,
name=name + '_' + str(i + 1),
padding_same=padding_same))
self.bottleneck_block_list.append(bottleneck_block)
def forward(self, x: paddle.Tensor):
conv = x
for block_func in self.bottleneck_block_list:
conv = block_func(conv)
return conv
class TransitionLayer(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
name: str = None,
padding_same: bool = True):
super(TransitionLayer, self).__init__()
num_in = len(in_channels)
num_out = len(out_channels)
self.conv_bn_func_list = []
for i in range(num_out):
residual = None
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
layers.ConvBNReLU(
in_channels=in_channels[i],
out_channels=out_channels[i],
kernel_size=3,
padding=1 if not padding_same else 'same',
bias_attr=False))
else:
residual = self.add_sublayer(
"transition_{}_layer_{}".format(name, i + 1),
layers.ConvBNReLU(
in_channels=in_channels[-1],
out_channels=out_channels[i],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
self.conv_bn_func_list.append(residual)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outs = []
for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
if conv_bn_func is None:
outs.append(x[idx])
else:
if idx < len(x):
outs.append(conv_bn_func(x[idx]))
else:
outs.append(conv_bn_func(x[-1]))
return outs
class Branches(nn.Layer):
def __init__(self,
num_blocks: int,
in_channels: int,
out_channels: int,
has_se: bool = False,
name: str = None,
padding_same: bool = True):
super(Branches, self).__init__()
self.basic_block_list = []
for i in range(len(out_channels)):
self.basic_block_list.append([])
for j in range(num_blocks[i]):
in_ch = in_channels[i] if j == 0 else out_channels[i]
basic_block_func = self.add_sublayer(
"bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
BasicBlock(
num_channels=in_ch,
num_filters=out_channels[i],
has_se=has_se,
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1),
padding_same=padding_same))
self.basic_block_list[i].append(basic_block_func)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outs = []
for idx, input in enumerate(x):
conv = input
for basic_block_func in self.basic_block_list[idx]:
conv = basic_block_func(conv)
outs.append(conv)
return outs
class BottleneckBlock(nn.Layer):
def __init__(self,
num_channels: int,
num_filters: int,
has_se: bool,
stride: int = 1,
downsample: bool = False,
name:str = None,
padding_same: bool = True):
super(BottleneckBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=1,
bias_attr=False)
self.conv2 = layers.ConvBNReLU(
in_channels=num_filters,
out_channels=num_filters,
kernel_size=3,
stride=stride,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv3 = layers.ConvBN(
in_channels=num_filters,
out_channels=num_filters * 4,
kernel_size=1,
bias_attr=False)
if self.downsample:
self.conv_down = layers.ConvBN(
in_channels=num_channels,
out_channels=num_filters * 4,
kernel_size=1,
bias_attr=False)
if self.has_se:
self.se = SELayer(
num_channels=num_filters * 4,
num_filters=num_filters * 4,
reduction_ratio=16,
name=name + '_fc')
self.add = layers.Add()
self.relu = layers.Activation("relu")
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
residual = x
conv1 = self.conv1(x)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
if self.downsample:
residual = self.conv_down(x)
if self.has_se:
conv3 = self.se(conv3)
y = self.add(conv3, residual)
y = self.relu(y)
return y
class BasicBlock(nn.Layer):
def __init__(self,
num_channels: int,
num_filters: int,
stride: int = 1,
has_se: bool = False,
downsample: bool = False,
name: str = None,
padding_same: bool = True):
super(BasicBlock, self).__init__()
self.has_se = has_se
self.downsample = downsample
self.conv1 = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=3,
stride=stride,
padding=1 if not padding_same else 'same',
bias_attr=False)
self.conv2 = layers.ConvBN(
in_channels=num_filters,
out_channels=num_filters,
kernel_size=3,
padding=1 if not padding_same else 'same',
bias_attr=False)
if self.downsample:
self.conv_down = layers.ConvBNReLU(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=1,
bias_attr=False)
if self.has_se:
self.se = SELayer(
num_channels=num_filters,
num_filters=num_filters,
reduction_ratio=16,
name=name + '_fc')
self.add = layers.Add()
self.relu = layers.Activation("relu")
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
residual = x
conv1 = self.conv1(x)
conv2 = self.conv2(conv1)
if self.downsample:
residual = self.conv_down(x)
if self.has_se:
conv2 = self.se(conv2)
y = self.add(conv2, residual)
y = self.relu(y)
return y
class SELayer(nn.Layer):
def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None):
super(SELayer, self).__init__()
self.pool2d_gap = nn.AdaptiveAvgPool2D(1)
self._num_channels = num_channels
med_ch = int(num_channels / reduction_ratio)
stdv = 1.0 / math.sqrt(num_channels * 1.0)
self.squeeze = nn.Linear(
num_channels,
med_ch,
weight_attr=paddle.ParamAttr(
initializer=nn.initializer.Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(med_ch * 1.0)
self.excitation = nn.Linear(
med_ch,
num_filters,
weight_attr=paddle.ParamAttr(
initializer=nn.initializer.Uniform(-stdv, stdv)))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
pool = self.pool2d_gap(x)
pool = paddle.reshape(pool, shape=[-1, self._num_channels])
squeeze = self.squeeze(pool)
squeeze = F.relu(squeeze)
excitation = self.excitation(squeeze)
excitation = F.sigmoid(excitation)
excitation = paddle.reshape(
excitation, shape=[-1, self._num_channels, 1, 1])
out = x * excitation
return out
class Stage(nn.Layer):
def __init__(self,
num_channels: int,
num_modules: int,
num_blocks: int,
num_filters: int,
has_se: bool = False,
multi_scale_output: bool = True,
name: str = None,
align_corners: bool = False,
padding_same: bool = True):
super(Stage, self).__init__()
self._num_modules = num_modules
self.stage_func_list = []
for i in range(num_modules):
if i == num_modules - 1 and not multi_scale_output:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_blocks=num_blocks,
num_filters=num_filters,
has_se=has_se,
multi_scale_output=False,
name=name + '_' + str(i + 1),
align_corners=align_corners,
padding_same=padding_same))
else:
stage_func = self.add_sublayer(
"stage_{}_{}".format(name, i + 1),
HighResolutionModule(
num_channels=num_channels,
num_blocks=num_blocks,
num_filters=num_filters,
has_se=has_se,
name=name + '_' + str(i + 1),
align_corners=align_corners,
padding_same=padding_same))
self.stage_func_list.append(stage_func)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out = x
for idx in range(self._num_modules):
out = self.stage_func_list[idx](out)
return out
class HighResolutionModule(nn.Layer):
def __init__(self,
num_channels: int,
num_blocks: int,
num_filters: int,
has_se: bool = False,
multi_scale_output: bool = True,
name: str = None,
align_corners: bool = False,
padding_same: bool = True):
super(HighResolutionModule, self).__init__()
self.branches_func = Branches(
num_blocks=num_blocks,
in_channels=num_channels,
out_channels=num_filters,
has_se=has_se,
name=name,
padding_same=padding_same)
self.fuse_func = FuseLayers(
in_channels=num_filters,
out_channels=num_filters,
multi_scale_output=multi_scale_output,
name=name,
align_corners=align_corners,
padding_same=padding_same)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
out = self.branches_func(x)
out = self.fuse_func(out)
return out
class FuseLayers(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
multi_scale_output: bool = True,
name: str = None,
align_corners: bool = False,
padding_same: bool = True):
super(FuseLayers, self).__init__()
self._actual_ch = len(in_channels) if multi_scale_output else 1
self._in_channels = in_channels
self.align_corners = align_corners
self.residual_func_list = []
for i in range(self._actual_ch):
for j in range(len(in_channels)):
if j > i:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
layers.ConvBN(
in_channels=in_channels[j],
out_channels=out_channels[i],
kernel_size=1,
bias_attr=False))
self.residual_func_list.append(residual_func)
elif j < i:
pre_num_filters = in_channels[j]
for k in range(i - j):
if k == i - j - 1:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
layers.ConvBN(
in_channels=pre_num_filters,
out_channels=out_channels[i],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
pre_num_filters = out_channels[i]
else:
residual_func = self.add_sublayer(
"residual_{}_layer_{}_{}_{}".format(
name, i + 1, j + 1, k + 1),
layers.ConvBNReLU(
in_channels=pre_num_filters,
out_channels=out_channels[j],
kernel_size=3,
stride=2,
padding=1 if not padding_same else 'same',
bias_attr=False))
pre_num_filters = out_channels[j]
self.residual_func_list.append(residual_func)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outs = []
residual_func_idx = 0
for i in range(self._actual_ch):
residual = x[i]
residual_shape = paddle.shape(residual)[-2:]
for j in range(len(self._in_channels)):
if j > i:
y = self.residual_func_list[residual_func_idx](x[j])
residual_func_idx += 1
y = F.interpolate(
y,
residual_shape,
mode='bilinear',
align_corners=self.align_corners)
residual = residual + y
elif j < i:
y = x[j]
for k in range(i - j):
y = self.residual_func_list[residual_func_idx](y)
residual_func_idx += 1
residual = residual + y
residual = F.relu(residual)
outs.append(residual)
return outs
def HRNet_W18(**kwargs):
model = HRNet(
stage1_num_modules=1,
stage1_num_blocks=[4],
stage1_num_channels=[64],
stage2_num_modules=1,
stage2_num_blocks=[4, 4],
stage2_num_channels=[18, 36],
stage3_num_modules=4,
stage3_num_blocks=[4, 4, 4],
stage3_num_channels=[18, 36, 72],
stage4_num_modules=3,
stage4_num_blocks=[4, 4, 4, 4],
stage4_num_channels=[18, 36, 72, 144],
**kwargs)
return model
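if __name__ == '__main__':
    # Minimal smoke test (illustrative only): forward returns the stem feature
    # plus the four stage-4 branch outputs, matching
    # feat_channels == [64, 18, 36, 72, 144].
    net = HRNet_W18()
    feats = net(paddle.ones([1, 3, 256, 256]))
    print([f.shape for f in feats])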
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from modnet_hrnet18_matting.hrnet import HRNet_W18
import modnet_hrnet18_matting.processor as P
@moduleinfo(
name="modnet_hrnet18_matting",
type="CV/matting",
author="paddlepaddle",
summary="modnet_hrnet18_matting is a matting model",
version="1.0.0"
)
class MODNetHRNet18(nn.Layer):
"""
The MODNet implementation based on PaddlePaddle.
The original article refers to
    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
(https://arxiv.org/pdf/2011.11961.pdf).
Args:
        hr_channels (int, optional): The channels of the high-resolution branch. Default: 32.
        pretrained (str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self, hr_channels:int = 32, pretrained=None):
super(MODNetHRNet18, self).__init__()
self.backbone = HRNet_W18()
self.pretrained = pretrained
self.head = MODNetHead(
hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
self.blurer = GaussianBlurLayer(1, 3)
self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'modnet-hrnet_w18.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None):
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: dict) -> paddle.Tensor:
x = inputs['img']
feat_list = self.backbone(x)
y = self.head(inputs=inputs, feat_list=feat_list)
return y
    def predict(self, image_list: list, trimap_list: list = None, visualization: bool = False, save_path: str = "modnet_hrnet18_matting_output") -> list:
        self.eval()
        result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list = None, **kwargs) -> dict:
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="modnet_hrnet18_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
        self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
class MODNetHead(nn.Layer):
"""
Segmentation head.
"""
def __init__(self, hr_channels: int, backbone_channels: int):
super().__init__()
self.lr_branch = LRBranch(backbone_channels)
self.hr_branch = HRBranch(hr_channels, backbone_channels)
self.f_branch = FusionBranch(hr_channels, backbone_channels)
def forward(self, inputs: paddle.Tensor, feat_list: list):
pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
if self.training:
logit_dict = {
'semantic': pred_semantic,
'detail': pred_detail,
'matte': pred_matte
}
return logit_dict
else:
return pred_matte
class FusionBranch(nn.Layer):
def __init__(self, hr_channels: int, enc_channels: int):
super().__init__()
self.conv_lr4x = Conv2dIBNormRelu(
enc_channels[2], hr_channels, 5, stride=1, padding=2)
self.conv_f2x = Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1)
self.conv_f = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
Conv2dIBNormRelu(
int(hr_channels / 2),
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
lr4x = self.conv_lr4x(lr4x)
lr2x = F.interpolate(
lr4x, scale_factor=2, mode='bilinear', align_corners=False)
f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
f = F.interpolate(
f2x, scale_factor=2, mode='bilinear', align_corners=False)
f = self.conv_f(paddle.concat((f, img), axis=1))
pred_matte = F.sigmoid(f)
return pred_matte
class HRBranch(nn.Layer):
"""
High Resolution Branch of MODNet
"""
def __init__(self, hr_channels: int, enc_channels:int):
super().__init__()
self.tohr_enc2x = Conv2dIBNormRelu(
enc_channels[0], hr_channels, 1, stride=1, padding=0)
self.conv_enc2x = Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=2, padding=1)
self.tohr_enc4x = Conv2dIBNormRelu(
enc_channels[1], hr_channels, 1, stride=1, padding=0)
self.conv_enc4x = Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
self.conv_hr4x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels + enc_channels[2] + 3,
2 * hr_channels,
3,
stride=1,
padding=1),
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr2x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
hr_channels,
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
img2x = F.interpolate(
img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
img4x = F.interpolate(
img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
enc2x = self.tohr_enc2x(enc2x)
hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
enc4x = self.tohr_enc4x(enc4x)
hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
hr2x = F.interpolate(
hr4x, scale_factor=2, mode='bilinear', align_corners=False)
hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
pred_detail = None
if self.training:
hr = F.interpolate(
hr2x, scale_factor=2, mode='bilinear', align_corners=False)
hr = self.conv_hr(paddle.concat((hr, img), axis=1))
pred_detail = F.sigmoid(hr)
return pred_detail, hr2x
class LRBranch(nn.Layer):
"""
Low Resolution Branch of MODNet
"""
def __init__(self, backbone_channels: int):
super().__init__()
self.se_block = SEBlock(backbone_channels[4], reduction=4)
self.conv_lr16x = Conv2dIBNormRelu(
backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
self.conv_lr8x = Conv2dIBNormRelu(
backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
self.conv_lr = Conv2dIBNormRelu(
backbone_channels[2],
1,
3,
stride=2,
padding=1,
with_ibn=False,
with_relu=False)
def forward(self, feat_list: list):
enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
enc32x = self.se_block(enc32x)
lr16x = F.interpolate(
enc32x, scale_factor=2, mode='bilinear', align_corners=False)
lr16x = self.conv_lr16x(lr16x)
lr8x = F.interpolate(
lr16x, scale_factor=2, mode='bilinear', align_corners=False)
lr8x = self.conv_lr8x(lr8x)
pred_semantic = None
if self.training:
lr = self.conv_lr(lr8x)
pred_semantic = F.sigmoid(lr)
return pred_semantic, lr8x, [enc2x, enc4x]
class IBNorm(nn.Layer):
"""
Combine Instance Norm and Batch Norm into One Layer
"""
def __init__(self, in_channels: int):
super().__init__()
self.bnorm_channels = in_channels // 2
self.inorm_channels = in_channels - self.bnorm_channels
self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
self.inorm = nn.InstanceNorm2D(self.inorm_channels)
def forward(self, x):
bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
return paddle.concat((bn_x, in_x), 1)
class Conv2dIBNormRelu(nn.Layer):
"""
Convolution + IBNorm + Relu
"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
padding: int = 0,
dilation:int = 1,
groups: int = 1,
bias_attr: paddle.ParamAttr = None,
with_ibn: bool = True,
with_relu: bool = True):
super().__init__()
layers = [
nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias_attr=bias_attr)
]
if with_ibn:
layers.append(IBNorm(out_channels))
if with_relu:
layers.append(nn.ReLU())
self.layers = nn.Sequential(*layers)
def forward(self, x: paddle.Tensor):
return self.layers(x)
class SEBlock(nn.Layer):
"""
SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
"""
def __init__(self, num_channels: int, reduction:int = 1):
super().__init__()
self.pool = nn.AdaptiveAvgPool2D(1)
self.conv = nn.Sequential(
nn.Conv2D(
num_channels,
int(num_channels // reduction),
1,
bias_attr=False), nn.ReLU(),
nn.Conv2D(
int(num_channels // reduction),
num_channels,
1,
bias_attr=False), nn.Sigmoid())
def forward(self, x: paddle.Tensor):
w = self.pool(x)
w = self.conv(w)
return w * x
class GaussianBlurLayer(nn.Layer):
""" Add Gaussian Blur to a 4D tensors
This layer takes a 4D tensor of {N, C, H, W} as input.
    The Gaussian blur is performed on each of the given channels (C) separately.
"""
def __init__(self, channels: int, kernel_size: int):
"""
Args:
channels (int): Channel for input tensor
kernel_size (int): Size of the kernel used in blurring
"""
super(GaussianBlurLayer, self).__init__()
self.channels = channels
self.kernel_size = kernel_size
assert self.kernel_size % 2 != 0
self.op = nn.Sequential(
nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
nn.Conv2D(
channels,
channels,
self.kernel_size,
stride=1,
padding=0,
bias_attr=False,
groups=channels))
self._init_kernel()
self.op[1].weight.stop_gradient = True
def forward(self, x: paddle.Tensor):
"""
Args:
x (paddle.Tensor): input 4D tensor
Returns:
paddle.Tensor: Blurred version of the input
"""
        if not len(list(x.shape)) == 4:
            raise ValueError("'GaussianBlurLayer' requires a 4D tensor as input")
        elif not x.shape[1] == self.channels:
            raise ValueError(
                "In 'GaussianBlurLayer', the required channel ({0}) is "
                "not the same as input ({1})".format(self.channels, x.shape[1]))
return self.op(x)
def _init_kernel(self):
sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
n = np.zeros((self.kernel_size, self.kernel_size))
i = int(self.kernel_size / 2)
n[i, i] = 1
kernel = scipy.ndimage.gaussian_filter(n, sigma)
kernel = kernel.astype('float32')
kernel = kernel[np.newaxis, np.newaxis, :, :]
paddle.assign(kernel, self.op[1].weight)
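        # The kernel is built by filtering a unit impulse with
        # scipy.ndimage.gaussian_filter, giving a normalized Gaussian of the
        # requested size; sigma follows OpenCV's getGaussianKernel heuristic.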
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
def __init__(self, transforms: Callable, to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
class LoadImages:
"""
Read images from image path.
Args:
to_rgb (bool, optional): If converting image to RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
# if alpha and trimap has 3 channels, extract one.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class ResizeByShort:
"""
Resize the short side of an image to given size, and then scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size: int =512):
self.short_size = short_size
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_short(data['img'], self.short_size)
for key in data.get('gt_fields', []):
data[key] = functional.resize_short(data[key], self.short_size)
return data
class ResizeToIntMult:
    """
    Resize to an integer multiple, e.g. 32.
    """
    def __init__(self, mult_int: int = 32):
        self.mult_int = mult_int
    def __call__(self, data: dict) -> dict:
        data['trans_info'].append(('resize', data['img'].shape[0:2]))
        h, w = data['img'].shape[0:2]
        rw = w - w % self.mult_int
        rh = h - h % self.mult_int
        data['img'] = functional.resize(data['img'], (rw, rh))
        for key in data.get('gt_fields', []):
            data[key] = functional.resize(data[key], (rw, rh))
        return data
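# For example, with mult_int=32 a 1080x1920 image becomes 1056x1920, so both
# sides are divisible by the network stride.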
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: list):
    """Recover the prediction to its original shape."""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
    """
    Clamp the alpha prediction with the trimap when one is given.
    The shape of alpha should be [h, w].
    """
    if isinstance(trimap, str):
        trimap = cv2.imread(trimap, 0)
    if trimap is not None:
        alpha[trimap == 0] = 0
        alpha[trimap == 255] = 255
    alpha = alpha.astype('uint8')
    return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# modnet_mobilenetv2_matting
|Module Name|modnet_mobilenetv2_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_mobilenetv2|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|38MB|
|Data Indicators|SAD 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144574092-d0dd08f3-309b-4a7d-84d5-8b94604431a1.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in an image carries a value representing its foreground transparency, called the alpha value; the set of all alpha values in an image is called the alpha matte. Taking out the part of the image covered by the matte completes the foreground separation; modnet_mobilenetv2_matting produces such matting results (see the background-replacement sketch below).
  - For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、安装
- ```shell
$ hub install modnet_mobilenetv2_matting
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_mobilenetv2_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- 人像matting预测API,用于将输入图片中的人像分割出来。
- 参数
- image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
- trimap_list(list(str | numpy.ndarray)):trimap输入路径或者灰度图单通道格式图片。默认为None。
- visualization (bool): 是否进行可视化,默认为False。
- save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_mobilenetv2_matting_output"。
- 返回
- result (list(numpy.ndarray)):模型分割结果。
## 四、服务部署
- PaddleHub Serving可以部署人像matting在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m modnet_mobilenetv2_matting
```
- 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
image_path =str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## 五、更新历史
* 1.0.0
初始发布
# modnet_mobilenetv2_matting
|Module Name|modnet_mobilenetv2_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_mobilenetv2|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|38MB|
|Data Indicators|SAD112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image, right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144574092-d0dd08f3-309b-4a7d-84d5-8b94604431a1.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value representing its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Taking the part of the image covered by the matte completes the foreground separation. modnet_mobilenetv2_matting can generate matting results.
- For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install modnet_mobilenetv2_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_mobilenetv2_matting")
result = model.predict(image_list=["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for matting.
- **Parameter**
- image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
- trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], grayscale. Default is None.
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "modnet_mobilenetv2_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): The list of model results.
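- A hedged usage sketch (paths are placeholders): passing an optional trimap and enabling visualization. Pixels the trimap marks as definite background (0) or foreground (255) are enforced in the returned alpha matte.
- ```python
import paddlehub as hub

model = hub.Module(name="modnet_mobilenetv2_matting")
# trimap_list is optional; pass None (the default) to run trimap-free.
alphas = model.predict(
    image_list=["/PATH/TO/IMAGE"],
    trimap_list=["/PATH/TO/TRIMAP"],
    visualization=True,
    save_path="modnet_mobilenetv2_matting_output")
print(alphas[0].shape)  # [H, W] uint8 alpha matte
```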
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_mobilenetv2_matting
```
- The matting service API is now deployed; the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
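- For example, a hedged sketch that pins the service to GPU 0 (the device index is an assumption):
- ```shell
$ export CUDA_VISIBLE_DEVICES=0
$ hub serving start -m modnet_mobilenetv2_matting
```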
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
image_path =str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import numpy as np
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddleseg import utils
from paddleseg.cvlibs import manager
__all__ = ["MobileNetV2"]
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
num_channels: int,
filter_size: int,
num_filters: int,
stride: int,
padding: int,
num_groups: int=1,
name: str = None,
use_cudnn: bool = True):
super(ConvBNLayer, self).__init__()
self._conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
weight_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
self._batch_norm = BatchNorm(
num_filters,
param_attr=ParamAttr(name=name + "_bn_scale"),
bias_attr=ParamAttr(name=name + "_bn_offset"),
moving_mean_name=name + "_bn_mean",
moving_variance_name=name + "_bn_variance")
def forward(self, inputs: paddle.Tensor, if_act: bool = True) -> paddle.Tensor:
y = self._conv(inputs)
y = self._batch_norm(y)
if if_act:
y = F.relu6(y)
return y
class InvertedResidualUnit(nn.Layer):
"""Inverted residual block"""
def __init__(self, num_channels: int, num_in_filter: int, num_filters: int, stride: int,
filter_size: int, padding: int, expansion_factor: int, name: str):
super(InvertedResidualUnit, self).__init__()
num_expfilter = int(round(num_in_filter * expansion_factor))
self._expand_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
name=name + "_expand")
self._bottleneck_conv = ConvBNLayer(
num_channels=num_expfilter,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
use_cudnn=False,
name=name + "_dwise")
self._linear_conv = ConvBNLayer(
num_channels=num_expfilter,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
name=name + "_linear")
def forward(self, inputs: paddle.Tensor, ifshortcut: bool) -> paddle.Tensor:
y = self._expand_conv(inputs, if_act=True)
y = self._bottleneck_conv(y, if_act=True)
y = self._linear_conv(y, if_act=False)
if ifshortcut:
y = paddle.add(inputs, y)
return y
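# An inverted residual expands channels with a 1x1 conv, filters spatially
# with a depthwise 3x3 conv, then projects back with a linear 1x1 conv; the
# shortcut is only added when input and output shapes match (stride 1 and
# equal channel counts), which InvresiBlocks enforces via `ifshortcut`.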
class InvresiBlocks(nn.Layer):
def __init__(self, in_c: int, t: int, c: int, n: int, s: int, name: str):
super(InvresiBlocks, self).__init__()
self._first_block = InvertedResidualUnit(
num_channels=in_c,
num_in_filter=in_c,
num_filters=c,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + "_1")
self._block_list = []
for i in range(1, n):
block = self.add_sublayer(
name + "_" + str(i + 1),
sublayer=InvertedResidualUnit(
num_channels=c,
num_in_filter=c,
num_filters=c,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + "_" + str(i + 1)))
self._block_list.append(block)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self._first_block(inputs, ifshortcut=False)
for block in self._block_list:
y = block(y, ifshortcut=True)
return y
class MobileNet(nn.Layer):
"""Networj of MobileNet"""
def __init__(self,
input_channels: int = 3,
scale: float = 1.0,
pretrained: str = None,
prefix_name: str = ""):
super(MobileNet, self).__init__()
self.scale = scale
bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.conv1 = ConvBNLayer(
num_channels=input_channels,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
name=prefix_name + "conv1_1")
self.block_list = []
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
block = self.add_sublayer(
prefix_name + "conv" + str(i),
sublayer=InvresiBlocks(
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name=prefix_name + "conv" + str(i)))
self.block_list.append(block)
in_c = int(c * scale)
self.out_c = int(1280 * scale) if scale > 1.0 else 1280
self.conv9 = ConvBNLayer(
num_channels=in_c,
num_filters=self.out_c,
filter_size=1,
stride=1,
padding=0,
name=prefix_name + "conv9")
self.feat_channels = [int(i * scale) for i in [16, 24, 32, 96, 1280]]
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
feat_list = []
y = self.conv1(inputs, if_act=True)
block_index = 0
for block in self.block_list:
y = block(y)
if block_index in [0, 1, 2, 4]:
feat_list.append(y)
block_index += 1
y = self.conv9(y, if_act=True)
feat_list.append(y)
return feat_list
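# feat_list collects features at strides 2, 4, 8, 16 and 32; MODNet's
# LRBranch later consumes indices 0, 1 and 4 as enc2x/enc4x/enc32x.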
def MobileNetV2(**kwargs):
model = MobileNet(scale=1.0, **kwargs)
return model
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from modnet_mobilenetv2_matting.mobilenetv2 import MobileNetV2
import modnet_mobilenetv2_matting.processor as P
@moduleinfo(
name="modnet_mobilenetv2_matting",
type="CV",
author="paddlepaddle",
summary="modnet_mobilenetv2_matting is a matting model",
version="1.0.0"
)
class MODNetMobilenetV2(nn.Layer):
"""
The MODNet implementation based on PaddlePaddle.
The original article refers to
Zhanghan Ke, et, al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
(https://arxiv.org/pdf/2011.11961.pdf).
Args:
hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
pretrained(str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self, hr_channels:int = 32, pretrained=None):
super(MODNetMobilenetV2, self).__init__()
self.backbone = MobileNetV2()
self.pretrained = pretrained
self.head = MODNetHead(
hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
self.blurer = GaussianBlurLayer(1, 3)
self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'modnet-mobilenetv2.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
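    # preprocess returns an NCHW float tensor under 'img' plus the resize
    # history under 'trans_info', which predict replays in reverse to restore
    # the alpha matte to the original image size.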
def forward(self, inputs: dict):
x = inputs['img']
feat_list = self.backbone(x)
y = self.head(inputs=inputs, feat_list=feat_list)
return y
def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_mobilenetv2_matting_output"):
self.eval()
result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list = None, **kwargs):
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
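    # Serving I/O is base64-encoded images; trimaps arrive as BGR and are
    # converted to single-channel grayscale before being fed to predict.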
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="modnet_mobilenetv2_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
class MODNetHead(nn.Layer):
"""
Segmentation head.
"""
def __init__(self, hr_channels: int, backbone_channels: int):
super().__init__()
self.lr_branch = LRBranch(backbone_channels)
self.hr_branch = HRBranch(hr_channels, backbone_channels)
self.f_branch = FusionBranch(hr_channels, backbone_channels)
def forward(self, inputs: paddle.Tensor, feat_list: list):
pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
if self.training:
logit_dict = {
'semantic': pred_semantic,
'detail': pred_detail,
'matte': pred_matte
}
return logit_dict
else:
return pred_matte
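# MODNet splits matting into three sub-tasks: the low-resolution branch
# predicts coarse semantics, the high-resolution branch predicts boundary
# detail, and the fusion branch merges both into the final alpha matte.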
class FusionBranch(nn.Layer):
def __init__(self, hr_channels: int, enc_channels: int):
super().__init__()
self.conv_lr4x = Conv2dIBNormRelu(
enc_channels[2], hr_channels, 5, stride=1, padding=2)
self.conv_f2x = Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1)
self.conv_f = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
Conv2dIBNormRelu(
int(hr_channels / 2),
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
lr4x = self.conv_lr4x(lr4x)
lr2x = F.interpolate(
lr4x, scale_factor=2, mode='bilinear', align_corners=False)
f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
f = F.interpolate(
f2x, scale_factor=2, mode='bilinear', align_corners=False)
f = self.conv_f(paddle.concat((f, img), axis=1))
pred_matte = F.sigmoid(f)
return pred_matte
class HRBranch(nn.Layer):
"""
High Resolution Branch of MODNet
"""
def __init__(self, hr_channels: int, enc_channels:int):
super().__init__()
self.tohr_enc2x = Conv2dIBNormRelu(
enc_channels[0], hr_channels, 1, stride=1, padding=0)
self.conv_enc2x = Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=2, padding=1)
self.tohr_enc4x = Conv2dIBNormRelu(
enc_channels[1], hr_channels, 1, stride=1, padding=0)
self.conv_enc4x = Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
self.conv_hr4x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels + enc_channels[2] + 3,
2 * hr_channels,
3,
stride=1,
padding=1),
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr2x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
hr_channels,
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
img2x = F.interpolate(
img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
img4x = F.interpolate(
img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
enc2x = self.tohr_enc2x(enc2x)
hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
enc4x = self.tohr_enc4x(enc4x)
hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
hr2x = F.interpolate(
hr4x, scale_factor=2, mode='bilinear', align_corners=False)
hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
pred_detail = None
if self.training:
hr = F.interpolate(
hr2x, scale_factor=2, mode='bilinear', align_corners=False)
hr = self.conv_hr(paddle.concat((hr, img), axis=1))
pred_detail = F.sigmoid(hr)
return pred_detail, hr2x
class LRBranch(nn.Layer):
"""
Low Resolution Branch of MODNet
"""
def __init__(self, backbone_channels: int):
super().__init__()
self.se_block = SEBlock(backbone_channels[4], reduction=4)
self.conv_lr16x = Conv2dIBNormRelu(
backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
self.conv_lr8x = Conv2dIBNormRelu(
backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
self.conv_lr = Conv2dIBNormRelu(
backbone_channels[2],
1,
3,
stride=2,
padding=1,
with_ibn=False,
with_relu=False)
def forward(self, feat_list: list):
enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
enc32x = self.se_block(enc32x)
lr16x = F.interpolate(
enc32x, scale_factor=2, mode='bilinear', align_corners=False)
lr16x = self.conv_lr16x(lr16x)
lr8x = F.interpolate(
lr16x, scale_factor=2, mode='bilinear', align_corners=False)
lr8x = self.conv_lr8x(lr8x)
pred_semantic = None
if self.training:
lr = self.conv_lr(lr8x)
pred_semantic = F.sigmoid(lr)
return pred_semantic, lr8x, [enc2x, enc4x]
class IBNorm(nn.Layer):
"""
Combine Instance Norm and Batch Norm into One Layer
"""
def __init__(self, in_channels: int):
super().__init__()
self.bnorm_channels = in_channels // 2
self.inorm_channels = in_channels - self.bnorm_channels
self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
self.inorm = nn.InstanceNorm2D(self.inorm_channels)
def forward(self, x):
bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
return paddle.concat((bn_x, in_x), 1)
class Conv2dIBNormRelu(nn.Layer):
"""
Convolution + IBNorm + Relu
"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
padding: int = 0,
dilation:int = 1,
groups: int = 1,
bias_attr: paddle.ParamAttr = None,
with_ibn: bool = True,
with_relu: bool = True):
super().__init__()
layers = [
nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias_attr=bias_attr)
]
if with_ibn:
layers.append(IBNorm(out_channels))
if with_relu:
layers.append(nn.ReLU())
self.layers = nn.Sequential(*layers)
def forward(self, x: paddle.Tensor):
return self.layers(x)
class SEBlock(nn.Layer):
"""
SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
"""
def __init__(self, num_channels: int, reduction:int = 1):
super().__init__()
self.pool = nn.AdaptiveAvgPool2D(1)
self.conv = nn.Sequential(
nn.Conv2D(
num_channels,
int(num_channels // reduction),
1,
bias_attr=False), nn.ReLU(),
nn.Conv2D(
int(num_channels // reduction),
num_channels,
1,
bias_attr=False), nn.Sigmoid())
def forward(self, x: paddle.Tensor):
w = self.pool(x)
w = self.conv(w)
return w * x
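# Squeeze-and-excitation: global average pooling squeezes each channel to a
# scalar, two 1x1 convs form a reduction bottleneck, and the sigmoid gate
# re-weights the input feature map channel-wise.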
class GaussianBlurLayer(nn.Layer):
""" Add Gaussian Blur to a 4D tensors
This layer takes a 4D tensor of {N, C, H, W} as input.
The Gaussian blur will be performed in given channel number (C) splitly.
"""
def __init__(self, channels: int, kernel_size: int):
"""
Args:
channels (int): Channel for input tensor
kernel_size (int): Size of the kernel used in blurring
"""
super(GaussianBlurLayer, self).__init__()
self.channels = channels
self.kernel_size = kernel_size
assert self.kernel_size % 2 != 0
self.op = nn.Sequential(
nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
nn.Conv2D(
channels,
channels,
self.kernel_size,
stride=1,
padding=0,
bias_attr=False,
groups=channels))
self._init_kernel()
self.op[1].weight.stop_gradient = True
def forward(self, x: paddle.Tensor):
"""
Args:
x (paddle.Tensor): input 4D tensor
Returns:
paddle.Tensor: Blurred version of the input
"""
if not len(list(x.shape)) == 4:
    raise ValueError("'GaussianBlurLayer' requires a 4D tensor as input")
elif not x.shape[1] == self.channels:
    raise ValueError(
        "In 'GaussianBlurLayer', the required channel ({0}) is "
        "not the same as input ({1})".format(self.channels, x.shape[1]))
return self.op(x)
def _init_kernel(self):
sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
n = np.zeros((self.kernel_size, self.kernel_size))
i = int(self.kernel_size / 2)
n[i, i] = 1
kernel = scipy.ndimage.gaussian_filter(n, sigma)
kernel = kernel.astype('float32')
kernel = kernel[np.newaxis, np.newaxis, :, :]
paddle.assign(kernel, self.op[1].weight)
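        # The kernel is built by Gaussian-blurring a unit impulse; sigma
        # follows OpenCV's default rule for a given kernel size:
        # sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8.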
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
def __init__(self, transforms: Callable, to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
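# Compose finishes by transposing 'img' (and any 3-channel gt fields) from
# HWC to CHW so the arrays can be fed to Paddle as NCHW tensors.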
class LoadImages:
"""
Read images from image path.
Args:
to_rgb (bool, optional): If converting image to RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
# If alpha or trimap has 3 channels, keep only one channel.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class ResizeByShort:
"""
Resize the short side of an image to the given size, then scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size: int = 512):
self.short_size = short_size
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_short(data['img'], self.short_size)
for key in data.get('gt_fields', []):
data[key] = functional.resize_short(data[key], self.short_size)
return data
class ResizeToIntMult:
"""
Resize to an integer multiple, e.g. 32.
"""
def __init__(self, mult_int: int = 32):
self.mult_int = mult_int
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
h, w = data['img'].shape[0:2]
rw = w - w % self.mult_int
rh = h - h % self.mult_int
data['img'] = functional.resize(data['img'], (rw, rh))
for key in data.get('gt_fields', []):
data[key] = functional.resize(data[key], (rw, rh))
return data
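# Note: combined with ResizeByShort(512), this keeps both sides a multiple
# of 32, matching the backbone's total stride of 32.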
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: List[tuple]):
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
    """
    Apply trimap constraints to the predicted alpha.
    alpha is expected in [0, 255] with shape [h, w].
    """
    if isinstance(trimap, str):
        trimap = cv2.imread(trimap, 0)
    if trimap is not None:
        alpha[trimap == 0] = 0
        alpha[trimap == 255] = 255
    alpha = alpha.astype('uint8')
return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# modnet_resnet50vd_matting
|模型名称|modnet_resnet50vd_matting|
| :--- | :---: |
|类别|图像-抠图|
|网络|modnet_resnet50vd|
|数据集|百度自建数据集|
|是否支持Fine-tuning|否|
|模型大小|535MB|
|指标|SAD104.14|
|最新更新日期|2021-12-03|
## 一、模型基本信息
- ### 应用效果展示
- 样例结果示例(左为原图,右为效果图):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" width = "337" height = "505" hspace='10'/>
</p>
- ### 模型介绍
- Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_resnet50vd_matting可生成抠图结果。
- 更多详情请参考:[modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、安装
- ```shell
$ hub install modnet_resnet50vd_matting
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_resnet50vd_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- 人像matting预测API,用于将输入图片中的人像分割出来。
- 参数
- image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
- trimap_list(list(str | numpy.ndarray)):trimap输入路径或者灰度图单通道格式图片。默认为None。
- visualization (bool): 是否进行可视化,默认为False。
- save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_resnet50vd_matting_output"。
- 返回
- result (list(numpy.ndarray)):模型分割结果。
## 四、服务部署
- PaddleHub Serving可以部署人像matting在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m modnet_resnet50vd_matting
```
- 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
image_path =str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## 五、更新历史
* 1.0.0
初始发布
# modnet_resnet50vd_matting
|Module Name|modnet_resnet50vd_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_resnet50vd|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|535MB|
|Data Indicators|SAD104.14|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image, right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" width = "337" height = "505" hspace='10'/>
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" width = "337" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value representing its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Taking the part of the image covered by the matte completes the foreground separation. modnet_resnet50vd_matting can generate matting results.
- For more information, please refer to: [modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install modnet_resnet50vd_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_resnet50vd_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for matting.
- **Parameter**
- image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
- trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], grayscale. Default is None.
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "modnet_resnet50vd_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): The list of model results.
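- As the parameter list above notes, images can also be passed as in-memory BGR arrays instead of paths; a minimal sketch (the path is a placeholder):
- ```python
import cv2
import paddlehub as hub

model = hub.Module(name="modnet_resnet50vd_matting")
img = cv2.imread("/PATH/TO/IMAGE")  # BGR ndarray, shape [H, W, C]
alphas = model.predict(image_list=[img], visualization=False)
print(alphas[0].dtype, alphas[0].shape)  # uint8, (H, W)
```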
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_resnet50vd_matting
```
- The matting service API is now deployed; the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
image_path =str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from modnet_resnet50vd_matting.resnet import ResNet50_vd
import modnet_resnet50vd_matting.processor as P
@moduleinfo(
name="modnet_resnet50vd_matting",
type="CV/matting",
author="paddlepaddle",
summary="modnet_resnet50vd_matting is a matting model",
version="1.0.0"
)
class MODNetResNet50Vd(nn.Layer):
"""
The MODNet implementation based on PaddlePaddle.
The original article refers to
Zhanghan Ke, et, al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
(https://arxiv.org/pdf/2011.11961.pdf).
Args:
hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
pretrained(str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self, hr_channels:int = 32, pretrained=None):
super(MODNetResNet50Vd, self).__init__()
self.backbone = ResNet50_vd()
self.pretrained = pretrained
self.head = MODNetHead(
hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
self.blurer = GaussianBlurLayer(1, 3)
self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'modnet-resnet50_vd.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: dict):
x = inputs['img']
feat_list = self.backbone(x)
y = self.head(inputs=inputs, feat_list=feat_list)
return y
def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_resnet50vd_matting_output"):
self.eval()
result= []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list = None, **kwargs):
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="modnet_resnet50vd_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
class MODNetHead(nn.Layer):
"""
Segmentation head.
"""
def __init__(self, hr_channels: int, backbone_channels: int):
super().__init__()
self.lr_branch = LRBranch(backbone_channels)
self.hr_branch = HRBranch(hr_channels, backbone_channels)
self.f_branch = FusionBranch(hr_channels, backbone_channels)
def forward(self, inputs: paddle.Tensor, feat_list: list) -> paddle.Tensor:
pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
return pred_matte
class FusionBranch(nn.Layer):
def __init__(self, hr_channels: int, enc_channels: int):
super().__init__()
self.conv_lr4x = Conv2dIBNormRelu(
enc_channels[2], hr_channels, 5, stride=1, padding=2)
self.conv_f2x = Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1)
self.conv_f = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
Conv2dIBNormRelu(
int(hr_channels / 2),
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor) -> paddle.Tensor:
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
lr4x = self.conv_lr4x(lr4x)
lr2x = F.interpolate(
lr4x, scale_factor=2, mode='bilinear', align_corners=False)
f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
f = F.interpolate(
f2x, scale_factor=2, mode='bilinear', align_corners=False)
f = self.conv_f(paddle.concat((f, img), axis=1))
pred_matte = F.sigmoid(f)
return pred_matte
class HRBranch(nn.Layer):
"""
High Resolution Branch of MODNet
"""
def __init__(self, hr_channels: int, enc_channels:int):
super().__init__()
self.tohr_enc2x = Conv2dIBNormRelu(
enc_channels[0], hr_channels, 1, stride=1, padding=0)
self.conv_enc2x = Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=2, padding=1)
self.tohr_enc4x = Conv2dIBNormRelu(
enc_channels[1], hr_channels, 1, stride=1, padding=0)
self.conv_enc4x = Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
self.conv_hr4x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels + enc_channels[2] + 3,
2 * hr_channels,
3,
stride=1,
padding=1),
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr2x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
hr_channels,
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor) -> paddle.Tensor:
img2x = F.interpolate(
img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
img4x = F.interpolate(
img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
enc2x = self.tohr_enc2x(enc2x)
hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
enc4x = self.tohr_enc4x(enc4x)
hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
hr2x = F.interpolate(
hr4x, scale_factor=2, mode='bilinear', align_corners=False)
hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
pred_detail = None
return pred_detail, hr2x
class LRBranch(nn.Layer):
"""
Low Resolution Branch of MODNet
"""
def __init__(self, backbone_channels: int):
super().__init__()
self.se_block = SEBlock(backbone_channels[4], reduction=4)
self.conv_lr16x = Conv2dIBNormRelu(
backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
self.conv_lr8x = Conv2dIBNormRelu(
backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
self.conv_lr = Conv2dIBNormRelu(
backbone_channels[2],
1,
3,
stride=2,
padding=1,
with_ibn=False,
with_relu=False)
def forward(self, feat_list: list) -> List[paddle.Tensor]:
enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
enc32x = self.se_block(enc32x)
lr16x = F.interpolate(
enc32x, scale_factor=2, mode='bilinear', align_corners=False)
lr16x = self.conv_lr16x(lr16x)
lr8x = F.interpolate(
lr16x, scale_factor=2, mode='bilinear', align_corners=False)
lr8x = self.conv_lr8x(lr8x)
pred_semantic = None
if self.training:
lr = self.conv_lr(lr8x)
pred_semantic = F.sigmoid(lr)
return pred_semantic, lr8x, [enc2x, enc4x]
class IBNorm(nn.Layer):
"""
Combine Instance Norm and Batch Norm into One Layer
"""
def __init__(self, in_channels: int):
super().__init__()
self.bnorm_channels = in_channels // 2
self.inorm_channels = in_channels - self.bnorm_channels
self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
self.inorm = nn.InstanceNorm2D(self.inorm_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
return paddle.concat((bn_x, in_x), 1)
class Conv2dIBNormRelu(nn.Layer):
"""
Convolution + IBNorm + Relu
"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
padding: int = 0,
dilation:int = 1,
groups: int = 1,
bias_attr: paddle.ParamAttr = None,
with_ibn: bool = True,
with_relu: bool = True):
super().__init__()
layers = [
nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias_attr=bias_attr)
]
if with_ibn:
layers.append(IBNorm(out_channels))
if with_relu:
layers.append(nn.ReLU())
self.layers = nn.Sequential(*layers)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return self.layers(x)
class SEBlock(nn.Layer):
"""
SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
"""
def __init__(self, num_channels: int, reduction:int = 1):
super().__init__()
self.pool = nn.AdaptiveAvgPool2D(1)
self.conv = nn.Sequential(
nn.Conv2D(
num_channels,
int(num_channels // reduction),
1,
bias_attr=False), nn.ReLU(),
nn.Conv2D(
int(num_channels // reduction),
num_channels,
1,
bias_attr=False), nn.Sigmoid())
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
w = self.pool(x)
w = self.conv(w)
return w * x
class GaussianBlurLayer(nn.Layer):
""" Add Gaussian Blur to a 4D tensors
This layer takes a 4D tensor of {N, C, H, W} as input.
The Gaussian blur will be performed in given channel number (C) splitly.
"""
def __init__(self, channels: int, kernel_size: int):
"""
Args:
channels (int): Channel for input tensor
kernel_size (int): Size of the kernel used in blurring
"""
super(GaussianBlurLayer, self).__init__()
self.channels = channels
self.kernel_size = kernel_size
assert self.kernel_size % 2 != 0
self.op = nn.Sequential(
nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
nn.Conv2D(
channels,
channels,
self.kernel_size,
stride=1,
padding=0,
bias_attr=False,
groups=channels))
self._init_kernel()
self.op[1].weight.stop_gradient = True
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
"""
Args:
x (paddle.Tensor): input 4D tensor
Returns:
paddle.Tensor: Blurred version of the input
"""
if not len(list(x.shape)) == 4:
    raise ValueError("'GaussianBlurLayer' requires a 4D tensor as input")
elif not x.shape[1] == self.channels:
    raise ValueError(
        "In 'GaussianBlurLayer', the required channel ({0}) is "
        "not the same as input ({1})".format(self.channels, x.shape[1]))
return self.op(x)
def _init_kernel(self):
sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
n = np.zeros((self.kernel_size, self.kernel_size))
i = int(self.kernel_size / 2)
n[i, i] = 1
kernel = scipy.ndimage.gaussian_filter(n, sigma)
kernel = kernel.astype('float32')
kernel = kernel[np.newaxis, np.newaxis, :, :]
paddle.assign(kernel, self.op[1].weight)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
def __init__(self, transforms: Callable, to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
class LoadImages:
"""
Read images from image path.
Args:
to_rgb (bool, optional): If converting image to RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
# If alpha or trimap has 3 channels, keep only one channel.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class ResizeByShort:
"""
Resize the short side of an image to the given size, then scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size: int = 512):
self.short_size = short_size
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_short(data['img'], self.short_size)
for key in data.get('gt_fields', []):
data[key] = functional.resize_short(data[key], self.short_size)
return data
class ResizeToIntMult:
"""
Resize to an integer multiple, e.g. 32.
"""
def __init__(self, mult_int: int = 32):
self.mult_int = mult_int
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
h, w = data['img'].shape[0:2]
        rw = w - w % self.mult_int
        rh = h - h % self.mult_int
data['img'] = functional.resize(data['img'], (rw, rh))
for key in data.get('gt_fields', []):
data[key] = functional.resize(data[key], (rw, rh))
return data
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: List[Tuple]):
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
    """
    The values of alpha are in range [0, 1]; shape should be [h, w].
    """
    if isinstance(trimap, str):
        trimap = cv2.imread(trimap, 0)
    if trimap is not None:
        alpha[trimap == 0] = 0
        alpha[trimap == 255] = 255
    alpha = alpha.astype('uint8')
return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
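# Illustrative usage sketch (not part of the module), assuming paddleseg is
# installed: push an image through the transform pipeline above, then map a
# prediction back to the original resolution with reverse_transform.
if __name__ == "__main__":
    transforms = Compose([LoadImages(), ResizeByShort(512), ResizeToIntMult(32), Normalize()])
    data = {'img': np.random.randint(0, 255, (721, 1281, 3), dtype='uint8')}
    data = transforms(data)                # img is now CHW; short side 512, both sides multiples of 32
    h, w = data['img'].shape[1:]
    fake_pred = paddle.rand([1, 1, h, w])  # stand-in for a model output
    fake_pred = reverse_transform(fake_pred, data['trans_info'])
    print(fake_pred.shape)                 # restored to [1, 1, 721, 1281]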
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers
from paddleseg.utils import utils
__all__ = ["ResNet50_vd"]
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = nn.AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = layers.SyncBatchNorm(out_channels)
self._act_op = layers.Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu')
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation)
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True)
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
####################################################################
# If given dilation rate > 1, using corresponding padding.
# The performance drops down without the follow padding.
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
#####################################################################
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class BasicBlock(nn.Layer):
"""Basic residual block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu')
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None)
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True)
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv1)
y = F.relu(y)
return y
class ResNet_vd(nn.Layer):
"""
The ResNet_vd implementation based on PaddlePaddle.
    The original article refers to
    Tong He, et al. "Bag of Tricks for Image Classification with Convolutional Neural Networks"
(https://arxiv.org/pdf/1812.01187.pdf).
"""
def __init__(self,
input_channels: int = 3,
layers: int = 50,
output_stride: int = 32,
multi_grid: tuple = (1, 1, 1),
pretrained: str = None):
super(ResNet_vd, self).__init__()
self.conv1_logit = None # for gscnn shape stream
self.layers = layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_channels = [64, 256, 512, 1024
] if layers >= 50 else [64, 64, 128, 256]
num_filters = [64, 128, 256, 512]
# for channels of four returned stages
self.feat_channels = [c * 4 for c in num_filters
] if layers >= 50 else num_filters
self.feat_channels = [64] + self.feat_channels
dilation_dict = None
if output_stride == 8:
dilation_dict = {2: 2, 3: 4}
elif output_stride == 16:
dilation_dict = {3: 2}
self.conv1_1 = ConvBNLayer(
in_channels=input_channels,
out_channels=32,
kernel_size=3,
stride=2,
act='relu')
self.conv1_2 = ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu')
self.conv1_3 = ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu')
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
# self.block_list = []
self.stage_list = []
if layers >= 50:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
###############################################################################
# Add dilation rate for some segmentation tasks, if dilation_dict is not None.
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
# Actually block here is 'stage', and i is 'block' in 'stage'
                    # At stage 4, expand the dilation_rate if multi_grid is given
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
###############################################################################
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
else:
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
basic_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
BasicBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block],
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
shortcut=shortcut,
if_first=block == i == 0))
block_list.append(basic_block)
shortcut = True
self.stage_list.append(block_list)
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
feat_list = []
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
feat_list.append(y)
y = self.pool2d_max(y)
# A feature list saves the output feature map of each stage.
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
def ResNet50_vd(**args):
model = ResNet_vd(layers=50, **args)
return model
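# Illustrative usage sketch (not part of the module): with output_stride=8 the
# last two stages use dilated convolutions instead of striding, so the deepest
# feature maps keep 1/8 of the input resolution.
if __name__ == "__main__":
    model = ResNet50_vd(output_stride=8)
    feats = model(paddle.rand([1, 3, 224, 224]))
    print([f.shape for f in feats])  # 112x112, 56x56, then three maps at 28x28
    print(model.feat_channels)       # [64, 256, 512, 1024, 2048]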
# bisenet_lane_segmentation
|Module Name|bisenet_lane_segmentation|
| :--- | :---: |
|Category|Image Segmentation|
|Network|bisenet|
|Dataset|TuSimple|
|Support Fine-tuning|No|
|Module Size|9.7MB|
|Data Indicators|ACC 96.09%|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image, right: segmentation result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/146115316-e9ed4220-8470-432f-b3f1-549d2bcdc845.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/146115396-a7d19290-6117-4831-bc35-4b14ae8f90bc.png" />
</p>
- ### Module Introduction
  - Lane segmentation is a category of autonomous driving algorithms that can be used to assist vehicle localization and decision-making. Early lane detection methods were based on traditional image processing, but as the technology has evolved, the scenarios that lane detection must handle have become more and more diverse; current methods increasingly seek to detect where lanes exist at the semantic level. bisenet_lane_segmentation is a lightweight lane segmentation model.
  - For more information, please refer to: [bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.2.0
  - paddlehub >= 2.1.0
  - paddleseg >= 2.3.0
  - Python >= 3.7
- ### 2、Installation
  - ```shell
    $ hub install bisenet_lane_segmentation
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
  - ```shell
    $ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
    ```
  - To call hub models from the command line, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2

    model = hub.Module(name="bisenet_lane_segmentation")
    result = model.predict(image_list=["/PATH/TO/IMAGE"])
    print(result)
    ```
- ### 3、API
  - ```python
    def predict(self,
                image_list,
                visualization,
                save_path):
    ```
    - Lane segmentation prediction API, which segments the lane lines out of the input image.
    - Parameters
      - image_list (list(str | numpy.ndarray)): Image paths or BGR-format numpy data.
      - visualization (bool): Whether to save the visualized results. Default: False.
      - save_path (str): Directory for saving images when visualization is True. Default: "bisenet_lane_segmentation_output".
    - Return
      - result (list(numpy.ndarray)): The segmentation results of the model.
## IV. Server Deployment
- PaddleHub Serving can deploy an online lane segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m bisenet_lane_segmentation
    ```
  - This deploys an online lane segmentation API service; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
  - With the server configured, the following lines of code send a prediction request and obtain the result
    ```python
    import requests
    import json
    import cv2
    import base64
    import numpy as np

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data

    # Send an HTTP request
    org_im = cv2.imread('/PATH/TO/IMAGE')
    data = {'images':[cv2_to_base64(org_im)]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    #print(r.json())
    mask = base64_to_cv2(r.json()["results"]['data'][0])
    print(mask)
    ```
## V. Release Note
* 1.0.0
  First release
# bisenet_lane_segmentation
|Module Name|bisenet_lane_segmentation|
| :--- | :---: |
|Category|Image Segmentation|
|Network|bisenet|
|Dataset|TuSimple|
|Support Fine-tuning|No|
|Module Size|9.7MB|
|Data Indicators|ACC 96.09%|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/146115316-e9ed4220-8470-432f-b3f1-549d2bcdc845.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/146115396-a7d19290-6117-4831-bc35-4b14ae8f90bc.png" />
</p>
- ### Module Introduction
  - Lane segmentation is a category of autonomous driving algorithms that can be used to assist vehicle localization and decision-making. Early lane detection methods were based on traditional image processing, but as the technology has evolved, the scenarios that lane detection must handle have become more and more diverse; current methods increasingly seek to detect where lanes exist at the semantic level. bisenet_lane_segmentation is a lightweight model for lane segmentation.
- For more information, please refer to: [bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
  - Python >= 3.7
- ### 2、Installation
- ```shell
$ hub install bisenet_lane_segmentation
```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="bisenet_lane_segmentation")
result = model.predict(image_list=["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
visualization,
save_path):
```
- Prediction API for lane segmentation.
- **Parameter**
      - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "bisenet_lane_segmentation_output" by default.
- **Return**
- result (list(numpy.ndarray)):The list of model results.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of lane segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m bisenet_lane_segmentation
```
  - The lane segmentation API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
#print(r.json())
mask = base64_to_cv2(r.json()["results"]['data'][0])
print(mask)
```
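- The decoded `mask` holds per-pixel class ids re-encoded by the service as a PNG image, not colors. A minimal sketch for inspecting it (assuming the `mask` variable obtained from the request above):
```python
import numpy as np
import cv2
print(np.unique(mask))                   # class ids present; 0 is background
cv2.imwrite('lane_mask.png', mask * 30)  # scale the small ids so lanes are visible
```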
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# this code is based on
# https://github.com/ZJULearning/resa/blob/main/datasets/tusimple.py
import cv2
import numpy as np
class LaneProcessor:
def __init__(self,
num_classes=2,
ori_shape=(720, 1280),
cut_height=0,
y_pixel_gap=10,
points_nums=56,
thresh=0.6,
smooth=True):
super(LaneProcessor, self).__init__()
self.num_classes = num_classes
self.ori_shape = ori_shape
self.cut_height = cut_height
self.y_pixel_gap = y_pixel_gap
self.points_nums = points_nums
self.thresh = thresh
self.smooth = smooth
def get_lane_coords(self, seg_pred):
lane_coords_list = []
for batch in range(len(seg_pred)):
seg = seg_pred[batch]
lane_coords = self.heatmap2coords(seg)
for i in range(len(lane_coords)):
lane_coords[i] = sorted(
lane_coords[i], key=lambda pair: pair[1])
lane_coords_list.append(lane_coords)
return lane_coords_list
def process_gap(self, coordinate):
if any(x > 0 for x in coordinate):
start = [i for i, x in enumerate(coordinate) if x > 0][0]
end = [
i for i, x in reversed(list(enumerate(coordinate))) if x > 0
][0]
lane = coordinate[start:end + 1]
# The line segment is not continuous
if any(x < 0 for x in lane):
gap_start = [
i for i, x in enumerate(lane[:-1])
if x > 0 and lane[i + 1] < 0
]
gap_end = [
i + 1 for i, x in enumerate(lane[:-1])
if x < 0 and lane[i + 1] > 0
]
gap_id = [i for i, x in enumerate(lane) if x < 0]
if len(gap_start) == 0 or len(gap_end) == 0:
return coordinate
for id in gap_id:
for i in range(len(gap_start)):
if i >= len(gap_end):
return coordinate
if id > gap_start[i] and id < gap_end[i]:
gap_width = float(gap_end[i] - gap_start[i])
# line interpolation
lane[id] = int((id - gap_start[i]) / gap_width *
lane[gap_end[i]] +
(gap_end[i] - id) / gap_width *
lane[gap_start[i]])
if not all(x > 0 for x in lane):
print("Gaps still exist!")
coordinate[start:end + 1] = lane
return coordinate
def get_coords(self, heat_map):
dst_height = self.ori_shape[0] - self.cut_height
coords = np.zeros(self.points_nums)
coords[:] = -2
pointCount = 0
for i in range(self.points_nums):
y_coord = dst_height - 10 - i * self.y_pixel_gap
y = int(y_coord / dst_height * heat_map.shape[0])
if y < 0:
break
prob_line = heat_map[y, :]
x = np.argmax(prob_line)
prob = prob_line[x]
if prob > self.thresh:
coords[i] = int(x / heat_map.shape[1] * self.ori_shape[1])
pointCount = pointCount + 1
if pointCount < 2:
coords[:] = -2
self.process_gap(coords)
return coords
def fix_outliers(self, coords):
data = [x for i, x in enumerate(coords) if x > 0]
index = [i for i, x in enumerate(coords) if x > 0]
        if len(data) <= 1:
            return coords
diff = []
is_outlier = False
n = 1
x_gap = abs((data[-1] - data[0]) / (1.0 * (len(data) - 1)))
for idx, dt in enumerate(data):
if is_outlier == False:
t = idx - 1
n = 1
if idx == 0:
diff.append(0)
else:
diff.append(abs(data[idx] - data[t]))
if abs(data[idx] - data[t]) > n * (x_gap * 1.5):
n = n + 1
is_outlier = True
ind = index[idx]
coords[ind] = -1
else:
is_outlier = False
def heatmap2coords(self, seg_pred):
coordinates = []
for i in range(self.num_classes - 1):
heat_map = seg_pred[i + 1]
if self.smooth:
heat_map = cv2.blur(
heat_map, (9, 9), borderType=cv2.BORDER_REPLICATE)
coords = self.get_coords(heat_map)
indexes = [i for i, x in enumerate(coords) if x > 0]
if not indexes:
continue
self.add_coords(coordinates, coords)
if len(coordinates) == 0:
coords = np.zeros(self.points_nums)
self.add_coords(coordinates, coords)
return coordinates
def add_coords(self, coordinates, coords):
sub_lanes = []
for j in range(self.points_nums):
y_lane = self.ori_shape[0] - 10 - j * self.y_pixel_gap
x_lane = coords[j] if coords[j] > 0 else -2
sub_lanes.append([x_lane, y_lane])
coordinates.append(sub_lanes)
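# Illustrative toy example (not part of the module): gaps of negative values
# between two detected x-coordinates are filled by the linear interpolation in
# process_gap, while leading and trailing negatives stay untouched.
if __name__ == "__main__":
    proc = LaneProcessor()
    coords = np.array([-2, 100, -2, -2, 160, -2], dtype='float64')
    print(proc.process_gap(coords))  # [-2. 100. 120. 140. 160. -2.]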
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# this code is from https://github.com/TuSimple/tusimple-benchmark/blob/master/evaluate/lane.py
import json
import numpy as np
from sklearn.linear_model import LinearRegression
class LaneEval(object):
lr = LinearRegression()
pixel_thresh = 20
pt_thresh = 0.85
@staticmethod
def get_angle(xs, y_samples):
xs, ys = xs[xs >= 0], y_samples[xs >= 0]
if len(xs) > 1:
LaneEval.lr.fit(ys[:, None], xs)
k = LaneEval.lr.coef_[0]
theta = np.arctan(k)
else:
theta = 0
return theta
@staticmethod
def line_accuracy(pred, gt, thresh):
pred = np.array([p if p >= 0 else -100 for p in pred])
gt = np.array([g if g >= 0 else -100 for g in gt])
return np.sum(np.where(np.abs(pred - gt) < thresh, 1., 0.)) / len(gt)
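    # Worked example: for pred = [40, -2, 60] against gt = [40, 50, 62] with
    # thresh = 20, the -2 ("no point") entry maps to -100 and misses, while the
    # other two predictions fall within the threshold, so the accuracy is 2/3.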
@staticmethod
def bench(pred, gt, y_samples, running_time):
if any(len(p) != len(y_samples) for p in pred):
raise Exception('Format of lanes error.')
if running_time > 200 or len(gt) + 2 < len(pred):
return 0., 0., 1.
angles = [
LaneEval.get_angle(np.array(x_gts), np.array(y_samples))
for x_gts in gt
]
threshs = [LaneEval.pixel_thresh / np.cos(angle) for angle in angles]
line_accs = []
fp, fn = 0., 0.
matched = 0.
for x_gts, thresh in zip(gt, threshs):
accs = [
LaneEval.line_accuracy(
np.array(x_preds), np.array(x_gts), thresh)
for x_preds in pred
]
max_acc = np.max(accs) if len(accs) > 0 else 0.
if max_acc < LaneEval.pt_thresh:
fn += 1
else:
matched += 1
line_accs.append(max_acc)
fp = len(pred) - matched
if len(gt) > 4 and fn > 0:
fn -= 1
s = sum(line_accs)
if len(gt) > 4:
s -= min(line_accs)
return s / max(min(4.0, len(gt)),
1.), fp / len(pred) if len(pred) > 0 else 0., fn / max(
min(len(gt), 4.), 1.)
@staticmethod
def bench_one_submit(pred_file, gt_file):
try:
json_pred = [
json.loads(line) for line in open(pred_file).readlines()
]
except BaseException as e:
raise Exception('Fail to load json file of the prediction.')
json_gt = [json.loads(line) for line in open(gt_file).readlines()]
if len(json_gt) != len(json_pred):
raise Exception(
'We do not get the predictions of all the test tasks')
gts = {l['raw_file']: l for l in json_gt}
accuracy, fp, fn = 0., 0., 0.
for pred in json_pred:
if 'raw_file' not in pred or 'lanes' not in pred or 'run_time' not in pred:
raise Exception(
'raw_file or lanes or run_time not in some predictions.')
raw_file = pred['raw_file']
pred_lanes = pred['lanes']
run_time = pred['run_time']
if raw_file not in gts:
raise Exception(
'Some raw_file from your predictions do not exist in the test tasks.'
)
gt = gts[raw_file]
gt_lanes = gt['lanes']
y_samples = gt['h_samples']
try:
a, p, n = LaneEval.bench(pred_lanes, gt_lanes, y_samples,
run_time)
except BaseException as e:
raise Exception('Format of lanes error.')
accuracy += a
fp += p
fn += n
num = len(gts)
# the first return parameter is the default ranking parameter
return json.dumps([{
'name': 'Accuracy',
'value': accuracy / num,
'order': 'desc'
}, {
'name': 'FP',
'value': fp / num,
'order': 'asc'
}, {
'name': 'FN',
'value': fn / num,
'order': 'asc'
}]), accuracy / num, fp / num, fn / num
if __name__ == '__main__':
import sys
try:
if len(sys.argv) != 3:
raise Exception('Invalid input arguments')
print(LaneEval.bench_one_submit(sys.argv[1], sys.argv[2]))
    except Exception as e:
        print(e)
        sys.exit(str(e))
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import json
import paddle.nn as nn
from .lane import LaneEval
from .get_lane_coords import LaneProcessor
def mkdir(path):
sub_dir = os.path.dirname(path)
if not os.path.exists(sub_dir):
os.makedirs(sub_dir)
class TusimpleProcessor:
def __init__(self,
num_classes=2,
ori_shape=(720, 1280),
cut_height=0,
thresh=0.6,
test_gt_json=None,
save_dir='output/'):
super(TusimpleProcessor, self).__init__()
self.num_classes = num_classes
self.dump_to_json = []
self.save_dir = save_dir
self.test_gt_json = test_gt_json
self.color_map = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
(255, 0, 255), (0, 255, 125), (50, 100, 50),
(100, 50, 100)]
self.laneProcessor = LaneProcessor(
num_classes=self.num_classes,
ori_shape=ori_shape,
cut_height=cut_height,
y_pixel_gap=10,
points_nums=56,
thresh=thresh,
smooth=True)
def dump_data_to_json(self,
output,
im_path,
run_time=0,
is_dump_json=True,
is_view=False):
seg_pred = output[0]
seg_pred = nn.functional.softmax(seg_pred, axis=1)
seg_pred = seg_pred.numpy()
lane_coords_list = self.laneProcessor.get_lane_coords(seg_pred)
for batch in range(len(seg_pred)):
lane_coords = lane_coords_list[batch]
path_list = im_path[batch].split("/")
if is_dump_json:
json_pred = {}
json_pred['lanes'] = []
json_pred['run_time'] = run_time * 1000
json_pred['h_sample'] = []
json_pred['raw_file'] = os.path.join(*path_list[-4:])
for l in lane_coords:
if len(l) == 0:
continue
json_pred['lanes'].append([])
for (x, y) in l:
json_pred['lanes'][-1].append(int(x))
for (x, y) in lane_coords[0]:
json_pred['h_sample'].append(y)
self.dump_to_json.append(json.dumps(json_pred))
if is_view:
img = cv2.imread(im_path[batch])
if is_dump_json:
img_name = '_'.join([x for x in path_list[-4:]])
sub_dir = 'visual_eval'
else:
img_name = os.path.basename(im_path[batch])
sub_dir = 'visual_points'
saved_path = os.path.join(self.save_dir, sub_dir, img_name)
self.draw(img, lane_coords, saved_path)
def predict(self, output, im_path):
self.dump_data_to_json(
output, [im_path], is_dump_json=False, is_view=True)
def bench_one_submit(self):
output_file = os.path.join(self.save_dir, 'pred.json')
if output_file is not None:
mkdir(output_file)
with open(output_file, "w+") as f:
for line in self.dump_to_json:
print(line, end="\n", file=f)
eval_rst, acc, fp, fn = LaneEval.bench_one_submit(
output_file, self.test_gt_json)
self.dump_to_json = []
return acc, fp, fn, eval_rst
def draw(self, img, coords, file_path=None):
for i, coord in enumerate(coords):
for x, y in coord:
if x <= 0 or y <= 0:
continue
cv2.circle(img, (int(x), int(y)), 4,
self.color_map[i % self.num_classes], 2)
if file_path is not None:
mkdir(file_path)
cv2.imwrite(file_path, img)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import argparse
import os
from typing import Union, List, Tuple
import cv2
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo, runnable, serving
import paddleseg.transforms as T
from paddleseg.utils import logger, progbar, visualize
from paddlehub.module.cv_module import ImageSegmentationModule
import paddleseg.utils as utils
from paddleseg.models import layers
from paddleseg.models import BiSeNetV2
from bisenet_lane_segmentation.processor import Crop, reverse_transform, cv2_to_base64, base64_to_cv2
from bisenet_lane_segmentation.lane_processor.tusimple_processor import TusimpleProcessor
@moduleinfo(
name="bisenet_lane_segmentation",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="BiSeNetLane is a lane segmentation model.",
version="1.0.0")
class BiSeNetLane(nn.Layer):
"""
    The BiSeNetLane model uses BiSeNetV2 to perform lane segmentation.
Args:
num_classes (int): The unique number of target classes.
lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 7,
lambd: float = 0.25,
align_corners: bool = False,
pretrained: str = None):
super(BiSeNetLane, self).__init__()
self.net = BiSeNetV2(
num_classes=num_classes,
lambd=lambd,
align_corners=align_corners,
pretrained=None)
self.transforms = [Crop(up_h_off=160), T.Resize([640, 368]), T.Normalize()]
self.cut_height = 160
self.postprocessor = TusimpleProcessor(num_classes=7, cut_height=160,)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
logit_list = self.net(x)
return logit_list
def predict(self, image_list: list, visualization: bool = False, save_path: str = "bisenet_lane_segmentation_output") -> List[np.ndarray]:
self.eval()
result = []
with paddle.no_grad():
for i, im in enumerate(image_list):
if isinstance(im, str):
im = cv2.imread(im)
ori_shape = im.shape[:2]
for op in self.transforms:
outputs = op(im)
im = outputs[0]
im = np.transpose(im, (2, 0, 1))
im = im[np.newaxis, ...]
im = paddle.to_tensor(im)
logit = self.forward(im)[0]
pred = reverse_transform(logit, ori_shape, self.transforms, mode='bilinear')
pred = paddle.argmax(pred, axis=1, keepdim=True, dtype='int32')
pred = paddle.squeeze(pred[0])
pred = pred.numpy().astype('uint8')
if visualization:
color_map = visualize.get_color_map_list(256)
pred_mask = visualize.get_pseudo_color_map(pred, color_map)
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
pred_mask.save(image_save_path)
result.append(pred)
return result
@serving
    def serving_method(self, images: List[str], **kwargs) -> dict:
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
outputs = self.predict(image_list=images_decode, **kwargs)
serving_data = [cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list) -> List[np.ndarray]:
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="bisenet_lane_segmentation_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
import base64
from typing import Union, List, Tuple, Callable
import numpy as np
import cv2
import paddle
import paddle.nn.functional as F
def get_reverse_list(ori_shape: list, transforms: Callable) -> list:
"""
get reverse list of transform.
Args:
ori_shape (list): Origin shape of image.
transforms (list): List of transform.
Returns:
list: List of tuple, there are two format:
('resize', (h, w)) The image shape before resize,
('padding', (h, w)) The image shape before padding.
"""
reverse_list = []
h, w = ori_shape[0], ori_shape[1]
for op in transforms:
if op.__class__.__name__ in ['Resize']:
reverse_list.append(('resize', (h, w)))
h, w = op.target_size[0], op.target_size[1]
if op.__class__.__name__ in ['Crop']:
reverse_list.append(('crop', (op.up_h_off, op.down_h_off),
(op.left_w_off, op.right_w_off)))
h = h - op.up_h_off
h = h - op.down_h_off
w = w - op.left_w_off
w = w - op.right_w_off
if op.__class__.__name__ in ['ResizeByLong']:
reverse_list.append(('resize', (h, w)))
long_edge = max(h, w)
short_edge = min(h, w)
short_edge = int(round(short_edge * op.long_size / long_edge))
long_edge = op.long_size
if h > w:
h = long_edge
w = short_edge
else:
w = long_edge
h = short_edge
if op.__class__.__name__ in ['ResizeByShort']:
reverse_list.append(('resize', (h, w)))
long_edge = max(h, w)
short_edge = min(h, w)
long_edge = int(round(long_edge * op.short_size / short_edge))
short_edge = op.short_size
if h > w:
h = long_edge
w = short_edge
else:
w = long_edge
h = short_edge
if op.__class__.__name__ in ['Padding']:
reverse_list.append(('padding', (h, w)))
w, h = op.target_size[0], op.target_size[1]
if op.__class__.__name__ in ['PaddingByAspectRatio']:
reverse_list.append(('padding', (h, w)))
ratio = w / h
if ratio == op.aspect_ratio:
pass
elif ratio > op.aspect_ratio:
h = int(w / op.aspect_ratio)
else:
w = int(h * op.aspect_ratio)
if op.__class__.__name__ in ['LimitLong']:
long_edge = max(h, w)
short_edge = min(h, w)
if ((op.max_long is not None) and (long_edge > op.max_long)):
reverse_list.append(('resize', (h, w)))
long_edge = op.max_long
short_edge = int(round(short_edge * op.max_long / long_edge))
elif ((op.min_long is not None) and (long_edge < op.min_long)):
reverse_list.append(('resize', (h, w)))
long_edge = op.min_long
short_edge = int(round(short_edge * op.min_long / long_edge))
if h > w:
h = long_edge
w = short_edge
else:
w = long_edge
h = short_edge
return reverse_list
def reverse_transform(pred: paddle.Tensor, ori_shape: list, transforms: Callable, mode: str = 'nearest') -> paddle.Tensor:
"""recover pred to origin shape"""
reverse_list = get_reverse_list(ori_shape, transforms)
for item in reverse_list[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
# if paddle.get_device() == 'cpu':
# pred = paddle.cast(pred, 'uint8')
# pred = F.interpolate(pred, (h, w), mode=mode)
# pred = paddle.cast(pred, 'int32')
# else:
pred = F.interpolate(pred, (h, w), mode=mode)
elif item[0] == 'crop':
up_h_off, down_h_off = item[1][0], item[1][1]
left_w_off, right_w_off = item[2][0], item[2][1]
pred = F.pad(
pred, [left_w_off, right_w_off, up_h_off, down_h_off],
value=0,
mode='constant',
data_format="NCHW")
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
pred = pred[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return pred
class Crop:
"""
crop an image from four forwards.
Args:
up_h_off (int, optional): The cut height for image from up to down. Default: 0.
down_h_off (int, optional): The cut height for image from down to up . Default: 0.
left_w_off (int, optional): The cut height for image from left to right. Default: 0.
right_w_off (int, optional): The cut width for image from right to left. Default: 0.
"""
def __init__(self, up_h_off: int = 0, down_h_off: int = 0, left_w_off: int = 0, right_w_off: int = 0):
self.up_h_off = up_h_off
self.down_h_off = down_h_off
self.left_w_off = left_w_off
self.right_w_off = right_w_off
def __call__(self, im: np.ndarray, label: np.ndarray = None) -> Tuple[np.ndarray]:
if self.up_h_off < 0 or self.down_h_off < 0 or self.left_w_off < 0 or self.right_w_off < 0:
raise Exception(
"up_h_off, down_h_off, left_w_off, right_w_off must equal or greater zero"
)
if self.up_h_off > 0 and self.up_h_off < im.shape[0]:
im = im[self.up_h_off:, :, :]
if label is not None:
label = label[self.up_h_off:, :]
if self.down_h_off > 0 and self.down_h_off < im.shape[0]:
im = im[:-self.down_h_off, :, :]
if label is not None:
label = label[:-self.down_h_off, :]
if self.left_w_off > 0 and self.left_w_off < im.shape[1]:
im = im[:, self.left_w_off:, :]
if label is not None:
label = label[:, self.left_w_off:]
if self.right_w_off > 0 and self.right_w_off < im.shape[1]:
im = im[:, :-self.right_w_off, :]
if label is not None:
label = label[:, :-self.right_w_off]
if label is None:
return (im, )
else:
return (im, label)
def cv2_to_base64(image: np.ndarray) -> str:
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str) -> np.ndarray:
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
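# Illustrative roundtrip sketch (not part of the module): crop the top of an
# image the way the lane model does, then map a prediction back onto the
# original canvas via get_reverse_list/reverse_transform, which zero-pads the
# cropped region.
if __name__ == "__main__":
    transforms = [Crop(up_h_off=160)]
    im = np.zeros((720, 1280, 3), dtype='float32')
    im, = transforms[0](im)                 # -> (560, 1280, 3)
    fake_pred = paddle.rand([1, 2, im.shape[0], im.shape[1]])
    fake_pred = reverse_transform(fake_pred, [720, 1280], transforms)
    print(fake_pred.shape)                  # [1, 2, 720, 1280]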
# ginet_resnet101vd_ade20k
|Module Name|ginet_resnet101vd_ade20k|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|ADE20K|
|Fine-tuning supported or not|Yes|
|Module Size|287MB|
|Data Indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
  - Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145947107-b6f87161-d824-4c21-b01d-594ad03e56de.jpg" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145960027-343246bf-ce8b-456b-85c6-042c1f4477bd.png" hspace='10'/>
</p>
- ### Module Introduction
  - This module shows how to use PaddleHub to finetune a pre-trained model and complete prediction tasks.
  - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.0.0
- ### 2、Installation
  - ```shell
    $ hub install ginet_resnet101vd_ade20k
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
  - ```python
    import cv2
    import paddle
    import paddlehub as hub

    if __name__ == '__main__':
        model = hub.Module(name='ginet_resnet101vd_ade20k')
        img = cv2.imread("/PATH/TO/IMAGE")
        result = model.predict(images=[img], visualization=True)
    ```
- ### 2、How to start Fine-tune
  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet101vd_ade20k model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
  - Steps
    - Step1: Define the data preprocessing method
      - ```python
        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

        transform = Compose([Resize(target_size=(512, 512)), Normalize()])
        ```
      - The `segmentation_transforms` data augmentation module defines a rich set of preprocessing methods for image segmentation data; users can replace them according to their needs.
    - Step2: Download and use the dataset
      - ```python
        from paddlehub.datasets import OpticDiscSeg

        train_reader = OpticDiscSeg(transform, mode='train')
        ```
        - `transforms`: data preprocessing methods.
        - `mode`: selects the data mode; options are `train`, `test` and `val`. Default: `train`.
        - Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
    - Step3: Load the pre-trained model
      - ```python
        import paddlehub as hub

        model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
        ```
        - `name`: the name of the pre-trained model.
        - `pretrained`: whether to load your own trained checkpoint; if None, the provided default parameters are loaded.
    - Step4: Choose the optimization strategy and run configuration
      - ```python
        import paddle
        from paddlehub.finetune.trainer import Trainer

        scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
        optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
        trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
        trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
        ```
  - Model prediction
    - When fine-tuning is complete, the model that performs best on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The predict.py script is as follows:
    ```python
    import paddle
    import cv2
    import paddlehub as hub

    if __name__ == '__main__':
        model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
        img = cv2.imread("/PATH/TO/IMAGE")
        model.predict(images=[img], visualization=True)
    ```
    - With the parameters configured correctly, run the script with `python predict.py`.
    - **Args**
      * `images`: original image paths or BGR-format images;
      * `visualization`: whether to visualize the results. Default: True;
      * `save_path`: path for saving the results. Default: 'seg_result'.
    **NOTE:** For prediction, the module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m ginet_resnet101vd_ade20k
    ```
  - This deploys an online image segmentation API service; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
  - With the server configured, the following lines of code send a prediction request and obtain the result
    ```python
    import requests
    import json
    import cv2
    import base64
    import numpy as np

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    def base64_to_cv2(b64str):
        data = base64.b64decode(b64str.encode('utf8'))
        data = np.frombuffer(data, np.uint8)
        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
        return data

    # Send an HTTP request
    org_im = cv2.imread('/PATH/TO/IMAGE')
    data = {'images':[cv2_to_base64(org_im)]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    mask = base64_to_cv2(r.json()["results"][0])
    ```
## V. Release Note
* 1.0.0
  First release
# ginet_resnet101vd_ade20k
|Module Name|ginet_resnet101vd_ade20k|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|ADE20K|
|Fine-tuning supported or not|Yes|
|Module Size|287MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145947107-b6f87161-d824-4c21-b01d-594ad03e56de.jpg" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145960027-343246bf-ce8b-456b-85c6-042c1f4477bd.png" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet101vd_ade20k
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_ade20k')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、Fine-tune and Encapsulation
  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_ade20k model on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
      - `segmentation_transforms`: The data augmentation module defines many preprocessing methods for image segmentation data. Users can replace these methods according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
      * Dataset preparation can be referred to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
```
- `name`: model name.
      - `pretrained`: Whether to load a self-trained checkpoint. If None, the provided default parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
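      - Putting the four steps together, a complete `train.py` consistent with the snippets above could look like the following sketch (illustrative; it only re-assembles the code shown in Step1 through Step4):
      - ```python
        import paddle
        import paddlehub as hub
        from paddlehub.finetune.trainer import Trainer
        from paddlehub.datasets import OpticDiscSeg
        from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

        if __name__ == '__main__':
            transform = Compose([Resize(target_size=(512, 512)), Normalize()])
            train_reader = OpticDiscSeg(transform, mode='train')
            model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
            scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
            optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
            trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
            trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
        ```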
- Model prediction
      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
* `visualization`: Whether to save the recognition results as picture files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet101vd_ade20k
```
  - The image segmentation API is now deployed; the default port number is 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        # NOTE: the misspelled attribute name `piontwise_conv` is kept as-is;
        # renaming it would change parameter names and break loading of the
        # released pretrained weights.
        self.piontwise_conv = ConvBNReLU(
            in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
                self.act_func = getattr(activation, act_name)()
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): Whether to use separable conv in the ASPP module. Default: False.
        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
                 use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
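# --- Usage sketch (illustrative, not part of the module) ---
# A minimal shape check for ASPPModule, assuming paddle is installed and this
# file is run directly. The ratios (1, 6, 12, 18) and channel sizes below are
# example values, not fixed by this module.
if __name__ == "__main__":
    aspp = ASPPModule(
        aspp_ratios=(1, 6, 12, 18),
        in_channels=2048,
        out_channels=256,
        align_corners=False,
        image_pooling=True)
    feat = paddle.randn([1, 2048, 32, 32])
    out = aspp(feat)
    # Every branch is resized back to the input's spatial size before fusion,
    # so the output is (1, out_channels, 32, 32).
    print(out.shape)  # [1, 256, 32, 32]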
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet101vd_ade20k.resnet import ResNet101_vd
@moduleinfo(
name="ginet_resnet101vd_ade20k",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet101 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet101(nn.Layer):
"""
The GINetResNet101 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Indices selecting the backbone's output feature maps. Default: (0, 1, 2, 3).
enable_auxiliary_loss (bool, optional): Whether to add an auxiliary loss head on the third backbone stage. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
map is even, e.g. 1024x512; otherwise True, e.g. 769x769. Default: True.
jpu (bool, optional): Whether to use the JPU unit in base_forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 150,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss: bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet101, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet101_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
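# Note: `forward` returns a list of logit maps -- the GIHead prediction plus,
# when `enable_auxiliary_loss` is True, an auxiliary prediction computed from
# the penultimate feature map -- each bilinearly upsampled to the input size.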
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
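# Note: `self.inp` is a (nclass, 300) class-embedding parameter. `forward`
# detaches it, projects it with fc1/fc2 to 256 channels, and broadcasts it to
# (B, 256, nclass), so every image interacts with the same class-node
# descriptors inside GlobalReasonUnit.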
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
class_node, new_V = self.graph(inp, V)
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
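# Shape summary for `forward` (B = batch, HW = H * W): conv_theta projects x
# to num_node assignment maps (B, num_node, HW); conv_phi reduces x to
# (B, num_state, HW). Their product V (B, num_state, num_node), averaged over
# HW, is the visual graph passed to GraphLayer; the reverse projection through
# the same assignment maps scatters the reasoned nodes back onto the feature
# map before the residual connection `Y + x`.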
class GraphLayer(nn.Layer):
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
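# --- Usage sketch (illustrative, not part of the module) ---
# A minimal shape walk-through of GlobalReasonUnit, assuming paddle and
# paddleseg are installed and the package imports at the top of this file
# resolve. The sizes below are example values only.
if __name__ == "__main__":
    gloru = GlobalReasonUnit(in_channels=512, num_state=256, num_node=84, nclass=59)
    feat = paddle.randn([2, 512, 33, 33])   # reduced feature map
    class_emb = paddle.randn([2, 256, 59])  # projected class embeddings
    out, class_node = gloru(feat, class_emb)
    print(out.shape)         # [2, 512, 33, 33] -- same shape as the input map
    print(class_node.shape)  # [2, 256, 59]    -- one vector per class node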
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet101vd_ade20k.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = F.relu(paddle.add(x=short, y=conv1))
return y
class ResNet101_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet101_vd, self).__init__()
depth = [3, 4, 23, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
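# --- Usage sketch (illustrative, not part of the module) ---
# With the dilation_dict above, stages 3 and 4 use dilated convolutions
# instead of striding, so the backbone keeps an output stride of 8. Assuming
# the package imports resolve, a quick check might look like this:
if __name__ == "__main__":
    backbone = ResNet101_vd()
    feats = backbone(paddle.randn([1, 3, 224, 224]))
    print([f.shape for f in feats])
    # expected: [1, 256, 56, 56], [1, 512, 28, 28],
    #           [1, 1024, 28, 28], [1, 2048, 28, 28]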
# ginet_resnet101vd_cityscapes
|Module Name|ginet_resnet101vd_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|286MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145943452-6e3a8cce-b17c-417e-80ad-d47e1dd5e00c.png" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145943518-b608d555-1ddb-4100-b399-b6f777658caf.png" hspace='10'/>
</p>
- ### Module Introduction
- This module demonstrates how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ginet_resnet101vd_cityscapes
```
- In case of any problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='ginet_resnet101vd_cityscapes')
    img = cv2.imread("/PATH/TO/IMAGE")
    result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Start Fine-tuning
- After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ginet_resnet101vd_cityscapes model on the OpticDiscSeg dataset. `train.py` is composed of the following steps, and a combined runnable sketch is given after Step4.
- Code steps
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute the preprocessing they need.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: dataset mode; the options are `train`, `test` and `val`. Default is `train`.
- Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub

model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
```
- `name`: name of the pre-trained model.
- `pretrained`: path of a self-trained checkpoint; if None, the provided default parameters are loaded.
- Step4: Choose the optimization strategy and run configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer

scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
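- Putting the four steps together, a minimal end-to-end `train.py` sketch could look as follows. This is only a sketch with the same settings as above; it assumes paddlehub >= 2.0.0 and network access for the automatic dataset download:
- ```python
import paddle
import paddlehub as hub
from paddlehub.finetune.trainer import Trainer
from paddlehub.datasets import OpticDiscSeg
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

# Step1: preprocessing; Step2: dataset; Step3: model; Step4: training
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
train_reader = OpticDiscSeg(transform, mode='train')
model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```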
- Model prediction
- When fine-tuning finishes, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: image path or BGR-format image;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: path for saving the results, default is 'seg_result'.
**NOTE:** When predicting, the module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet101vd_cityscapes
```
- The image segmentation service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, the CUDA_VISIBLE_DEVICES environment variable must be set before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
First release
# ginet_resnet101vd_cityscapes
|Module Name|ginet_resnet101vd_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|286MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145943452-6e3a8cce-b17c-417e-80ad-d47e1dd5e00c.png" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145954768-026af5a4-30a5-43f3-abe3-c0ea821e895a.png" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ginet_resnet101vd_cityscapes
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After installing PaddlePaddle and PaddleHub, you can fine-tune the ginet_resnet101vd_cityscapes model on datasets such as OpticDiscSeg by writing a `train.py` script along the following steps.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for segmentation data. Users can replace them according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: dataset mode; the options are `train`, `test` and `val`. Default is `train`.
* Dataset preparation can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: path of a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
* `visualization`: Whether to save the recognition results as picture files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet101vd_cityscapes
```
- The image segmentation service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, the CUDA_VISIBLE_DEVICES environment variable must be set before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
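# Example (a minimal sketch, assuming paddle is installed and this file is run
# directly): on a CPU-only device the factory silently degrades to plain batch
# norm, so the layer type depends on the runtime environment.
if __name__ == "__main__":
    bn = SyncBatchNorm(64)
    print(type(bn).__name__)  # BatchNorm2D on CPU, SyncBatchNorm otherwise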
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
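# Note: `is_vd_mode` implements the ResNet-D style downsampling trick -- a
# stride-2 2x2 average pool in front of the 1x1 shortcut convolution -- which
# avoids discarding three quarters of the activations the way a strided 1x1
# convolution would.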
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
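# Note: when `dilation > 1` the 3x3 conv is built with padding=0, so `forward`
# pads the feature map explicitly by `dilation` on each side; a dilated 3x3
# kernel spans 2*dilation + 1 pixels, so this keeps the spatial size unchanged.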
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.pointwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool = False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
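# --- Usage sketch (illustrative, not part of the module) ---
# A dilated bottleneck block keeps the spatial resolution. The sizes below are
# example values only, assuming paddle is installed and this file is run
# directly.
if __name__ == "__main__":
    block = BottleneckBlock(
        in_channels=1024, out_channels=256, stride=1,
        shortcut=False, dilation=2, name="res4a")
    y = block(paddle.randn([1, 1024, 28, 28]))
    print(y.shape)  # [1, 1024, 28, 28]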
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet101vd_cityscapes.resnet import ResNet101_vd
@moduleinfo(
name="ginet_resnet101vd_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet101 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet101(nn.Layer):
"""
The GINetResNet101 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Indices selecting the backbone's output feature maps. Default: (0, 1, 2, 3).
enable_auxiliary_loss (bool, optional): Whether to add an auxiliary loss head on the third backbone stage. Default: True.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
map is even, e.g. 1024x512; otherwise True, e.g. 769x769. Default: True.
jpu (bool, optional): Whether to use the JPU unit in base_forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss: bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet101, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet101_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
def forward(self, x: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
class_node, new_V = self.graph(inp, V)
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
class GraphLayer(nn.Layer):
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
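# --- Usage sketch (illustrative, not part of the module) ---
# GCN mixes information in two passes: conv1 operates across the node axis
# (treating nodes as channels), conv2 across the state axis. Example values,
# assuming the package imports at the top of this file resolve:
if __name__ == "__main__":
    gcn = GCN(num_state=256, num_node=84)
    nodes = paddle.randn([2, 256, 84])  # (batch, state, node)
    print(gcn(nodes).shape)  # [2, 256, 84]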
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet101vd_cityscapes.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = F.relu(paddle.add(x=short, y=conv1))
return y
class ResNet101_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet101_vd, self).__init__()
depth = [3, 4, 23, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
# ginet_resnet101vd_voc
|Module Name|ginet_resnet101vd_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|286MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145925887-bf9e62d3-8c6d-43c2-8062-6cb6ba59ec0e.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145925692-badb21d1-10e7-4a5d-82f5-1177d10a7681.png" width = "420" height = "505" />
</p>
- ### Module Introduction
- This module demonstrates how to use PaddleHub to fine-tune the pre-trained model and complete prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ginet_resnet101vd_voc
```
- In case of any problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='ginet_resnet101vd_voc')
    img = cv2.imread("/PATH/TO/IMAGE")
    result = model.predict(images=[img], visualization=True)
```
- ### 2. How to Start Fine-tuning
- After installing PaddlePaddle and PaddleHub, run `python train.py` to fine-tune the ginet_resnet101vd_voc model on the OpticDiscSeg dataset. `train.py` is composed of the following steps:
- Code steps
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize

transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for image segmentation data; users can substitute the preprocessing they need.
- Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg

train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: dataset mode; the options are `train`, `test` and `val`. Default is `train`.
- Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub

model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
```
- `name`: name of the pre-trained model.
- `pretrained`: path of a self-trained checkpoint; if None, the provided default parameters are loaded.
- Step4: Choose the optimization strategy and run configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer

scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning finishes, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub

if __name__ == '__main__':
    model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
    img = cv2.imread("/PATH/TO/IMAGE")
    model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: image path or BGR-format image;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: path for saving the results, default is 'seg_result'.
**NOTE:** When predicting, the module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet101vd_voc
```
- The image segmentation service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, the CUDA_VISIBLE_DEVICES environment variable must be set before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
First release
# ginet_resnet101vd_voc
|Module Name|ginet_resnet101vd_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet101vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|286MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145925887-bf9e62d3-8c6d-43c2-8062-6cb6ba59ec0e.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145925692-badb21d1-10e7-4a5d-82f5-1177d10a7681.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2. Installation
- ```shell
$ hub install ginet_resnet101vd_voc
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After installing PaddlePaddle and PaddleHub, you can fine-tune the ginet_resnet101vd_voc model on datasets such as OpticDiscSeg by writing a `train.py` script along the following steps.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- The `segmentation_transforms` module provides a rich set of preprocessing methods for segmentation data. Users can replace them according to their needs.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: dataset mode; the options are `train`, `test` and `val`. Default is `train`.
* Dataset preparation can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to `$HOME/.paddlehub/dataset` under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: path of a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
* `visualization`: Whether to save the recognition results as picture files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet101vd_voc
```
- The image segmentation service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, the CUDA_VISIBLE_DEVICES environment variable must be set before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
        not_exist_one = Activation("not_exist_one")
        # KeyError: "not_exist_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): Whether to use separable convolutions in the ASPP module. Default: False.
        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool= False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet101vd_voc.resnet import ResNet101_vd
@moduleinfo(
name="ginet_resnet101vd_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet101 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet101(nn.Layer):
"""
The GINetResNet101 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
        enable_auxiliary_loss (bool, optional): Whether to add an auxiliary loss.
            If true, an auxiliary head is applied to the third-stage feature (c3). Default: True.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
            is even, e.g. 1024x512; otherwise it should be True, e.g. 769x769. Default: True.
        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss: bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet101, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet101_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
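        # Learnable class-node embeddings: one 300-d vector per class,
        # refined by fc1/fc2 in forward before interacting with the visual graph.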
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
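        # Project the feature map into node space (conv_theta) and a reduced
        # state space (conv_phi), aggregate per-node features V, reason jointly
        # with the class nodes, then project the updated nodes back onto the
        # spatial map (via B) and fuse with a residual connection.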
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
class_node, new_V = self.graph(inp, V)
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
class GraphLayer(nn.Layer):
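    """Run parallel GCNs on the class (word) graph and the visual graph, then
    exchange information between them via GraphTransfer, gated by the learned
    gamma_word and gamma_vis parameters."""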
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
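    """A light graph convolution on a (B, num_state, num_node) tensor: conv1
    mixes information across nodes (with a residual connection), conv2 mixes
    across state channels."""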
def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
        self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet101vd_voc.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
        return y
class ResNet101_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet101_vd, self).__init__()
depth = [3, 4, 23, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
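        # Replace strides with dilations in stages 3 and 4 (rates 2 and 4) so
        # the backbone keeps an output stride of 8 for dense prediction.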
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
# ginet_resnet50vd_ade20k
|Module Name|ginet_resnet50vd_ade20k|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|ADE20K|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145947107-b6f87161-d824-4c21-b01d-594ad03e56de.jpg" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145947270-1b8e0671-c5d3-4b61-b99e-0af27ccd9096.png" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
    - paddlepaddle >= 2.0.0
    - paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_ade20k
```
    - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_ade20k')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、How to Fine-tune
    - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet50vd_ade20k model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
    - Steps:
        - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
            - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own as needed.
        - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
            - `transforms`: data preprocessing methods.
            - `mode`: Select the data mode; the options are `train`, `test`, and `val`. Default is `train`.
            - For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
        - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
```
            - `name`: the name of the pre-trained model.
            - `pretrained`: whether to load a self-trained model; if None, the provided default parameters are loaded.
        - Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
    - Model prediction
    - When fine-tuning is completed, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
    - After the parameters are configured correctly, run the script `python predict.py`.
    - **Args**
        * `images`: paths of the original images, or images in BGR format;
        * `visualization`: whether to save the visualized results. Default is True;
        * `save_path`: path for saving the results. Default is 'seg_result'.
    **NOTE:** For prediction, the selected module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_ade20k
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# ginet_resnet50vd_ade20k
|Module Name|ginet_resnet50vd_ade20k|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|ADE20K|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145947107-b6f87161-d824-4c21-b01d-594ad03e56de.jpg" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145947270-1b8e0671-c5d3-4b61-b99e-0af27ccd9096.png" hspace='10'/>
</p>
- ### Module Introduction
    - We will show how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_ade20k
```
    - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_ade20k')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、Fine-tune and Encapsulation
    - After completing the installation of PaddlePaddle and PaddleHub, you can start using the ginet_resnet50vd_ade20k model to fine-tune on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
            - The `segmentation_transforms` data augmentation module defines a rich set of preprocessing methods for segmentation data; users can substitute their own as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
            * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
```
- `name`: model name.
            - `pretrained`: Whether to load a self-trained model; if it is None, the provided pre-trained parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
    - When fine-tuning is completed, the model with the best performance on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: Image path or ndarray data with format [H, W, C], BGR.
        * `visualization`: Whether to save the segmentation results as image files.
* `save_path`: Save path of the result, default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_ade20k
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
self.piontwise_conv = ConvBNReLU(
in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
x = self.piontwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
        not_exist_one = Activation("not_exist_one")
        # KeyError: "not_exist_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
        use_sep_conv (bool, optional): Whether to use separable convolutions in the ASPP module. Default: False.
        image_pooling (bool, optional): Whether to augment with image-level features. Default: False.
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool= False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet50vd_ade20k.resnet import ResNet50_vd
@moduleinfo(
name="ginet_resnet50vd_ade20k",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet50(nn.Layer):
"""
The GINetResNet50 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
        enable_auxiliary_loss (bool, optional): Whether to add an auxiliary loss.
            If true, an auxiliary head is applied to the third-stage feature (c3). Default: True.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
            is even, e.g. 1024x512; otherwise it should be True, e.g. 769x769. Default: True.
        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 150,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss: bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet50, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
class_node, new_V = self.graph(inp, V)
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
class GraphLayer(nn.Layer):
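    """Run parallel GCNs on the class (word) graph and the visual graph, then
    exchange information between them via GraphTransfer, gated by the learned
    gamma_word and gamma_vis parameters."""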
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
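    """A light graph convolution on a (B, num_state, num_node) tensor: conv1
    mixes information across nodes (with a residual connection), conv2 mixes
    across state channels."""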
def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
        self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet50vd_ade20k.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
        return y
class ResNet50_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet50_vd, self).__init__()
depth = [3, 4, 6, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
# ginet_resnet50vd_cityscapes
|Module Name|ginet_resnet50vd_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145943452-6e3a8cce-b17c-417e-80ad-d47e1dd5e00c.png" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145943518-b608d555-1ddb-4100-b399-b6f777658caf.png" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
    - paddlepaddle >= 2.0.0
    - paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_cityscapes
```
    - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、How to Fine-tune
    - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet50vd_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
    - Steps:
        - Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
            - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can substitute their own as needed.
        - Step2: Download and use the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
            - `transforms`: data preprocessing methods.
            - `mode`: Select the data mode; the options are `train`, `test`, and `val`. Default is `train`.
            - For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
        - Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
```
            - `name`: the name of the pre-trained model.
            - `pretrained`: whether to load a self-trained model; if None, the provided default parameters are loaded.
        - Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
    - Model prediction
    - When fine-tuning is completed, the model that performs best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
    - After the parameters are configured correctly, run the script `python predict.py`.
    - **Args**
        * `images`: paths of the original images, or images in BGR format;
        * `visualization`: whether to save the visualized results. Default is True;
        * `save_path`: path for saving the results. Default is 'seg_result'.
    **NOTE:** For prediction, the selected module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
## IV. Server Deployment
- PaddleHub Serving can deploy an online image segmentation service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_cityscapes
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# ginet_resnet50vd_cityscapes
|Module Name|ginet_resnet50vd_cityscapes|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|Cityscapes|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145943452-6e3a8cce-b17c-417e-80ad-d47e1dd5e00c.png" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145943518-b608d555-1ddb-4100-b399-b6f777658caf.png" hspace='10'/>
</p>
- ### Module Introduction
    - We will show how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_cityscapes
```
    - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_cityscapes')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2、Fine-tune and Encapsulation
    - After completing the installation of PaddlePaddle and PaddleHub, you can start using the ginet_resnet50vd_cityscapes model to fine-tune on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
            - The `segmentation_transforms` data augmentation module defines a rich set of preprocessing methods for segmentation data; users can substitute their own as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory; a minimal sketch for loading the validation split follows.
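* A small sketch for loading the validation split, using the documented `mode` options:
* ```python
from paddlehub.datasets import OpticDiscSeg
# Same transform as above; only the split changes
val_reader = OpticDiscSeg(transform, mode='val')
```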
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: whether to load a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
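- Here `PolynomialDecay` anneals the learning rate from 0.01 toward `end_lr=0.0001` with `power=0.9` over `decay_steps=1000` steps, and Adam consumes the scheduled rate; these are example values rather than tuned settings.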
- Model prediction
- When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: image paths or ndarray data in [H, W, C] format, BGR.
* `visualization`: whether to save the segmentation results as image files.
* `save_path`: save path of the results; default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_cityscapes
```
- The service API is now deployed and the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        self.pointwise_conv = ConvBNReLU(
            in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
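        # Map lowercase activation names (e.g. 'relu') to the class names exported by paddle.nn.layer.activation (e.g. 'ReLU')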
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool= False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet50vd_cityscapes.resnet import ResNet50_vd
@moduleinfo(
name="ginet_resnet50vd_cityscapes",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet50(nn.Layer):
"""
The GINetResNet50 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss
            head on an intermediate backbone feature. Default: True.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 19,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss: bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet50, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
    def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
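        # Learnable class-node embeddings (nclass x 300); fc1/fc2 refine them before the graph interaction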
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
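        # Soft-assign pixels to num_node graph nodes; after the reshape below, B is (batch, num_node, H*W)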
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
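        # Reduce channels to num_state and aggregate pixel features into per-node states V of shape (batch, num_state, num_node)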
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
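        # Run graph interaction between the class nodes (inp) and the visual nodes (V)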
class_node, new_V = self.graph(inp, V)
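        # Reproject the updated node states back onto the pixel grid and fuse with the input as a residual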
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
class GraphLayer(nn.Layer):
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
        self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet50vd_cityscapes.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
return y
class ResNet50_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet50_vd, self).__init__()
depth = [3, 4, 6, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
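        # Stages at block indices 2 and 3 replace further striding with dilations 2 and 4, preserving spatial resolution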
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
# ginet_resnet50vd_voc
|Module Name|ginet_resnet50vd_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145925887-bf9e62d3-8c6d-43c2-8062-6cb6ba59ec0e.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145925692-badb21d1-10e7-4a5d-82f5-1177d10a7681.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This example shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_voc
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. How to start Fine-tune
- After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet50vd_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- `segmentation_transforms`: this data augmentation module defines many preprocessing methods for image segmentation data; users can replace them as needed.
- Step2: Download the dataset and use it
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
- `transforms`: data preprocessing methods.
- `mode`: select the data mode; the options are `train`, `test` and `val`. Default is `train`.
- Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
```
- `name`: the name of the pre-trained model.
- `pretrained`: whether to load a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
- Step4: Choose the optimization strategy and runtime configuration
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When Fine-tune is completed, the model with the best performance on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for Fine-tune. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- After the parameters are configured correctly, run the script with `python predict.py`.
- **Args**
* `images`: image paths or BGR-format image data;
* `visualization`: whether to visualize the results, default is True;
* `save_path`: save path of the results, default is 'seg_result'.
**NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for Fine-tune.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_voc
```
- This deploys an online API for image segmentation; the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send an HTTP request
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
* 1.0.0
  First release
# ginet_resnet50vd_voc
|Module Name|ginet_resnet50vd_voc|
| :--- | :---: |
|Category|Image Segmentation|
|Network|ginet_resnet50vd|
|Dataset|PascalVOC2012|
|Fine-tuning supported or not|Yes|
|Module Size|214MB|
|Data indicators|-|
|Latest update date|2021-12-14|
## I. Basic Information
- ### Application Effect Display
- Sample results:
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145925887-bf9e62d3-8c6d-43c2-8062-6cb6ba59ec0e.jpg" width = "420" height = "505" hspace='10'/> <img src="https://user-images.githubusercontent.com/35907364/145925692-badb21d1-10e7-4a5d-82f5-1177d10a7681.png" width = "420" height = "505" hspace='10'/>
</p>
- ### Module Introduction
- This section shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
- For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0
- ### 2、Installation
- ```shell
$ hub install ginet_resnet50vd_voc
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Prediction Code Example
- ```python
import cv2
import paddle
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_voc')
img = cv2.imread("/PATH/TO/IMAGE")
result = model.predict(images=[img], visualization=True)
```
- ### 2. Fine-tune and Encapsulation
- After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet50vd_voc model on datasets such as OpticDiscSeg.
- Steps:
- Step1: Define the data preprocessing method
- ```python
from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
transform = Compose([Resize(target_size=(512, 512)), Normalize()])
```
- `segmentation_transforms`: this data augmentation module defines many preprocessing methods for image segmentation data; users can replace them as needed.
- Step2: Download the dataset
- ```python
from paddlehub.datasets import OpticDiscSeg
train_reader = OpticDiscSeg(transform, mode='train')
```
* `transforms`: data preprocessing methods.
* `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
* Dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
- Step3: Load the pre-trained model
- ```python
import paddlehub as hub
model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
```
- `name`: model name.
- `pretrained`: whether to load a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
- Step4: Optimization strategy
- ```python
import paddle
from paddlehub.finetune.trainer import Trainer
scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
```
- Model prediction
- When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
```python
import paddle
import cv2
import paddlehub as hub
if __name__ == '__main__':
model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
img = cv2.imread("/PATH/TO/IMAGE")
model.predict(images=[img], visualization=True)
```
- **Args**
* `images`: image paths or ndarray data in [H, W, C] format, BGR.
* `visualization`: whether to save the segmentation results as image files.
* `save_path`: save path of the results; default is 'seg_result'.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of image segmentation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m ginet_resnet50vd_voc
```
- The service API is now deployed and the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
org_im = cv2.imread('/PATH/TO/IMAGE')
data = {'images':[cv2_to_base64(org_im)]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
mask = base64_to_cv2(r.json()["results"][0])
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.layer import activation
from paddle.nn import Conv2D, AvgPool2D
def SyncBatchNorm(*args, **kwargs):
"""In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
if paddle.get_device() == 'cpu':
return nn.BatchNorm2D(*args, **kwargs)
else:
return nn.SyncBatchNorm(*args, **kwargs)
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(
self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
dilation: int = 1,
groups: int = 1,
is_vd_mode: bool = False,
act: str = None,
name: str = None):
super(ConvBNLayer, self).__init__()
self.is_vd_mode = is_vd_mode
self._pool2d_avg = AvgPool2D(
kernel_size=2, stride=2, padding=0, ceil_mode=True)
self._conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
bias_attr=False)
self._batch_norm = SyncBatchNorm(out_channels)
self._act_op = Activation(act=act)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
if self.is_vd_mode:
inputs = self._pool2d_avg(inputs)
y = self._conv(inputs)
y = self._batch_norm(y)
y = self._act_op(y)
return y
class BottleneckBlock(nn.Layer):
"""Residual bottleneck block"""
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
dilation: int = 1,
name: str = None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
act='relu',
name=name + "_branch2a")
self.dilation = dilation
self.conv1 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
dilation=dilation,
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels * 4,
kernel_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels * 4,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first or stride == 1 else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
if self.dilation > 1:
padding = self.dilation
y = F.pad(y, [padding, padding, padding, padding])
conv1 = self.conv1(y)
conv2 = self.conv2(conv1)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
y = paddle.add(x=short, y=conv2)
y = F.relu(y)
return y
class SeparableConvBNReLU(nn.Layer):
"""Depthwise Separable Convolution."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(SeparableConvBNReLU, self).__init__()
self.depthwise_conv = ConvBN(
in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
padding=padding,
groups=in_channels,
**kwargs)
        self.pointwise_conv = ConvBNReLU(
            in_channels, out_channels, kernel_size=1, groups=1)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
return x
class ConvBN(nn.Layer):
"""Basic conv bn layer"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBN, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
return x
class ConvBNReLU(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
padding: str = 'same',
**kwargs: dict):
super(ConvBNReLU, self).__init__()
self._conv = Conv2D(
in_channels, out_channels, kernel_size, padding=padding, **kwargs)
self._batch_norm = SyncBatchNorm(out_channels)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self._conv(x)
x = self._batch_norm(x)
x = F.relu(x)
return x
class Activation(nn.Layer):
"""
The wrapper of activations.
Args:
act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
'hsigmoid']. Default: None, means identical transformation.
Returns:
A callable object of Activation.
Raises:
KeyError: When parameter `act` is not in the optional range.
Examples:
from paddleseg.models.common.activation import Activation
relu = Activation("relu")
print(relu)
# <class 'paddle.nn.layer.activation.ReLU'>
sigmoid = Activation("sigmoid")
print(sigmoid)
# <class 'paddle.nn.layer.activation.Sigmoid'>
not_exit_one = Activation("not_exit_one")
# KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
# 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
# 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
"""
def __init__(self, act: str = None):
super(Activation, self).__init__()
self._act = act
upper_act_names = activation.__dict__.keys()
lower_act_names = [act.lower() for act in upper_act_names]
act_dict = dict(zip(lower_act_names, upper_act_names))
if act is not None:
if act in act_dict.keys():
act_name = act_dict[act]
self.act_func = eval("activation.{}()".format(act_name))
else:
raise KeyError("{} does not exist in the current {}".format(
act, act_dict.keys()))
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
if self._act is not None:
return self.act_func(x)
else:
return x
class ASPPModule(nn.Layer):
"""
Atrous Spatial Pyramid Pooling.
Args:
        aspp_ratios (tuple): The dilation rates used in the ASPP module.
in_channels (int): The number of input channels.
out_channels (int): The number of output channels.
align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
image_pooling (bool, optional): If augmented with image-level features. Default: False
"""
def __init__(self,
aspp_ratios: tuple,
in_channels: int,
out_channels: int,
align_corners: bool,
use_sep_conv: bool= False,
image_pooling: bool = False):
super().__init__()
self.align_corners = align_corners
self.aspp_blocks = nn.LayerList()
for ratio in aspp_ratios:
if use_sep_conv and ratio > 1:
conv_func = SeparableConvBNReLU
else:
conv_func = ConvBNReLU
block = conv_func(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1 if ratio == 1 else 3,
dilation=ratio,
padding=0 if ratio == 1 else ratio)
self.aspp_blocks.append(block)
out_size = len(self.aspp_blocks)
if image_pooling:
self.global_avg_pool = nn.Sequential(
nn.AdaptiveAvgPool2D(output_size=(1, 1)),
ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
out_size += 1
self.image_pooling = image_pooling
self.conv_bn_relu = ConvBNReLU(
in_channels=out_channels * out_size,
out_channels=out_channels,
kernel_size=1)
self.dropout = nn.Dropout(p=0.1) # drop rate
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
outputs = []
for block in self.aspp_blocks:
y = block(x)
y = F.interpolate(
y,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(y)
if self.image_pooling:
img_avg = self.global_avg_pool(x)
img_avg = F.interpolate(
img_avg,
x.shape[2:],
mode='bilinear',
align_corners=self.align_corners)
outputs.append(img_avg)
x = paddle.concat(outputs, axis=1)
x = self.conv_bn_relu(x)
x = self.dropout(x)
return x
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import Union, List, Tuple
import paddle
from paddle import nn
import paddle.nn.functional as F
import numpy as np
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.cv_module import ImageSegmentationModule
from paddleseg.utils import utils
from paddleseg.models import layers
from ginet_resnet50vd_voc.resnet import ResNet50_vd
@moduleinfo(
name="ginet_resnet50vd_voc",
type="CV/semantic_segmentation",
author="paddlepaddle",
author_email="",
summary="GINetResnet50 is a segmentation model.",
version="1.0.0",
meta=ImageSegmentationModule)
class GINetResNet50(nn.Layer):
"""
The GINetResNet50 implementation based on PaddlePaddle.
The original article refers to
Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
(https://arxiv.org/pdf/2009.06160).
Args:
num_classes (int): The unique number of target classes.
backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss
            head on an intermediate backbone feature. Default: True.
        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
pretrained (str, optional): The path or url of pretrained model. Default: None.
"""
def __init__(self,
num_classes: int = 21,
backbone_indices: Tuple[int]=(0, 1, 2, 3),
enable_auxiliary_loss:bool = True,
align_corners: bool = True,
jpu: bool = True,
pretrained: str = None):
super(GINetResNet50, self).__init__()
self.nclass = num_classes
self.aux = enable_auxiliary_loss
self.jpu = jpu
self.backbone = ResNet50_vd()
self.backbone_indices = backbone_indices
self.align_corners = align_corners
self.transforms = T.Compose([T.Normalize()])
self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
self.head = GIHead(in_channels=2048, nclass=num_classes)
if self.aux:
self.auxlayer = layers.AuxLayer(
1024, 1024 // 4, num_classes, bias_attr=False)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
return self.transforms(img)
def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
feat_list = self.backbone(x)
c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
if self.jpu:
return self.jpu(c1, c2, c3, c4)
else:
return c1, c2, c3, c4
    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
_, _, h, w = x.shape
_, _, c3, c4 = self.base_forward(x)
logit_list = []
x, _ = self.head(c4)
logit_list.append(x)
if self.aux:
auxout = self.auxlayer(c3)
logit_list.append(auxout)
return [
F.interpolate(
logit, (h, w),
mode='bilinear',
align_corners=self.align_corners) for logit in logit_list
]
class GIHead(nn.Layer):
"""The Graph Interaction Network head."""
def __init__(self, in_channels: int, nclass: int):
super().__init__()
self.nclass = nclass
inter_channels = in_channels // 4
self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
self.inp = paddle.create_parameter(
shape=self.inp.shape,
dtype=str(self.inp.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.inp))
self.fc1 = nn.Sequential(
nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
self.fc2 = nn.Sequential(
nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
self.conv5 = layers.ConvBNReLU(
in_channels,
inter_channels,
3,
padding=1,
bias_attr=False,
stride=1)
self.gloru = GlobalReasonUnit(
in_channels=inter_channels,
num_state=256,
num_node=84,
nclass=nclass)
self.conv6 = nn.Sequential(
nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
B, C, H, W = x.shape
inp = self.inp.detach()
inp = self.fc1(inp)
inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
.expand((B, 256, self.nclass))
out = self.conv5(x)
out, se_out = self.gloru(out, inp)
out = self.conv6(out)
return out, se_out
class GlobalReasonUnit(nn.Layer):
"""
The original paper refers to:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
"""
def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
super().__init__()
self.num_state = num_state
self.conv_theta = nn.Conv2D(
in_channels, num_node, kernel_size=1, stride=1, padding=0)
self.conv_phi = nn.Conv2D(
in_channels, num_state, kernel_size=1, stride=1, padding=0)
self.graph = GraphLayer(num_state, num_node, nclass)
self.extend_dim = nn.Conv2D(
num_state, in_channels, kernel_size=1, bias_attr=False)
self.bn = layers.SyncBatchNorm(in_channels)
def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
B = self.conv_theta(x)
sizeB = B.shape
B = B.reshape((sizeB[0], sizeB[1], -1))
sizex = x.shape
x_reduce = self.conv_phi(x)
x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
.transpose((0, 2, 1))
V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
V = paddle.divide(
V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
class_node, new_V = self.graph(inp, V)
D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
sizex[2], -1))
Y = self.extend_dim(Y)
Y = self.bn(Y)
out = Y + x
return out, class_node
class GraphLayer(nn.Layer):
def __init__(self, num_state: int, num_node: int, num_class: int):
super().__init__()
self.vis_gcn = GCN(num_state, num_node)
self.word_gcn = GCN(num_state, num_class)
self.transfer = GraphTransfer(num_state)
self.gamma_vis = paddle.zeros([num_node])
self.gamma_word = paddle.zeros([num_class])
self.gamma_vis = paddle.create_parameter(
shape=self.gamma_vis.shape,
dtype=str(self.gamma_vis.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
self.gamma_word = paddle.create_parameter(
shape=self.gamma_word.shape,
dtype=str(self.gamma_word.numpy().dtype),
default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
inp = self.word_gcn(inp)
new_V = self.vis_gcn(vis_node)
class_node, vis_node = self.transfer(inp, new_V)
class_node = self.gamma_word * inp + class_node
new_V = self.gamma_vis * vis_node + new_V
return class_node, new_V
class GCN(nn.Layer):
def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
super().__init__()
self.conv1 = nn.Conv1D(
num_node,
num_node,
kernel_size=1,
padding=0,
stride=1,
groups=1,
)
self.relu = nn.ReLU()
self.conv2 = nn.Conv1D(
num_state,
num_state,
kernel_size=1,
padding=0,
stride=1,
groups=1,
bias_attr=bias)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
h = h + x
h = self.relu(h)
h = self.conv2(h)
return h
class GraphTransfer(nn.Layer):
"""Transfer vis graph to class node, transfer class node to vis feature"""
def __init__(self, in_dim: int):
super().__init__()
        self.channel_in = in_dim
self.query_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.key_conv = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
self.value_conv_vis = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.value_conv_word = nn.Conv1D(
in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.softmax_vis = nn.Softmax(axis=-1)
self.softmax_word = nn.Softmax(axis=-2)
def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
m_batchsize, C, Nc = word.shape
m_batchsize, C, Nn = vis_node.shape
proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
.transpose((0, 2, 1))
proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
energy = paddle.bmm(proj_query, proj_key)
attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
attention_word = self.softmax_word(energy)
proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
Nn))
proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
Nc))
class_out = paddle.bmm(proj_value_vis, attention_vis)
node_out = paddle.bmm(proj_value_word, attention_word)
return class_out, node_out
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import ginet_resnet50vd_voc.layers as L
class BasicBlock(nn.Layer):
def __init__(self,
in_channels: int,
out_channels: int,
stride: int,
shortcut: bool = True,
if_first: bool = False,
name: str = None):
super(BasicBlock, self).__init__()
self.stride = stride
self.conv0 = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=stride,
act='relu',
name=name + "_branch2a")
self.conv1 = L.ConvBNLayer(
in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
act=None,
name=name + "_branch2b")
if not shortcut:
self.short = L.ConvBNLayer(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=1,
is_vd_mode=False if if_first else True,
name=name + "_branch1")
self.shortcut = shortcut
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv0(inputs)
conv1 = self.conv1(y)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
        y = paddle.add(x=short, y=conv1)
        y = F.relu(y)
return y
class ResNet50_vd(nn.Layer):
def __init__(self,
multi_grid: tuple = (1, 2, 4)):
super(ResNet50_vd, self).__init__()
depth = [3, 4, 6, 3]
num_channels = [64, 256, 512, 1024]
num_filters = [64, 128, 256, 512]
self.feat_channels = [c * 4 for c in num_filters]
dilation_dict = {2: 2, 3: 4}
self.conv1_1 = L.ConvBNLayer(
in_channels=3,
out_channels=32,
kernel_size=3,
stride=2,
act='relu',
name="conv1_1")
self.conv1_2 = L.ConvBNLayer(
in_channels=32,
out_channels=32,
kernel_size=3,
stride=1,
act='relu',
name="conv1_2")
self.conv1_3 = L.ConvBNLayer(
in_channels=32,
out_channels=64,
kernel_size=3,
stride=1,
act='relu',
name="conv1_3")
self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.stage_list = []
for block in range(len(depth)):
shortcut = False
block_list = []
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = dilation_dict[
block] if dilation_dict and block in dilation_dict else 1
if block == 3:
dilation_rate = dilation_rate * multi_grid[i]
bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
L.BottleneckBlock(
in_channels=num_channels[block]
if i == 0 else num_filters[block] * 4,
out_channels=num_filters[block],
stride=2 if i == 0 and block != 0
and dilation_rate == 1 else 1,
shortcut=shortcut,
if_first=block == i == 0,
name=conv_name,
dilation=dilation_rate))
block_list.append(bottleneck_block)
shortcut = True
self.stage_list.append(block_list)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self.conv1_1(inputs)
y = self.conv1_2(y)
y = self.conv1_3(y)
y = self.pool2d_max(y)
feat_list = []
for stage in self.stage_list:
for block in stage:
y = block(y)
feat_list.append(y)
return feat_list
# fasttext_crawl_target_word-word_dim300_en
|Module Name|fasttext_crawl_target_word-word_dim300_en|
| :--- | :---: |
|Category|Word Embedding|
|Network|fasttext|
|Dataset|crawl|
|Fine-tuning supported|No|
|Module Size|1.19GB|
|Vocab Size|2,000,002|
|Last update date|26 Feb, 2021|
|Data Indicators|-|
## I. Basic Information
- ### Module Introduction
- PaddleHub provides several open-source pretrained word embedding models. These embedding models are distinguished by corpus, training method and word embedding dimension. For more information, please refer to: [Summary of embedding models](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/docs/embeddings.md)
## II. Installation
- ### 1. Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [PaddleHub Installation Guide](../../../../docs/docs_ch/get_start/installation_en.rst)
- ### 2. Installation
- ```shell
$ hub install fasttext_crawl_target_word-word_dim300_en
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart_en.md) | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart_en.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart_en.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
# Get the embedding of a word
embedding.search("apple")
# Calculate the cosine similarity of two word vectors
embedding.cosine_sim("apple", "orange")
# Calculate the inner product of two word vectors
embedding.dot("apple", "orange")
```
- ### 2. API
- ```python
def __init__(
*args,
**kwargs
)
```
- Constructs the embedding module object; by default no parameters are required.
- **Parameters**
- `*args`: Arguments specified by the user.
- `**kwargs`: Keyword arguments specified by the user.
- For more information, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
- ```python
def search(
words: Union[List[str], str, int],
)
```
- Returns the embedding of one or more words. The input can be a `str`, a `List[str]`, or an `int`, representing a single word, multiple words, or a word id respectively. Word ids are tied to the model vocab, which can be obtained through the `vocab` attribute.
- **Parameters**
- `words`: the input word(s) or word id.
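- A minimal sketch of the three input forms (the words and the word id `0` below are illustrative, not guaranteed vocab entries):
- ```python
import paddlehub as hub
embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
vec = embedding.search("apple")               # one word -> one 300-dim vector
vecs = embedding.search(["apple", "orange"])  # several words -> one vector each
vec0 = embedding.search(0)                    # word id -> that row of the embedding table
```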
- ```python
def cosine_sim(
word_a: str,
word_b: str,
)
```
- Calculates the cosine similarity of two word vectors. `word_a` and `word_b` should be in the vocab, or they will be replaced by `unknown_token`.
- **Parameters**
- `word_a`: input word a.
- `word_b`: input word b.
- ```python
def dot(
word_a: str,
word_b: str,
)
```
- Calculates the inner product of two word vectors. `word_a` and `word_b` should be in the vocab, or they will be replaced by `unknown_token`.
- **Parameters**
- `word_a`: input word a.
- `word_b`: input word b.
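- As a quick sanity check (a sketch assuming `search` returns the same raw vectors used by `dot` and `cosine_sim`), the cosine similarity is the inner product after L2 normalization:
- ```python
import numpy as np
import paddlehub as hub
embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
a = np.asarray(embedding.search("apple")).ravel()
b = np.asarray(embedding.search("orange")).ravel()
# cosine_sim(a, b) == dot(a, b) / (||a|| * ||b||)
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(manual, embedding.cosine_sim("apple", "orange"))
```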
- ```python
def get_vocab_path()
```
- Get the path of the local vocab file.
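- For example (a sketch; the exact path depends on the local PaddleHub cache):
- ```python
import paddlehub as hub
embedding = hub.Module(name='fasttext_crawl_target_word-word_dim300_en')
print(embedding.get_vocab_path())  # local path of the vocab file
```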
- ```python
def get_tokenizer(*args, **kwargs)
```
- Gets the tokenizer of the current model. It returns an instance of `JiebaTokenizer` and currently supports Chinese embedding models only.
- **Parameters**
- `*args`: Arguments specified by the user.
- `**kwargs`: Keyword arguments specified by the user.
- For more information about the arguments, please refer to [paddlenlp.data.tokenizer.JiebaTokenizer](https://github.com/PaddlePaddle/models/blob/release/2.0-beta/PaddleNLP/paddlenlp/data/tokenizer.py)
- For more information about the usage, please refer to [paddlenlp.embeddings](https://github.com/PaddlePaddle/models/tree/release/2.0-beta/PaddleNLP/paddlenlp/embeddings)
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of cosine similarity calculation.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m fasttext_crawl_target_word-word_dim300_en
```
- The online service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the `CUDA_VISIBLE_DEVICES` environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With the server configured, the following lines of code send a prediction request and fetch the result
- ```python
import requests
import json
# Specify the word pairs used to calculate the cosine similarity [[word_a, word_b], [word_a, word_b], ...]
word_pairs = [["apple", "orange"], ["today", "tomorrow"]]
data = {"data": word_pairs}
# Send an HTTP request
url = "http://127.0.0.1:8866/predict/fasttext_crawl_target_word-word_dim300_en"
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## V. Release Note
* 1.0.0
First release
* 1.0.1
Model optimization
- ```shell
$ hub install fasttext_crawl_target_word-word_dim300_en==1.0.1
```
# albert-base-v1
|Module Name|albert-base-v1|
| :--- | :---: |
|Category|Text - Semantic Model|
|Network|albert-base-v1|
|Dataset|-|
|Fine-tuning supported|Yes|
|Module Size|90MB|
|Last update date|2022-02-08|
|Data Indicators|-|
## I. Basic Information
- ### Module Introduction
- To address the excessive parameter counts of current pretrained models, ALBERT proposes the following improvements:
- Factorized embedding parameterization. ALBERT factorizes the word embedding parameters, first mapping words into a low-dimensional embedding space E and then projecting them into the high-dimensional hidden space H.
- Cross-layer parameter sharing. ALBERT shares all parameters across layers.
For more details, please refer to the [ALBERT paper](https://arxiv.org/abs/1909.11942)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [PaddleHub Installation Guide](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install albert-base-v1
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
```python
import paddlehub as hub
data = [
['这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般'],
['怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片'],
['作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。'],
]
label_map = {0: 'negative', 1: 'positive'}
model = hub.Module(
name='albert-base-v1',
version='1.0.0',
task='seq-cls',
load_checkpoint='/path/to/parameters',
label_map=label_map)
results = model.predict(data, max_seq_len=50, batch_size=1, use_gpu=False)
for idx, text in enumerate(data):
print('Data: {} \t Label: {}'.format(text, results[idx]))
```
For details, please refer to the PaddleHub demos:
- [Text Classification](../../../../demo/text_classification)
- [Sequence Labeling](../../../../demo/sequence_labeling)
- ### 2. API
- ```python
def __init__(
task=None,
load_checkpoint=None,
label_map=None,
num_classes=2,
suffix=False,
**kwargs,
)
```
- Creates the Module object (dygraph version).
- **Parameters**
- `task`: task name; can be `seq-cls` (text classification) or `token-cls` (sequence labeling).
- `load_checkpoint`: path to the model parameters saved by the PaddleHub Fine-tune API.
- `label_map`: label map used at prediction time.
- `num_classes`: number of classes of the classification task; may be omitted when `label_map` is given; defaults to 2 classes.
- `suffix`: label format of the sequence labeling task. If set to `True`, labels end with '-B', '-I', '-E' or '-S'; defaults to `False`. A construction sketch for `token-cls` follows this list.
- `**kwargs`: extra keyword arguments specified by the user.
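- A minimal sketch of constructing the module for sequence labeling (the BIO-style `label_map` below is an illustrative placeholder, not a shipped default):
- ```python
import paddlehub as hub
# Hypothetical label map for a small NER task; adapt it to your own labels.
label_map = {0: 'B-ORG', 1: 'I-ORG', 2: 'O'}
model = hub.Module(
    name='albert-base-v1',
    task='token-cls',
    label_map=label_map,  # num_classes is inferred from label_map
    suffix=False)         # prefix-style labels ('B-', 'I-'); set True for '-B'/'-I' suffixes
```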
- ```python
def predict(
data,
max_seq_len=128,
batch_size=1,
use_gpu=False
)
```
- **Parameters**
- `data`: data to be predicted, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b. The number of texts per sample (1 or 2) must be consistent with training.
- `max_seq_len`: the maximum text length the model processes.
- `batch_size`: the batch size of the model.
- `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
- **Return**
- `results`: a list whose content depends on the task type:
- Text classification: a list containing the predicted label of each sentence, in the format \[label\_1, label\_2, …,\]
- Sequence labeling: a list containing the predicted label of each token of each sentence, in the format \[\[token\_1, token\_2, …,\], \[token\_1, token\_2, …,\], …,\]
- ```python
def get_embedding(
data,
use_gpu=False
)
```
- Gets the sentence-level and token-level features of the input texts; a usage sketch follows.
- **Parameters**
- `data`: list of input texts, in the format \[\[sample\_a\_text\_a, sample\_a\_text\_b\], \[sample\_b\_text\_a, sample\_b\_text\_b\],…,\], where each element is one sample and each sample may contain text\_a and text\_b.
- `use_gpu`: whether to use the GPU; defaults to False. GPU users are advised to enable it.
- **Return**
- `results`: a list in the format \[\[sample\_a\_pooled\_feature, sample\_a\_seq\_feature\], \[sample\_b\_pooled\_feature, sample\_b\_seq\_feature\],…,\], where each element is the feature output of the corresponding sample; each sample has a sentence-level pooled\_feature and a token-level seq\_feature.
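- A minimal local sketch (assuming default construction with `task=None`, which makes the module return raw features; the sample text is an arbitrary placeholder):
- ```python
import paddlehub as hub
model = hub.Module(name='albert-base-v1')
data = [["This is a sample sentence."]]
results = model.get_embedding(data, use_gpu=False)
pooled_feature, seq_feature = results[0]  # sentence-level and token-level features
```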
## IV. Server Deployment
- PaddleHub Serving can deploy an online service for fetching pretrained embeddings.
- ### Step 1: Start PaddleHub Serving
- ```shell
$ hub serving start -m albert-base-v1
```
- The online embedding service API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With the server configured, the following lines of code send a prediction request and fetch the result
- ```python
import requests
import json
# Specify the texts used to fetch embeddings [[text_1], [text_2], ... ]
text = [["Today is a nice day"], ["The weather forecast says it will rain today"]]
# Pass the texts to the prediction method by key; here the key is "data",
# which corresponds to a local call of module.get_embedding(data=text)
data = {"data": text}
# Send the POST request; the content type should be json, and the IP address
# in the url should be changed to that of the target machine
url = "http://127.0.0.1:8866/predict/albert-base-v1"
# Set the headers of the POST request to application/json
headers = {"Content-Type": "application/json"}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
## V. Release Note
* 1.0.0
First release
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import os
from typing import Dict
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.metrics import ChunkEvaluator
from paddlenlp.transformers.albert.modeling import AlbertForSequenceClassification
from paddlenlp.transformers.albert.modeling import AlbertForTokenClassification
from paddlenlp.transformers.albert.modeling import AlbertModel
from paddlenlp.transformers.albert.tokenizer import AlbertTokenizer
from paddlehub.module.module import moduleinfo
from paddlehub.module.nlp_module import TransformerModule
from paddlehub.utils.log import logger
@moduleinfo(name="albert-base-v1",
version="1.0.0",
summary="",
author="Baidu",
author_email="",
type="nlp/semantic_model",
meta=TransformerModule)
class Albert(nn.Layer):
"""
ALBERT model
"""
def __init__(
self,
task: str = None,
load_checkpoint: str = None,
label_map: Dict = None,
num_classes: int = 2,
suffix: bool = False,
**kwargs,
):
super(Albert, self).__init__()
if label_map:
self.label_map = label_map
self.num_classes = len(label_map)
else:
self.num_classes = num_classes
if task == 'sequence_classification':
task = 'seq-cls'
logger.warning(
"current task name 'sequence_classification' was renamed to 'seq-cls', "
"'sequence_classification' has been deprecated and will be removed in the future.", )
if task == 'seq-cls':
self.model = AlbertForSequenceClassification.from_pretrained(pretrained_model_name_or_path='albert-base-v1',
num_classes=self.num_classes,
**kwargs)
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy()
elif task == 'token-cls':
self.model = AlbertForTokenClassification.from_pretrained(pretrained_model_name_or_path='albert-base-v1',
num_classes=self.num_classes,
**kwargs)
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = ChunkEvaluator(label_list=[self.label_map[i] for i in sorted(self.label_map.keys())],
suffix=suffix)
elif task == 'text-matching':
self.model = AlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v1', **kwargs)
self.dropout = paddle.nn.Dropout(0.1)
self.classifier = paddle.nn.Linear(self.model.config['hidden_size'] * 3, 2)
self.criterion = paddle.nn.loss.CrossEntropyLoss()
self.metric = paddle.metric.Accuracy()
elif task is None:
self.model = AlbertModel.from_pretrained(pretrained_model_name_or_path='albert-base-v1', **kwargs)
else:
raise RuntimeError("Unknown task {}, task should be one in {}".format(task, self._tasks_supported))
self.task = task
if load_checkpoint is not None and os.path.isfile(load_checkpoint):
state_dict = paddle.load(load_checkpoint)
self.set_state_dict(state_dict)
logger.info('Loaded parameters from %s' % os.path.abspath(load_checkpoint))
def forward(self,
input_ids=None,
token_type_ids=None,
position_ids=None,
attention_mask=None,
query_input_ids=None,
query_token_type_ids=None,
query_position_ids=None,
query_attention_mask=None,
title_input_ids=None,
title_token_type_ids=None,
title_position_ids=None,
title_attention_mask=None,
seq_lengths=None,
labels=None):
if self.task != 'text-matching':
result = self.model(input_ids, token_type_ids, position_ids, attention_mask)
else:
query_result = self.model(query_input_ids, query_token_type_ids, query_position_ids, query_attention_mask)
title_result = self.model(title_input_ids, title_token_type_ids, title_position_ids, title_attention_mask)
if self.task == 'seq-cls':
logits = result
probs = F.softmax(logits, axis=1)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, {'acc': acc}
return probs
elif self.task == 'token-cls':
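            # Token classification: per-token softmax, with chunk-level
            # precision/recall/F1 tracked by ChunkEvaluator during training.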
logits = result
token_level_probs = F.softmax(logits, axis=-1)
preds = token_level_probs.argmax(axis=-1)
if labels is not None:
loss = self.criterion(logits, labels.unsqueeze(-1))
num_infer_chunks, num_label_chunks, num_correct_chunks = \
self.metric.compute(None, seq_lengths, preds, labels)
self.metric.update(num_infer_chunks.numpy(), num_label_chunks.numpy(), num_correct_chunks.numpy())
_, _, f1_score = map(float, self.metric.accumulate())
return token_level_probs, loss, {'f1_score': f1_score}
return token_level_probs
elif self.task == 'text-matching':
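            # Sentence-pair matching head: masked mean-pooling of the token
            # embeddings of query and title, then classification over the
            # concatenated features [query; title; |query - title|].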
query_token_embedding, _ = query_result
query_token_embedding = self.dropout(query_token_embedding)
query_attention_mask = paddle.unsqueeze(
(query_input_ids != self.model.pad_token_id).astype(self.model.pooler.dense.weight.dtype), axis=2)
query_token_embedding = query_token_embedding * query_attention_mask
query_sum_embedding = paddle.sum(query_token_embedding, axis=1)
query_sum_mask = paddle.sum(query_attention_mask, axis=1)
query_mean = query_sum_embedding / query_sum_mask
title_token_embedding, _ = title_result
title_token_embedding = self.dropout(title_token_embedding)
title_attention_mask = paddle.unsqueeze(
(title_input_ids != self.model.pad_token_id).astype(self.model.pooler.dense.weight.dtype), axis=2)
title_token_embedding = title_token_embedding * title_attention_mask
title_sum_embedding = paddle.sum(title_token_embedding, axis=1)
title_sum_mask = paddle.sum(title_attention_mask, axis=1)
title_mean = title_sum_embedding / title_sum_mask
sub = paddle.abs(paddle.subtract(query_mean, title_mean))
projection = paddle.concat([query_mean, title_mean, sub], axis=-1)
logits = self.classifier(projection)
probs = F.softmax(logits)
if labels is not None:
loss = self.criterion(logits, labels)
correct = self.metric.compute(probs, labels)
acc = self.metric.update(correct)
return probs, loss, {'acc': acc}
return probs
else:
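            # No task specified: return the raw encoder outputs.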
sequence_output, pooled_output = result
return sequence_output, pooled_output
@staticmethod
def get_tokenizer(*args, **kwargs):
"""
Gets the tokenizer that is customized for this module.
"""
return AlbertTokenizer.from_pretrained(pretrained_model_name_or_path='albert-base-v1', *args, **kwargs)