Commit baf199fa authored by: haoyuying

add 12 models, including semantic segmentation (6), lane segmentation (1), and matting (5)

Parent 9c8d8959
# dim_vgg16_matting
|Module Name|dim_vgg16_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|dim_vgg16|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|164MB|
|Data Indicators|SAD: 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Compositing follows I = alpha * F + (1 - alpha) * B, so extracting the region covered by the matte separates the foreground. dim_vgg16_matting is a matting model that requires a trimap as input.
- For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
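- The alpha matte drives compositing directly. Below is a minimal sketch of the equation above, pasting a foreground onto a solid green background (the file paths are illustrative, and the original image is used as an approximation of the true foreground colors):
- ```python
  import cv2
  import numpy as np

  # Hypothetical inputs: an image and an alpha matte predicted for it
  # (uint8, single channel, same height and width as the image).
  image = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)
  alpha = cv2.imread("/PATH/TO/ALPHA", cv2.IMREAD_GRAYSCALE)

  background = np.zeros_like(image)
  background[:] = (0, 255, 0)  # solid green, BGR

  a = (alpha.astype(np.float32) / 255.0)[:, :, np.newaxis]
  composite = a * image + (1.0 - a) * background  # I = a*F + (1-a)*B
  cv2.imwrite("composite.png", composite.astype(np.uint8))
  ```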
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install dim_vgg16_matting
```
- In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
```
- This invokes the module from the command line. For more information, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="dim_vgg16_matting")
result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
print(result)
```
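- dim_vgg16_matting needs a trimap for every input image. If only a rough binary mask is available, a trimap can be derived from it by erosion and dilation; the sketch below is one common recipe rather than part of this module's API (the mask path and kernel size are illustrative assumptions):
- ```python
  import cv2
  import numpy as np

  # Hypothetical rough binary mask: 255 = foreground, 0 = background.
  mask = cv2.imread("/PATH/TO/MASK", cv2.IMREAD_GRAYSCALE)
  _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

  kernel = np.ones((25, 25), np.uint8)  # controls the width of the unknown band
  sure_fg = cv2.erode(mask, kernel)     # confident foreground
  unknown = cv2.dilate(mask, kernel) - sure_fg

  trimap = sure_fg.copy()
  trimap[unknown > 0] = 128             # 0 = background, 128 = unknown, 255 = foreground
  cv2.imwrite("/PATH/TO/TRIMAP", trimap)
  ```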
- ### 3. API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for human matting, which extracts the portrait foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or BGR image data.
- trimap_list (list(str | numpy.ndarray)): trimap paths or single-channel grayscale trimap images.
- visualization (bool): whether to save the results as image files; default is False.
- save_path (str): directory for saving images when visualization is True; default is "dim_vgg16_matting_output".
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes (see the sketch below).
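- The returned arrays are single-channel uint8 alpha mattes, restored to the input image size. A minimal sketch of turning one result into a transparent PNG cutout (the paths are illustrative):
- ```python
  import cv2
  import paddlehub as hub

  model = hub.Module(name="dim_vgg16_matting")
  results = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])

  image = cv2.imread("/PATH/TO/IMAGE")
  alpha = results[0]  # uint8 alpha matte, shape [H, W]

  rgba = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
  rgba[:, :, 3] = alpha  # use the matte as the transparency channel
  cv2.imwrite("cutout.png", rgba)
  ```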
## IV. Server Deployment
- PaddleHub Serving can deploy an online human matting service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m dim_vgg16_matting
```
- This deploys the human matting API service; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and fetch the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps': [cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# dim_vgg16_matting
|Module Name|dim_vgg16_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|dim_vgg16|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|164MB|
|Data Indicators|SAD: 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144779164-47146d3a-58c9-4a38-b968-3530aa9a0137.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Compositing follows I = alpha * F + (1 - alpha) * B, so extracting the region covered by the matte separates the foreground. dim_vgg16_matting is a matting model that requires a trimap as input.
- For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install dim_vgg16_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="dim_vgg16_matting")
result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
print(result)
```
- ### 3. API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for human matting, which extracts the portrait foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or image data; ndarray shape is \[H, W, C\], BGR.
- trimap_list (list(str | numpy.ndarray)): trimap paths or trimap data; ndarray shape is \[H, W\], grayscale.
- visualization (bool): whether to save the results as image files; default is False.
- save_path (str): save path of images, "dim_vgg16_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes (see the sketch below).
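- The predicted alpha matte can also be used to swap in a new background. A minimal sketch (the background image path is an illustrative assumption):
- ```python
  import cv2
  import numpy as np
  import paddlehub as hub

  model = hub.Module(name="dim_vgg16_matting")
  alpha = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])[0]

  image = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)
  background = cv2.imread("/PATH/TO/BACKGROUND")
  background = cv2.resize(background, (image.shape[1], image.shape[0])).astype(np.float32)

  # Standard compositing: I = a * F + (1 - a) * B
  a = (alpha.astype(np.float32) / 255.0)[:, :, np.newaxis]
  composite = a * image + (1.0 - a) * background
  cv2.imwrite("composite.png", composite.astype(np.uint8))
  ```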
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m dim_vgg16_matting
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps': [cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from paddleseg.models import layers
from dim_vgg16_matting.vgg import VGG16
import dim_vgg16_matting.processor as P
@moduleinfo(
name="dim_vgg16_matting",
type="CV/matting",
author="paddlepaddle",
summary="dim_vgg16_matting is a matting model",
version="1.0.0"
)
class DIMVGG16(nn.Layer):
"""
The DIM implementation based on PaddlePaddle.
The original article refers to
    Ning Xu, et al. "Deep Image Matting"
    (https://arxiv.org/abs/1703.03872).
Args:
        stage (int, optional): The stage of the model. Default: 3.
        decoder_input_channels (int, optional): The channel count of the decoder input. Default: 512.
        pretrained (str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self,
stage: int = 3,
decoder_input_channels: int = 512,
pretrained: str = None):
super(DIMVGG16, self).__init__()
self.backbone = VGG16()
self.pretrained = pretrained
self.stage = stage
decoder_output_channels = [64, 128, 256, 512]
self.decoder = Decoder(
input_channels=decoder_input_channels,
output_channels=decoder_output_channels)
if self.stage == 2:
for param in self.backbone.parameters():
param.stop_gradient = True
for param in self.decoder.parameters():
param.stop_gradient = True
if self.stage >= 2:
self.refine = Refine()
        self.transforms = P.Compose([P.LoadImages(), P.LimitLong(max_long=3840), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'dim-vgg16.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None) -> dict:
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
input_shape = paddle.shape(inputs['img'])[-2:]
x = paddle.concat([inputs['img'], inputs['trimap'] / 255], axis=1)
fea_list = self.backbone(x)
# decoder stage
up_shape = []
for i in range(5):
up_shape.append(paddle.shape(fea_list[i])[-2:])
alpha_raw = self.decoder(fea_list, up_shape)
alpha_raw = F.interpolate(
alpha_raw, input_shape, mode='bilinear', align_corners=False)
logit_dict = {'alpha_raw': alpha_raw}
if self.stage < 2:
return logit_dict
if self.stage >= 2:
# refine stage
refine_input = paddle.concat([inputs['img'], alpha_raw], axis=1)
alpha_refine = self.refine(refine_input)
# finally alpha
alpha_pred = alpha_refine + alpha_raw
alpha_pred = F.interpolate(
alpha_pred, input_shape, mode='bilinear', align_corners=False)
if not self.training:
alpha_pred = paddle.clip(alpha_pred, min=0, max=1)
logit_dict['alpha_pred'] = alpha_pred
return alpha_pred
    def predict(self, image_list: list, trimap_list: list, visualization: bool = False, save_path: str = "dim_vgg16_matting_output") -> list:
        self.eval()
        result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list, **kwargs) -> dict:
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list) -> list:
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="dim_vgg16_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--trimap_path', type=str, help="path to trimap.")
class Up(nn.Layer):
def __init__(self, input_channels: int, output_channels: int):
super().__init__()
self.conv = layers.ConvBNReLU(
input_channels,
output_channels,
kernel_size=5,
padding=2,
bias_attr=False)
def forward(self, x: paddle.Tensor, skip: paddle.Tensor, output_shape: list) -> paddle.Tensor:
x = F.interpolate(
x, size=output_shape, mode='bilinear', align_corners=False)
x = x + skip
x = self.conv(x)
x = F.relu(x)
return x
class Decoder(nn.Layer):
def __init__(self, input_channels: int, output_channels: list = [64, 128, 256, 512]):
super().__init__()
self.deconv6 = nn.Conv2D(
input_channels, input_channels, kernel_size=1, bias_attr=False)
self.deconv5 = Up(input_channels, output_channels[-1])
self.deconv4 = Up(output_channels[-1], output_channels[-2])
self.deconv3 = Up(output_channels[-2], output_channels[-3])
self.deconv2 = Up(output_channels[-3], output_channels[-4])
self.deconv1 = Up(output_channels[-4], 64)
self.alpha_conv = nn.Conv2D(
64, 1, kernel_size=5, padding=2, bias_attr=False)
def forward(self, fea_list: list, shape_list: list) -> paddle.Tensor:
x = fea_list[-1]
x = self.deconv6(x)
x = self.deconv5(x, fea_list[4], shape_list[4])
x = self.deconv4(x, fea_list[3], shape_list[3])
x = self.deconv3(x, fea_list[2], shape_list[2])
x = self.deconv2(x, fea_list[1], shape_list[1])
x = self.deconv1(x, fea_list[0], shape_list[0])
alpha = self.alpha_conv(x)
alpha = F.sigmoid(alpha)
return alpha
class Refine(nn.Layer):
def __init__(self):
super().__init__()
self.conv1 = layers.ConvBNReLU(
4, 64, kernel_size=3, padding=1, bias_attr=False)
self.conv2 = layers.ConvBNReLU(
64, 64, kernel_size=3, padding=1, bias_attr=False)
self.conv3 = layers.ConvBNReLU(
64, 64, kernel_size=3, padding=1, bias_attr=False)
self.alpha_pred = layers.ConvBNReLU(
64, 1, kernel_size=3, padding=1, bias_attr=False)
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
alpha = self.alpha_pred(x)
return alpha
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
    def __init__(self, transforms: List[Callable], to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
class LoadImages:
"""
Read images from image path.
Args:
        to_rgb (bool, optional): Whether to convert the image to the RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
# if alpha and trimap has 3 channels, extract one.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class LimitLong:
"""
Limit the long edge of image.
    If the long edge is larger than max_long, resize the long edge
    to max_long while scaling the short edge proportionally.
    If the long edge is smaller than min_long, resize the long edge
    to min_long while scaling the short edge proportionally.
Args:
        max_long (int, optional): If the long edge of the image is larger than max_long,
            it will be resized to max_long. Default: None.
        min_long (int, optional): If the long edge of the image is smaller than min_long,
            it will be resized to min_long. Default: None.
"""
def __init__(self, max_long=None, min_long=None):
if max_long is not None:
if not isinstance(max_long, int):
raise TypeError(
"Type of `max_long` is invalid. It should be int, but it is {}"
.format(type(max_long)))
if min_long is not None:
if not isinstance(min_long, int):
raise TypeError(
"Type of `min_long` is invalid. It should be int, but it is {}"
.format(type(min_long)))
if (max_long is not None) and (min_long is not None):
if min_long > max_long:
                raise ValueError(
                    '`max_long` should not be smaller than `min_long`, but they are {} and {}'
                    .format(max_long, min_long))
self.max_long = max_long
self.min_long = min_long
def __call__(self, data):
h, w = data['img'].shape[:2]
long_edge = max(h, w)
target = long_edge
if (self.max_long is not None) and (long_edge > self.max_long):
target = self.max_long
elif (self.min_long is not None) and (long_edge < self.min_long):
target = self.min_long
if target != long_edge:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_long(data['img'], target)
for key in data.get('gt_fields', []):
data[key] = functional.resize_long(data[key], target)
return data
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: List[Tuple]):
    """Recover the prediction to its original shape."""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
"""
The value of alpha is range [0, 1], shape should be [h,w]
"""
if isinstance(trimap, str):
trimap = cv2.imread(trimap, 0)
alpha[trimap == 0] = 0
alpha[trimap == 255] = 255
    alpha = alpha.astype('uint8')
return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Tuple
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddleseg.utils import utils
class ConvBlock(nn.Layer):
def __init__(self, input_channels: int, output_channels: int, groups: int, name: str = None):
super(ConvBlock, self).__init__()
self.groups = groups
self._conv_1 = Conv2D(
in_channels=input_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "1_weights"),
bias_attr=False)
if groups == 2 or groups == 3 or groups == 4:
self._conv_2 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "2_weights"),
bias_attr=False)
if groups == 3 or groups == 4:
self._conv_3 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "3_weights"),
bias_attr=False)
if groups == 4:
self._conv_4 = Conv2D(
in_channels=output_channels,
out_channels=output_channels,
kernel_size=3,
stride=1,
padding=1,
weight_attr=ParamAttr(name=name + "4_weights"),
bias_attr=False)
self._pool = MaxPool2D(
kernel_size=2, stride=2, padding=0, return_mask=True)
def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
x = self._conv_1(inputs)
x = F.relu(x)
if self.groups == 2 or self.groups == 3 or self.groups == 4:
x = self._conv_2(x)
x = F.relu(x)
if self.groups == 3 or self.groups == 4:
x = self._conv_3(x)
x = F.relu(x)
if self.groups == 4:
x = self._conv_4(x)
x = F.relu(x)
skip = x
x, max_indices = self._pool(x)
return x, max_indices, skip
class VGGNet(nn.Layer):
def __init__(self, input_channels: int = 4, layers: int = 11, pretrained: str = None):
super(VGGNet, self).__init__()
self.pretrained = pretrained
self.layers = layers
self.vgg_configure = {
11: [1, 1, 2, 2, 2],
13: [2, 2, 2, 2, 2],
16: [2, 2, 3, 3, 3],
19: [2, 2, 4, 4, 4]
}
assert self.layers in self.vgg_configure.keys(), \
"supported layers are {} but input layer is {}".format(
self.vgg_configure.keys(), layers)
self.groups = self.vgg_configure[self.layers]
        # The first conv layer of the matting model takes a 4-channel input (RGB + trimap); it is initialized directly to zero.
self._conv_block_1 = ConvBlock(
input_channels, 64, self.groups[0], name="conv1_")
self._conv_block_2 = ConvBlock(64, 128, self.groups[1], name="conv2_")
self._conv_block_3 = ConvBlock(128, 256, self.groups[2], name="conv3_")
self._conv_block_4 = ConvBlock(256, 512, self.groups[3], name="conv4_")
self._conv_block_5 = ConvBlock(512, 512, self.groups[4], name="conv5_")
        # This layer should be initialized from the converted VGG fc6 weights; initialization can be skipped for now.
self._conv_6 = Conv2D(
512, 512, kernel_size=3, padding=1, bias_attr=False)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
fea_list = []
ids_list = []
x, ids, skip = self._conv_block_1(inputs)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_2(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_3(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_4(x)
fea_list.append(skip)
ids_list.append(ids)
x, ids, skip = self._conv_block_5(x)
fea_list.append(skip)
ids_list.append(ids)
x = F.relu(self._conv_6(x))
fea_list.append(x)
return fea_list
def VGG16(**args):
model = VGGNet(layers=16, **args)
return model
# gfm_resnet34_matting
|Module Name|gfm_resnet34_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|gfm_resnet34|
|Dataset|AM-2k|
|Support Fine-tuning|No|
|Module Size|562MB|
|Data Indicators|SAD: 10.89|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145993777-9b69a85d-d31c-4743-8620-82b2a56ca1e7.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/145993809-b0fb4bae-2c64-4868-99fc-500f19343442.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Extracting the region covered by the matte separates the foreground. gfm_resnet34_matting produces the matting result directly from the input image, without requiring a trimap.
- For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install gfm_resnet34_matting
```
- In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
```
- This invokes the module from the command line. For more information, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="gfm_resnet34_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3. API
- ```python
def predict(self,
image_list,
visualization,
save_path):
```
- Prediction API for animal matting, which extracts the animal foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or BGR image data.
- visualization (bool): whether to save the results as image files; default is True.
- save_path (str): directory for saving images when visualization is True; default is "gfm_resnet34_matting_output".
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes (see the sketch below).
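- A minimal sketch of batch prediction over a directory of animal photos (the directory path is an illustrative assumption):
- ```python
  import os
  import paddlehub as hub

  model = hub.Module(name="gfm_resnet34_matting")

  image_dir = "/PATH/TO/IMAGES"  # hypothetical folder of animal photos
  image_list = [os.path.join(image_dir, name)
                for name in sorted(os.listdir(image_dir))
                if name.lower().endswith((".jpg", ".jpeg", ".png"))]

  # With visualization=True, each alpha matte is also written into save_path.
  results = model.predict(image_list=image_list,
                          visualization=True,
                          save_path="gfm_resnet34_matting_output")
  print("predicted", len(results), "alpha mattes")
  ```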
## IV. Server Deployment
- PaddleHub Serving can deploy an online animal matting service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m gfm_resnet34_matting
```
- This deploys the animal matting API service; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and fetch the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# gfm_resnet34_matting
|Module Name|gfm_resnet34_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|gfm_resnet34|
|Dataset|AM-2k|
|Support Fine-tuning|No|
|Module Size|562MB|
|Data Indicators|SAD: 10.89|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/145993777-9b69a85d-d31c-4743-8620-82b2a56ca1e7.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/145993809-b0fb4bae-2c64-4868-99fc-500f19343442.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Extracting the region covered by the matte separates the foreground. gfm_resnet34_matting produces the matting result directly from the input image, without requiring a trimap.
- For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install gfm_resnet34_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="gfm_resnet34_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3. API
- ```python
def predict(self,
image_list,
visualization,
save_path):
```
- Prediction API for animal matting, which extracts the animal foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or image data; ndarray shape is \[H, W, C\], BGR.
- visualization (bool): whether to save the results as image files; default is True.
- save_path (str): save path of images, "gfm_resnet34_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m gfm_resnet34_matting
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
from PIL import Image
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from skimage.transform import resize
from gfm_resnet34_matting.gfm import GFM
import gfm_resnet34_matting.processor as P
@moduleinfo(
name="gfm_resnet34_matting",
type="CV/matting",
author="paddlepaddle",
author_email="",
summary="gfm_resnet34_matting is an animal matting model.",
version="1.0.0")
class GFMResNet34(nn.Layer):
"""
The GFM implementation based on PaddlePaddle.
The original article refers to:
Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
Main network file (GFM).
Github repo: https://github.com/JizhiziLi/GFM
Paper link (Arxiv): https://arxiv.org/abs/2010.16188
"""
def __init__(self, pretrained: str=None):
super(GFMResNet34, self).__init__()
self.model = GFM()
self.resize_by_short = P.ResizeByShort(1080)
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.model.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'model.pdparams')
model_dict = paddle.load(checkpoint)
self.model.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray], h: int, w: int) -> paddle.Tensor:
if min(h, w) > 1080:
img = self.resize_by_short(img)
tensor_img = self.scale_image(img, h, w)
return tensor_img
def scale_image(self, img: np.ndarray, h: int, w: int, ratio: float = 1/3):
        # Scale by `ratio`, then snap each side down to a multiple of 32 (capped at 1600).
        resize_h = int(h * ratio)
        resize_w = int(w * ratio)
        new_h = min(1600, resize_h - (resize_h % 32))
        new_w = min(1600, resize_w - (resize_w % 32))
        scale_img = resize(img, (new_h, new_w)) * 255
        tensor_img = paddle.to_tensor(scale_img.astype(np.float32)[np.newaxis, :, :, :])
        tensor_img = tensor_img.transpose([0, 3, 1, 2])
return tensor_img
def inference_img_scale(self, input: paddle.Tensor) -> List[paddle.Tensor]:
pred_global, pred_local, pred_fusion = self.model(input)
pred_global = P.gen_trimap_from_segmap_e2e(pred_global)
pred_local = pred_local.numpy()[0,0,:,:]
pred_fusion = pred_fusion.numpy()[0,0,:,:]
return pred_global, pred_local, pred_fusion
    def predict(self, image_list: list, visualization: bool = True, save_path: str = "gfm_resnet34_matting_output"):
self.model.eval()
result = []
with paddle.no_grad():
for i, img in enumerate(image_list):
                if isinstance(img, str):
                    img = np.array(Image.open(img))[:, :, :3]  # RGB, drop any alpha channel
                else:
                    img = img[:, :, ::-1]  # convert BGR (OpenCV) to RGB
h, w, _ = img.shape
tensor_img = self.preprocess(img, h, w)
pred_glance_1, pred_focus_1, pred_fusion_1 = self.inference_img_scale(tensor_img)
pred_glance_1 = resize(pred_glance_1,(h,w)) * 255.0
tensor_img = self.scale_image(img, h, w, 1/2)
pred_glance_2, pred_focus_2, pred_fusion_2 = self.inference_img_scale(tensor_img)
pred_focus_2 = resize(pred_focus_2,(h,w))
pred_fusion = P.get_masked_local_from_global_test(pred_glance_1, pred_focus_2)
pred_fusion = (pred_fusion * 255).astype(np.uint8)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, pred_fusion)
result.append(pred_fusion)
return result
@serving
    def serving_method(self, images: list, **kwargs):
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
outputs = self.predict(image_list=images_decode, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="gfm_resnet34_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import cv2
import numpy as np
from paddleseg.transforms import functional
class ResizeByLong:
"""
Resize the long side of an image to given size, and then scale the other side proportionally.
Args:
long_size (int): The target size of long side.
"""
def __init__(self, long_size):
self.long_size = long_size
def __call__(self, data):
data = functional.resize_long(data, self.long_size)
return data
class ResizeByShort:
"""
Resize the short side of an image to given size, and then scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size):
self.short_size = short_size
def __call__(self, data):
data = functional.resize_short(data, self.short_size)
return data
def gen_trimap_from_segmap_e2e(segmap):
trimap = np.argmax(segmap, axis=1)[0]
trimap = trimap.astype(np.int64)
    trimap[trimap == 1] = 128
    trimap[trimap == 2] = 255
return trimap.astype(np.uint8)
def get_masked_local_from_global_test(global_result, local_result):
weighted_global = np.ones(global_result.shape)
weighted_global[global_result==255] = 0
weighted_global[global_result==0] = 0
    fusion_result = global_result * (1. - weighted_global) / 255 + local_result * weighted_global
return fusion_result
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from typing import Type, Any, Callable, Union, List, Optional
def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1,
            dilation: int = 1) -> nn.Conv2D:
"""3x3 convolution with padding"""
return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
padding=dilation, groups=groups, dilation=dilation, bias_attr=False)
def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2D:
"""1x1 convolution"""
return nn.Conv2D(in_planes, out_planes, kernel_size=1, stride=stride,
bias_attr=False)
class BasicBlock(nn.Layer):
expansion: int = 1
    def __init__(self, inplanes: int, planes: int, stride: int = 1,
                 downsample: Optional[nn.Layer] = None, groups: int = 1,
                 base_width: int = 64, dilation: int = 1,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
super(BasicBlock, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
if groups != 1 or base_width != 64:
raise ValueError(
'BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError(
'Dilation > 1 not supported in BasicBlock')
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = paddle.nn.ReLU()
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
self.downsample = downsample
self.stride = stride
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class Bottleneck(nn.Layer):
expansion: int = 4
    def __init__(self, inplanes: int, planes: int, stride: int = 1,
                 downsample: Optional[nn.Layer] = None, groups: int = 1,
                 base_width: int = 64, dilation: int = 1,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
super(Bottleneck, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
width = int(planes * (base_width / 64.0)) * groups
self.conv1 = conv1x1(inplanes, width)
self.bn1 = norm_layer(width)
self.conv2 = conv3x3(width, width, stride, groups, dilation)
self.bn2 = norm_layer(width)
self.conv3 = conv1x1(width, planes * self.expansion)
self.bn3 = norm_layer(planes * self.expansion)
self.relu = paddle.nn.ReLU()
self.downsample = downsample
self.stride = stride
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet(nn.Layer):
    def __init__(self, block: Type[Union[BasicBlock, Bottleneck]],
                 layers: List[int], num_classes: int = 1000,
                 zero_init_residual: bool = False, groups: int = 1,
                 width_per_group: int = 64,
                 replace_stride_with_dilation: Optional[List[bool]] = None,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
super(ResNet, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2D
self._norm_layer = norm_layer
self.inplanes = 64
self.dilation = 1
if replace_stride_with_dilation is None:
replace_stride_with_dilation = [False, False, False]
if len(replace_stride_with_dilation) != 3:
raise ValueError(
'replace_stride_with_dilation should be None or a 3-element tuple, got {}'
.format(replace_stride_with_dilation))
self.groups = groups
self.base_width = width_per_group
self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
padding=3, bias_attr=False)
self.bn1 = norm_layer(self.inplanes)
self.relu = paddle.nn.ReLU()
self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
dilate=replace_stride_with_dilation[2])
self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)
    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]],
                    planes: int, blocks: int, stride: int = 1,
                    dilate: bool = False) -> nn.Sequential:
norm_layer = self._norm_layer
downsample = None
previous_dilation = self.dilation
if dilate:
self.dilation *= stride
stride = 1
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(conv1x1(self.inplanes, planes *
block.expansion, stride), norm_layer(planes * block.expansion))
layers = []
layers.append(block(self.inplanes, planes, stride, downsample, self
.groups, self.base_width, previous_dilation, norm_layer))
self.inplanes = planes * block.expansion
for _ in range(1, blocks):
layers.append(block(self.inplanes, planes, groups=self.groups,
base_width=self.base_width, dilation=self.dilation,
norm_layer=norm_layer))
return nn.Sequential(*layers)
    def _forward_impl(self, x: paddle.Tensor) -> paddle.Tensor:
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
        x = paddle.flatten(x, 1)
x = self.fc(x)
return x
def forward(self, x: paddle.Tensor) -> paddle.Tensor:
return self._forward_impl(x)
def _resnet(arch: str, block: Type[Union[BasicBlock, Bottleneck]],
            layers: List[int], pretrained: bool, progress: bool,
            **kwargs: Any) -> ResNet:
model = ResNet(block, layers, **kwargs)
return model
def resnet34(pretrained: bool = False, progress: bool = True,
             **kwargs: Any) -> ResNet:
"""ResNet-34 model from
`"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained,
progress, **kwargs)
# modnet_hrnet18_matting
|Module Name|modnet_hrnet18_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_hrnet18|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|60MB|
|Data Indicators|SAD: 77.96|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144780857-13c63c21-5d12-4028-985b-378776f58220.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Extracting the region covered by the matte separates the foreground. modnet_hrnet18_matting produces the matting result and does not require a trimap as input.
- For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install modnet_hrnet18_matting
```
- In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
```
- This invokes the module from the command line. For more information, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_hrnet18_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3. API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for human matting, which extracts the portrait foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or BGR image data.
- trimap_list (list(str | numpy.ndarray)): trimap paths or single-channel grayscale trimap images; optional, default is None.
- visualization (bool): whether to save the results as image files; default is False.
- save_path (str): directory for saving images when visualization is True; default is "modnet_hrnet18_matting_output".
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes (see the sketch below).
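- modnet_hrnet18_matting does not require a trimap, so trimap_list can be omitted; when a trimap is supplied, regions it marks as known background or foreground are kept fixed in the output. A minimal sketch of both call styles (the paths are illustrative):
- ```python
  import paddlehub as hub

  model = hub.Module(name="modnet_hrnet18_matting")

  # Trimap-free prediction, the common case for MODNet:
  result = model.predict(image_list=["/PATH/TO/IMAGE"])

  # Optionally constrain the result with a trimap:
  result = model.predict(image_list=["/PATH/TO/IMAGE"],
                         trimap_list=["/PATH/TO/TRIMAP"])
  print(result[0].shape)
  ```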
## IV. Server Deployment
- PaddleHub Serving can deploy an online human matting service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_hrnet18_matting
```
- This deploys the human matting API service; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and fetch the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# modnet_hrnet18_matting
|Module Name|modnet_hrnet18_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_hrnet18|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|60MB|
|Data Indicators|SAD: 77.96|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image; right: matting result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144780857-13c63c21-5d12-4028-985b-378776f58220.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image compositing, and visual effects. Each pixel in an image carries a value that describes its foreground transparency, called the alpha value; the set of all alpha values of an image is called the alpha matte. Extracting the region covered by the matte separates the foreground. modnet_hrnet18_matting produces the matting result and does not require a trimap as input.
- For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2. Installation
- ```shell
$ hub install modnet_hrnet18_matting
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_hrnet18_matting")
result = model.predict(["/PATH/TO/IMAGE"])
print(result)
```
- ### 3. API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for human matting, which extracts the portrait foreground from the input image.
- **Parameters**
- image_list (list(str | numpy.ndarray)): image paths or image data; ndarray shape is \[H, W, C\], BGR.
- trimap_list (list(str | numpy.ndarray)): trimap paths or trimap data; ndarray shape is \[H, W\], grayscale. Optional; default is None.
- visualization (bool): whether to save the results as image files; default is False.
- save_path (str): save path of images, "modnet_hrnet18_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): the list of predicted alpha mattes.
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_hrnet18_matting
```
- The serving API is now deployed; the default port number is 8866.
- **NOTE:** To use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, use the following lines of code to send the prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')

def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data

# Send the HTTP request.
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode and save the returned alpha mattes.
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlehub.module.module import moduleinfo
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from modnet_hrnet18_matting.hrnet import HRNet_W18
import modnet_hrnet18_matting.processor as P
@moduleinfo(
name="modnet_hrnet18_matting",
type="CV/matting",
author="paddlepaddle",
summary="modnet_hrnet18_matting is a matting model",
version="1.0.0"
)
class MODNetHRNet18(nn.Layer):
"""
The MODNet implementation based on PaddlePaddle.
The original article refers to
Zhanghan Ke, et, al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
(https://arxiv.org/pdf/2011.11961.pdf).
Args:
        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
        pretrained(str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self, hr_channels:int = 32, pretrained=None):
super(MODNetHRNet18, self).__init__()
self.backbone = HRNet_W18()
self.pretrained = pretrained
self.head = MODNetHead(
hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
self.blurer = GaussianBlurLayer(1, 3)
self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'modnet-hrnet_w18.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
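        """Load the image (and optional trimap), apply the preprocessing transforms, and add a batch dimension."""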
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: dict) -> paddle.Tensor:
x = inputs['img']
feat_list = self.backbone(x)
y = self.head(inputs=inputs, feat_list=feat_list)
return y
def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_hrnet18_matting_output") -> list:
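        """Predict an alpha matte for each input image; returns a list of uint8 arrays of shape [H, W]."""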
self.eval()
        result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list = None, **kwargs) -> dict:
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="modnet_hrnet18_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
        self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
class MODNetHead(nn.Layer):
"""
Segmentation head.
"""
def __init__(self, hr_channels: int, backbone_channels: int):
super().__init__()
self.lr_branch = LRBranch(backbone_channels)
self.hr_branch = HRBranch(hr_channels, backbone_channels)
self.f_branch = FusionBranch(hr_channels, backbone_channels)
def forward(self, inputs: paddle.Tensor, feat_list: list):
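        # The low-resolution branch predicts coarse semantics, the high-resolution
        # branch predicts boundary detail, and the fusion branch merges both into
        # the final alpha matte.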
pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
if self.training:
logit_dict = {
'semantic': pred_semantic,
'detail': pred_detail,
'matte': pred_matte
}
return logit_dict
else:
return pred_matte
class FusionBranch(nn.Layer):
def __init__(self, hr_channels: int, enc_channels: int):
super().__init__()
self.conv_lr4x = Conv2dIBNormRelu(
enc_channels[2], hr_channels, 5, stride=1, padding=2)
self.conv_f2x = Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1)
self.conv_f = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
Conv2dIBNormRelu(
int(hr_channels / 2),
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
lr4x = self.conv_lr4x(lr4x)
lr2x = F.interpolate(
lr4x, scale_factor=2, mode='bilinear', align_corners=False)
f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
f = F.interpolate(
f2x, scale_factor=2, mode='bilinear', align_corners=False)
f = self.conv_f(paddle.concat((f, img), axis=1))
pred_matte = F.sigmoid(f)
return pred_matte
class HRBranch(nn.Layer):
"""
High Resolution Branch of MODNet
"""
def __init__(self, hr_channels: int, enc_channels:int):
super().__init__()
self.tohr_enc2x = Conv2dIBNormRelu(
enc_channels[0], hr_channels, 1, stride=1, padding=0)
self.conv_enc2x = Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=2, padding=1)
self.tohr_enc4x = Conv2dIBNormRelu(
enc_channels[1], hr_channels, 1, stride=1, padding=0)
self.conv_enc4x = Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
self.conv_hr4x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels + enc_channels[2] + 3,
2 * hr_channels,
3,
stride=1,
padding=1),
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr2x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
hr_channels,
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
img2x = F.interpolate(
img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
img4x = F.interpolate(
img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
enc2x = self.tohr_enc2x(enc2x)
hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
enc4x = self.tohr_enc4x(enc4x)
hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
hr2x = F.interpolate(
hr4x, scale_factor=2, mode='bilinear', align_corners=False)
hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
pred_detail = None
if self.training:
hr = F.interpolate(
hr2x, scale_factor=2, mode='bilinear', align_corners=False)
hr = self.conv_hr(paddle.concat((hr, img), axis=1))
pred_detail = F.sigmoid(hr)
return pred_detail, hr2x
class LRBranch(nn.Layer):
"""
Low Resolution Branch of MODNet
"""
def __init__(self, backbone_channels: int):
super().__init__()
self.se_block = SEBlock(backbone_channels[4], reduction=4)
self.conv_lr16x = Conv2dIBNormRelu(
backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
self.conv_lr8x = Conv2dIBNormRelu(
backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
self.conv_lr = Conv2dIBNormRelu(
backbone_channels[2],
1,
3,
stride=2,
padding=1,
with_ibn=False,
with_relu=False)
def forward(self, feat_list: list):
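        # feat_list holds the backbone features; indices 0, 1 and 4 are the 1/2,
        # 1/4 and 1/32 resolution maps (enc2x, enc4x, enc32x).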
enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
enc32x = self.se_block(enc32x)
lr16x = F.interpolate(
enc32x, scale_factor=2, mode='bilinear', align_corners=False)
lr16x = self.conv_lr16x(lr16x)
lr8x = F.interpolate(
lr16x, scale_factor=2, mode='bilinear', align_corners=False)
lr8x = self.conv_lr8x(lr8x)
pred_semantic = None
if self.training:
lr = self.conv_lr(lr8x)
pred_semantic = F.sigmoid(lr)
return pred_semantic, lr8x, [enc2x, enc4x]
class IBNorm(nn.Layer):
"""
Combine Instance Norm and Batch Norm into One Layer
"""
def __init__(self, in_channels: int):
super().__init__()
self.bnorm_channels = in_channels // 2
self.inorm_channels = in_channels - self.bnorm_channels
self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
self.inorm = nn.InstanceNorm2D(self.inorm_channels)
def forward(self, x):
bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
return paddle.concat((bn_x, in_x), 1)
class Conv2dIBNormRelu(nn.Layer):
"""
Convolution + IBNorm + Relu
"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
padding: int = 0,
dilation:int = 1,
groups: int = 1,
bias_attr: paddle.ParamAttr = None,
with_ibn: bool = True,
with_relu: bool = True):
super().__init__()
layers = [
nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias_attr=bias_attr)
]
if with_ibn:
layers.append(IBNorm(out_channels))
if with_relu:
layers.append(nn.ReLU())
self.layers = nn.Sequential(*layers)
def forward(self, x: paddle.Tensor):
return self.layers(x)
class SEBlock(nn.Layer):
"""
SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
"""
def __init__(self, num_channels: int, reduction:int = 1):
super().__init__()
self.pool = nn.AdaptiveAvgPool2D(1)
self.conv = nn.Sequential(
nn.Conv2D(
num_channels,
int(num_channels // reduction),
1,
bias_attr=False), nn.ReLU(),
nn.Conv2D(
int(num_channels // reduction),
num_channels,
1,
bias_attr=False), nn.Sigmoid())
def forward(self, x: paddle.Tensor):
w = self.pool(x)
w = self.conv(w)
return w * x
class GaussianBlurLayer(nn.Layer):
""" Add Gaussian Blur to a 4D tensors
This layer takes a 4D tensor of {N, C, H, W} as input.
    The Gaussian blur is applied to each of the given channels (C) separately via a grouped convolution.
"""
def __init__(self, channels: int, kernel_size: int):
"""
Args:
channels (int): Channel for input tensor
kernel_size (int): Size of the kernel used in blurring
"""
super(GaussianBlurLayer, self).__init__()
self.channels = channels
self.kernel_size = kernel_size
assert self.kernel_size % 2 != 0
self.op = nn.Sequential(
nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
nn.Conv2D(
channels,
channels,
self.kernel_size,
stride=1,
padding=0,
bias_attr=False,
groups=channels))
self._init_kernel()
self.op[1].weight.stop_gradient = True
def forward(self, x: paddle.Tensor):
"""
Args:
x (paddle.Tensor): input 4D tensor
Returns:
paddle.Tensor: Blurred version of the input
"""
if not len(list(x.shape)) == 4:
print('\'GaussianBlurLayer\' requires a 4D tensor as input\n')
exit()
elif not x.shape[1] == self.channels:
            print('In \'GaussianBlurLayer\', the required channel ({0}) is '
'not the same as input ({1})\n'.format(
self.channels, x.shape[1]))
exit()
return self.op(x)
def _init_kernel(self):
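        # Sigma follows OpenCV's rule of thumb for the given kernel size; the
        # kernel itself is obtained by Gaussian-filtering a unit impulse.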
sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
n = np.zeros((self.kernel_size, self.kernel_size))
i = int(self.kernel_size / 2)
n[i, i] = 1
kernel = scipy.ndimage.gaussian_filter(n, sigma)
kernel = kernel.astype('float32')
kernel = kernel[np.newaxis, np.newaxis, :, :]
paddle.assign(kernel, self.op[1].weight)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import base64
from typing import Callable, Union, List, Tuple
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
from paddleseg.transforms import functional
from PIL import Image
class Compose:
"""
Do transformation on input data with corresponding pre-processing and augmentation operations.
The shape of input data to all operations is [height, width, channels].
"""
def __init__(self, transforms: Callable, to_rgb: bool = True):
if not isinstance(transforms, list):
raise TypeError('The transforms must be a list!')
self.transforms = transforms
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if 'trans_info' not in data:
data['trans_info'] = []
for op in self.transforms:
data = op(data)
if data is None:
return None
data['img'] = np.transpose(data['img'], (2, 0, 1))
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = np.transpose(data[key], (2, 0, 1))
return data
class LoadImages:
"""
Read images from image path.
Args:
to_rgb (bool, optional): If converting image to RGB color space. Default: True.
"""
def __init__(self, to_rgb: bool = True):
self.to_rgb = to_rgb
def __call__(self, data: dict) -> dict:
if isinstance(data['img'], str):
data['img'] = cv2.imread(data['img'])
for key in data.get('gt_fields', []):
if isinstance(data[key], str):
data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
            # If alpha or trimap has 3 channels, keep only the first one.
if key in ['alpha', 'trimap']:
if len(data[key].shape) > 2:
data[key] = data[key][:, :, 0]
if self.to_rgb:
data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
for key in data.get('gt_fields', []):
if len(data[key].shape) == 2:
continue
data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
return data
class ResizeByShort:
"""
    Resize the short side of an image to a given size, and scale the other side proportionally.
Args:
short_size (int): The target size of short side.
"""
def __init__(self, short_size: int =512):
self.short_size = short_size
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
data['img'] = functional.resize_short(data['img'], self.short_size)
for key in data.get('gt_fields', []):
data[key] = functional.resize_short(data[key], self.short_size)
return data
class ResizeToIntMult:
"""
    Resize the image to an integer multiple of a given value, e.g. 32.
"""
def __init__(self, mult_int: int = 32):
self.mult_int = mult_int
def __call__(self, data: dict) -> dict:
data['trans_info'].append(('resize', data['img'].shape[0:2]))
h, w = data['img'].shape[0:2]
        rw = w - w % self.mult_int
        rh = h - h % self.mult_int
data['img'] = functional.resize(data['img'], (rw, rh))
for key in data.get('gt_fields', []):
data[key] = functional.resize(data[key], (rw, rh))
return data
class Normalize:
"""
Normalize an image.
Args:
mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
Raises:
ValueError: When mean/std is not list or any value in std is 0.
"""
def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
self.mean = mean
self.std = std
if not (isinstance(self.mean, (list, tuple))
and isinstance(self.std, (list, tuple))):
raise ValueError(
"{}: input type is invalid. It should be list or tuple".format(
self))
from functools import reduce
if reduce(lambda x, y: x * y, self.std) == 0:
raise ValueError('{}: std is invalid!'.format(self))
def __call__(self, data: dict) -> dict:
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
std = np.array(self.std)[np.newaxis, np.newaxis, :]
data['img'] = functional.normalize(data['img'], mean, std)
if 'fg' in data.get('gt_fields', []):
data['fg'] = functional.normalize(data['fg'], mean, std)
if 'bg' in data.get('gt_fields', []):
data['bg'] = functional.normalize(data['bg'], mean, std)
return data
def reverse_transform(alpha: paddle.Tensor, trans_info: List[Tuple]) -> paddle.Tensor:
"""recover pred to origin shape"""
for item in trans_info[::-1]:
if item[0] == 'resize':
h, w = item[1][0], item[1][1]
alpha = F.interpolate(alpha, [h, w], mode='bilinear')
elif item[0] == 'padding':
h, w = item[1][0], item[1][1]
alpha = alpha[:, :, 0:h, 0:w]
else:
raise Exception("Unexpected info '{}' in im_info".format(item[0]))
return alpha
def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
"""
    Constrain the predicted alpha with the trimap (if given); alpha has shape [h, w] with values in [0, 255].
"""
if isinstance(trimap, str):
trimap = cv2.imread(trimap, 0)
alpha[trimap == 0] = 0
alpha[trimap == 255] = 255
alpha = (alpha).astype('uint8')
return alpha
def cv2_to_base64(image: np.ndarray):
"""
Convert data from BGR to base64 format.
"""
data = cv2.imencode('.png', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str: str):
"""
Convert data from base64 to BGR format.
"""
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# modnet_mobilenetv2_matting
|Module Name|modnet_mobilenetv2_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_mobilenetv2|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|38MB|
|Data Indicators|SAD: 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
  - Sample results (left: original image, right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144574092-d0dd08f3-309b-4a7d-84d5-8b94604431a1.png" />
</p>
- ### Module Introduction
  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It can be used for background replacement, image composition, and visual effects, and is widely used in the film industry. Each pixel in an image has a value representing its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the separation of the foreground. modnet_mobilenetv2_matting generates the matting result for an input image.
  - For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environmental Dependence
  - paddlepaddle >= 2.2.0
  - paddlehub >= 2.1.0
  - paddleseg >= 2.3.0
- ### 2、Installation
  - ```shell
    $ hub install modnet_mobilenetv2_matting
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
  | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
  - ```shell
    $ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
    ```
  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    model = hub.Module(name="modnet_mobilenetv2_matting")
    result = model.predict(image_list=["/PATH/TO/IMAGE"])
    print(result)
    ```
- ### 3、API
  - ```python
    def predict(self,
                image_list,
                trimap_list,
                visualization,
                save_path):
    ```
  - Prediction API for portrait matting, used to separate the portrait in the input image from the background.
  - **Parameter**
    - image_list (list(str | numpy.ndarray)): Image input path or image data in BGR format.
    - trimap_list (list(str | numpy.ndarray)): Trimap input path or single-channel grayscale image. Default is None.
    - visualization (bool): Whether to save the visualized results, default is False.
    - save_path (str): Save path of images when visualization is True, "modnet_mobilenetv2_matting_output" by default.
  - **Return**
    - result (list(numpy.ndarray)): The segmentation results of the model.
## IV. Server Deployment
- PaddleHub Serving can deploy an online portrait matting service.
- ### Step 1: Start PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m modnet_mobilenetv2_matting
    ```
  - This deploys the portrait matting online service API, with the default port number 8866.
  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # np.frombuffer replaces the deprecated np.fromstring
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
    data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
    cv2.imwrite(image_path, data)
```
## V. Release Note
* 1.0.0
  First release
# modnet_mobilenetv2_matting
|Module Name|modnet_mobilenetv2_matting|
| :--- | :---: |
|Category|Image Matting|
|Network|modnet_mobilenetv2|
|Dataset|Baidu self-built dataset|
|Support Fine-tuning|No|
|Module Size|38MB|
|Data Indicators|SAD: 112.73|
|Latest update date|2021-12-03|
## I. Basic Information
- ### Application Effect Display
- Sample results (left: original image, right: result):
<p align="center">
<img src="https://user-images.githubusercontent.com/35907364/144574288-28671577-8d5d-4b20-adb9-fe737015c841.jpg" />
<img src="https://user-images.githubusercontent.com/35907364/144574092-d0dd08f3-309b-4a7d-84d5-8b94604431a1.png" />
</p>
- ### Module Introduction
- Matting is the technique of extracting the foreground from an image by computing its color and transparency. It can be used for background replacement, image composition, and visual effects, and is widely used in the film industry. Each pixel in the image has a value representing its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation. modnet_mobilenetv2_matting generates the matting result for an input image.
- For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
## II. Installation
- ### 1、Environmental Dependence
- paddlepaddle >= 2.2.0
- paddlehub >= 2.1.0
- paddleseg >= 2.3.0
- ### 2、Installation
- ```shell
$ hub install modnet_mobilenetv2_matting
```
- In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1、Command line Prediction
- ```shell
$ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
```
- If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
- ### 2、Prediction Code Example
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="modnet_mobilenetv2_matting")
result = model.predict(image_list=["/PATH/TO/IMAGE"])
print(result)
```
- ### 3、API
- ```python
def predict(self,
image_list,
trimap_list,
visualization,
save_path):
```
- Prediction API for matting.
- **Parameter**
- image_list (list(str | numpy.ndarray)): Image path or image data; an ndarray should have shape \[H, W, C\] in BGR channel order.
- trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data; an ndarray should have shape \[H, W\], single-channel grayscale. Default is None.
- visualization (bool): Whether to save the recognition results as picture files, default is False.
- save_path (str): Save path of images, "modnet_mobilenetv2_matting_output" by default.
- **Return**
- result (list(numpy.ndarray)): The list of predicted alpha mattes.
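- As a brief, hedged sketch (not part of the module API), the returned alpha matte can be used to composite the foreground onto a new background; the file names are placeholders:
- ```python
  import cv2
  import numpy as np
  import paddlehub as hub

  model = hub.Module(name="modnet_mobilenetv2_matting")
  # predict() returns one uint8 alpha matte of shape [H, W] per input image.
  alpha = model.predict(image_list=["/PATH/TO/IMAGE"])[0]

  img = cv2.imread("/PATH/TO/IMAGE")
  # Resize defensively in case pre/post-processing changed the size slightly.
  alpha = cv2.resize(alpha, (img.shape[1], img.shape[0])).astype(np.float32) / 255.0
  bg = np.zeros_like(img)
  bg[:] = (0, 255, 0)  # plain green background (BGR)
  composite = (img * alpha[..., None] + bg * (1 - alpha[..., None])).astype(np.uint8)
  cv2.imwrite("composite.png", composite)
  ```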
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of matting.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m modnet_mobilenetv2_matting
```
- The serving API is now deployed, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and obtain the result:
```python
import requests
import json
import cv2
import base64
import time
import numpy as np
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')
def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    # np.frombuffer replaces the deprecated np.fromstring
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
for image in r.json()["results"]['data']:
data = base64_to_cv2(image)
    image_path = str(time.time()) + ".png"
cv2.imwrite(image_path, data)
```
## V. Release Note
- 1.0.0
First release
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import numpy as np
import paddle
from paddle import ParamAttr
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddleseg import utils
from paddleseg.cvlibs import manager
__all__ = ["MobileNetV2"]
class ConvBNLayer(nn.Layer):
"""Basic conv bn relu layer."""
def __init__(self,
num_channels: int,
filter_size: int,
num_filters: int,
stride: int,
padding: int,
num_groups: int=1,
name: str = None,
use_cudnn: bool = True):
super(ConvBNLayer, self).__init__()
self._conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
weight_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
self._batch_norm = BatchNorm(
num_filters,
param_attr=ParamAttr(name=name + "_bn_scale"),
bias_attr=ParamAttr(name=name + "_bn_offset"),
moving_mean_name=name + "_bn_mean",
moving_variance_name=name + "_bn_variance")
def forward(self, inputs: paddle.Tensor, if_act: bool = True) -> paddle.Tensor:
y = self._conv(inputs)
y = self._batch_norm(y)
if if_act:
y = F.relu6(y)
return y
class InvertedResidualUnit(nn.Layer):
"""Inverted residual block"""
def __init__(self, num_channels: int, num_in_filter: int, num_filters: int, stride: int,
filter_size: int, padding: int, expansion_factor: int, name: str):
super(InvertedResidualUnit, self).__init__()
num_expfilter = int(round(num_in_filter * expansion_factor))
self._expand_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
name=name + "_expand")
self._bottleneck_conv = ConvBNLayer(
num_channels=num_expfilter,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
use_cudnn=False,
name=name + "_dwise")
self._linear_conv = ConvBNLayer(
num_channels=num_expfilter,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
name=name + "_linear")
def forward(self, inputs: paddle.Tensor, ifshortcut: bool) -> paddle.Tensor:
y = self._expand_conv(inputs, if_act=True)
y = self._bottleneck_conv(y, if_act=True)
y = self._linear_conv(y, if_act=False)
if ifshortcut:
y = paddle.add(inputs, y)
return y
class InvresiBlocks(nn.Layer):
def __init__(self, in_c: int, t: int, c: int, n: int, s: int, name: str):
super(InvresiBlocks, self).__init__()
self._first_block = InvertedResidualUnit(
num_channels=in_c,
num_in_filter=in_c,
num_filters=c,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + "_1")
self._block_list = []
for i in range(1, n):
block = self.add_sublayer(
name + "_" + str(i + 1),
sublayer=InvertedResidualUnit(
num_channels=c,
num_in_filter=c,
num_filters=c,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + "_" + str(i + 1)))
self._block_list.append(block)
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
y = self._first_block(inputs, ifshortcut=False)
for block in self._block_list:
y = block(y, ifshortcut=True)
return y
class MobileNet(nn.Layer):
"""Networj of MobileNet"""
def __init__(self,
input_channels: int = 3,
scale: float = 1.0,
pretrained: str = None,
prefix_name: str = ""):
super(MobileNet, self).__init__()
self.scale = scale
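        # Each tuple is (t, c, n, s): expansion factor, output channels,
        # number of repeats, and stride of the first block, following the
        # inverted-residual settings of the MobileNetV2 paper.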
bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.conv1 = ConvBNLayer(
num_channels=input_channels,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
name=prefix_name + "conv1_1")
self.block_list = []
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
block = self.add_sublayer(
prefix_name + "conv" + str(i),
sublayer=InvresiBlocks(
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name=prefix_name + "conv" + str(i)))
self.block_list.append(block)
in_c = int(c * scale)
self.out_c = int(1280 * scale) if scale > 1.0 else 1280
self.conv9 = ConvBNLayer(
num_channels=in_c,
num_filters=self.out_c,
filter_size=1,
stride=1,
padding=0,
name=prefix_name + "conv9")
self.feat_channels = [int(i * scale) for i in [16, 24, 32, 96, 1280]]
self.pretrained = pretrained
def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
feat_list = []
y = self.conv1(inputs, if_act=True)
block_index = 0
for block in self.block_list:
y = block(y)
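            # Collect multi-scale features; the conv9 output is appended after the loop.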
if block_index in [0, 1, 2, 4]:
feat_list.append(y)
block_index += 1
y = self.conv9(y, if_act=True)
feat_list.append(y)
return feat_list
def MobileNetV2(**kwargs):
model = MobileNet(scale=1.0, **kwargs)
return model
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from typing import Callable, Union, List, Tuple
import numpy as np
import cv2
import scipy
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import paddlehub.vision.segmentation_transforms as T
from paddlehub.module.module import moduleinfo, runnable, serving
from modnet_mobilenetv2_matting.mobilenetv2 import MobileNetV2
import modnet_mobilenetv2_matting.processor as P
@moduleinfo(
name="modnet_mobilenetv2_matting",
type="CV",
author="paddlepaddle",
summary="modnet_mobilenetv2_matting is a matting model",
version="1.0.0"
)
class MODNetMobilenetV2(nn.Layer):
"""
The MODNet implementation based on PaddlePaddle.
The original article refers to
Zhanghan Ke, et, al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
(https://arxiv.org/pdf/2011.11961.pdf).
Args:
        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
        pretrained(str, optional): The path of the pretrained model. Default: None.
"""
def __init__(self, hr_channels:int = 32, pretrained=None):
super(MODNetMobilenetV2, self).__init__()
self.backbone = MobileNetV2()
self.pretrained = pretrained
self.head = MODNetHead(
hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
self.blurer = GaussianBlurLayer(1, 3)
self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
if pretrained is not None:
model_dict = paddle.load(pretrained)
self.set_dict(model_dict)
print("load custom parameters success")
else:
checkpoint = os.path.join(self.directory, 'modnet-mobilenetv2.pdparams')
model_dict = paddle.load(checkpoint)
self.set_dict(model_dict)
print("load pretrained parameters success")
def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
data = {}
data['img'] = img
if trimap is not None:
data['trimap'] = trimap
data['gt_fields'] = ['trimap']
data['trans_info'] = []
data = self.transforms(data)
data['img'] = paddle.to_tensor(data['img'])
data['img'] = data['img'].unsqueeze(0)
if trimap is not None:
data['trimap'] = paddle.to_tensor(data['trimap'])
data['trimap'] = data['trimap'].unsqueeze((0, 1))
return data
def forward(self, inputs: dict):
x = inputs['img']
feat_list = self.backbone(x)
y = self.head(inputs=inputs, feat_list=feat_list)
return y
def predict(self, image_list: list, trimap_list: list = None, visualization: bool =False, save_path: str = "modnet_mobilenetv2_matting_output"):
self.eval()
result = []
with paddle.no_grad():
for i, im_path in enumerate(image_list):
trimap = trimap_list[i] if trimap_list is not None else None
data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
alpha_pred = self.forward(data)
alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
alpha_pred = (alpha_pred.numpy()).squeeze()
alpha_pred = (alpha_pred * 255).astype('uint8')
alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
result.append(alpha_pred)
if visualization:
if not os.path.exists(save_path):
os.makedirs(save_path)
img_name = str(time.time()) + '.png'
image_save_path = os.path.join(save_path, img_name)
cv2.imwrite(image_save_path, alpha_pred)
return result
@serving
def serving_method(self, images: list, trimaps:list = None, **kwargs):
"""
Run as a service.
"""
images_decode = [P.base64_to_cv2(image) for image in images]
if trimaps is not None:
trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
else:
trimap_decoder = None
outputs = self.predict(image_list=images_decode, trimap_list= trimap_decoder, **kwargs)
serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
results = {'data': serving_data}
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
if args.trimap_path is not None:
trimap_list = [args.trimap_path]
else:
trimap_list = None
results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument(
'--output_dir', type=str, default="modnet_mobilenetv2_matting_output", help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization', type=bool, default=True, help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
        self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
class MODNetHead(nn.Layer):
"""
Segmentation head.
"""
def __init__(self, hr_channels: int, backbone_channels: int):
super().__init__()
self.lr_branch = LRBranch(backbone_channels)
self.hr_branch = HRBranch(hr_channels, backbone_channels)
self.f_branch = FusionBranch(hr_channels, backbone_channels)
def forward(self, inputs: paddle.Tensor, feat_list: list):
pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
if self.training:
logit_dict = {
'semantic': pred_semantic,
'detail': pred_detail,
'matte': pred_matte
}
return logit_dict
else:
return pred_matte
class FusionBranch(nn.Layer):
def __init__(self, hr_channels: int, enc_channels: int):
super().__init__()
self.conv_lr4x = Conv2dIBNormRelu(
enc_channels[2], hr_channels, 5, stride=1, padding=2)
self.conv_f2x = Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1)
self.conv_f = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
Conv2dIBNormRelu(
int(hr_channels / 2),
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
lr4x = self.conv_lr4x(lr4x)
lr2x = F.interpolate(
lr4x, scale_factor=2, mode='bilinear', align_corners=False)
f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
f = F.interpolate(
f2x, scale_factor=2, mode='bilinear', align_corners=False)
f = self.conv_f(paddle.concat((f, img), axis=1))
pred_matte = F.sigmoid(f)
return pred_matte
class HRBranch(nn.Layer):
"""
High Resolution Branch of MODNet
"""
def __init__(self, hr_channels: int, enc_channels:int):
super().__init__()
self.tohr_enc2x = Conv2dIBNormRelu(
enc_channels[0], hr_channels, 1, stride=1, padding=0)
self.conv_enc2x = Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=2, padding=1)
self.tohr_enc4x = Conv2dIBNormRelu(
enc_channels[1], hr_channels, 1, stride=1, padding=0)
self.conv_enc4x = Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
self.conv_hr4x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels + enc_channels[2] + 3,
2 * hr_channels,
3,
stride=1,
padding=1),
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr2x = nn.Sequential(
Conv2dIBNormRelu(
2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
2 * hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
self.conv_hr = nn.Sequential(
Conv2dIBNormRelu(
hr_channels + 3, hr_channels, 3, stride=1, padding=1),
Conv2dIBNormRelu(
hr_channels,
1,
1,
stride=1,
padding=0,
with_ibn=False,
with_relu=False))
def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
img2x = F.interpolate(
img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
img4x = F.interpolate(
img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
enc2x = self.tohr_enc2x(enc2x)
hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
enc4x = self.tohr_enc4x(enc4x)
hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
lr4x = F.interpolate(
lr8x, scale_factor=2, mode='bilinear', align_corners=False)
hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
hr2x = F.interpolate(
hr4x, scale_factor=2, mode='bilinear', align_corners=False)
hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
pred_detail = None
if self.training:
hr = F.interpolate(
hr2x, scale_factor=2, mode='bilinear', align_corners=False)
hr = self.conv_hr(paddle.concat((hr, img), axis=1))
pred_detail = F.sigmoid(hr)
return pred_detail, hr2x
class LRBranch(nn.Layer):
"""
Low Resolution Branch of MODNet
"""
def __init__(self, backbone_channels: int):
super().__init__()
self.se_block = SEBlock(backbone_channels[4], reduction=4)
self.conv_lr16x = Conv2dIBNormRelu(
backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
self.conv_lr8x = Conv2dIBNormRelu(
backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
self.conv_lr = Conv2dIBNormRelu(
backbone_channels[2],
1,
3,
stride=2,
padding=1,
with_ibn=False,
with_relu=False)
def forward(self, feat_list: list):
enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
enc32x = self.se_block(enc32x)
lr16x = F.interpolate(
enc32x, scale_factor=2, mode='bilinear', align_corners=False)
lr16x = self.conv_lr16x(lr16x)
lr8x = F.interpolate(
lr16x, scale_factor=2, mode='bilinear', align_corners=False)
lr8x = self.conv_lr8x(lr8x)
pred_semantic = None
if self.training:
lr = self.conv_lr(lr8x)
pred_semantic = F.sigmoid(lr)
return pred_semantic, lr8x, [enc2x, enc4x]
class IBNorm(nn.Layer):
"""
Combine Instance Norm and Batch Norm into One Layer
"""
def __init__(self, in_channels: int):
super().__init__()
self.bnorm_channels = in_channels // 2
self.inorm_channels = in_channels - self.bnorm_channels
self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
self.inorm = nn.InstanceNorm2D(self.inorm_channels)
def forward(self, x):
bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
return paddle.concat((bn_x, in_x), 1)
class Conv2dIBNormRelu(nn.Layer):
"""
Convolution + IBNorm + Relu
"""
def __init__(self,
in_channels: int,
out_channels: int,
kernel_size: int,
stride: int = 1,
padding: int = 0,
dilation:int = 1,
groups: int = 1,
bias_attr: paddle.ParamAttr = None,
with_ibn: bool = True,
with_relu: bool = True):
super().__init__()
layers = [
nn.Conv2D(
in_channels,
out_channels,
kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias_attr=bias_attr)
]
if with_ibn:
layers.append(IBNorm(out_channels))
if with_relu:
layers.append(nn.ReLU())
self.layers = nn.Sequential(*layers)
def forward(self, x: paddle.Tensor):
return self.layers(x)
class SEBlock(nn.Layer):
"""
SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
"""
def __init__(self, num_channels: int, reduction:int = 1):
super().__init__()
self.pool = nn.AdaptiveAvgPool2D(1)
self.conv = nn.Sequential(
nn.Conv2D(
num_channels,
int(num_channels // reduction),
1,
bias_attr=False), nn.ReLU(),
nn.Conv2D(
int(num_channels // reduction),
num_channels,
1,
bias_attr=False), nn.Sigmoid())
def forward(self, x: paddle.Tensor):
w = self.pool(x)
w = self.conv(w)
return w * x
class GaussianBlurLayer(nn.Layer):
""" Add Gaussian Blur to a 4D tensors
This layer takes a 4D tensor of {N, C, H, W} as input.
    The Gaussian blur is applied to each of the given channels (C) separately via a grouped convolution.
"""
def __init__(self, channels: int, kernel_size: int):
"""
Args:
channels (int): Channel for input tensor
kernel_size (int): Size of the kernel used in blurring
"""
super(GaussianBlurLayer, self).__init__()
self.channels = channels
self.kernel_size = kernel_size
assert self.kernel_size % 2 != 0
self.op = nn.Sequential(
nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
nn.Conv2D(
channels,
channels,
self.kernel_size,
stride=1,
padding=0,
bias_attr=False,
groups=channels))
self._init_kernel()
self.op[1].weight.stop_gradient = True
def forward(self, x: paddle.Tensor):
"""
Args:
x (paddle.Tensor): input 4D tensor
Returns:
paddle.Tensor: Blurred version of the input
"""
if not len(list(x.shape)) == 4:
print('\'GaussianBlurLayer\' requires a 4D tensor as input\n')
exit()
elif not x.shape[1] == self.channels:
            print('In \'GaussianBlurLayer\', the required channel ({0}) is '
'not the same as input ({1})\n'.format(
self.channels, x.shape[1]))
exit()
return self.op(x)
def _init_kernel(self):
sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
n = np.zeros((self.kernel_size, self.kernel_size))
i = int(self.kernel_size / 2)
n[i, i] = 1
kernel = scipy.ndimage.gaussian_filter(n, sigma)
kernel = kernel.astype('float32')
kernel = kernel[np.newaxis, np.newaxis, :, :]
paddle.assign(kernel, self.op[1].weight)