diff --git a/modules/image/Image_gan/gan/stgan_bald/requirements.txt b/modules/image/Image_gan/gan/stgan_bald/requirements.txt
index 2d8443d02d090d830649fbfacbc11c8cebea8d34..00a00fcc8e48e65538cf8b73b2fd4e1157362f20 100644
--- a/modules/image/Image_gan/gan/stgan_bald/requirements.txt
+++ b/modules/image/Image_gan/gan/stgan_bald/requirements.txt
@@ -1,2 +1 @@
-paddlepaddle>=1.8.4
paddlehub>=1.8.0
diff --git a/modules/image/classification/food_classification/requirements.txt b/modules/image/classification/food_classification/requirements.txt
index f3c5b8fb12473794251e0a4669dac313cb93eff4..0e661622dcc3a99395344610b83850e5535961b6 100644
--- a/modules/image/classification/food_classification/requirements.txt
+++ b/modules/image/classification/food_classification/requirements.txt
@@ -1,3 +1,2 @@
-paddlepaddle >= 2.0.0
paddlehub >= 2.0.0
paddlex == 1.3.7
diff --git a/modules/image/matting/dim_vgg16_matting/README.md b/modules/image/matting/dim_vgg16_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..07f8e1ac0d4673c164e692d3854efc077494be44
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/README.md
@@ -0,0 +1,154 @@
+# dim_vgg16_matting
+
+|Module Name|dim_vgg16_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|dim_vgg16|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|164MB|
+|Data Indicators|SAD: 112.73|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+  - Sample results (left: original image; right: result):
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It is widely used for background replacement, image compositing, and visual effects, especially in the film industry. Each pixel in an image carries a value representing the transparency of its foreground, called the alpha value; the set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground. dim_vgg16_matting is a matting model that requires a trimap as input. A compositing sketch follows below.
+
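+  - As a hedged illustration (not part of this module's API), the sketch below composites a foreground onto a new background with the standard matting equation `I = alpha * F + (1 - alpha) * B`; the file paths are placeholders:
+
+  - ```python
+    import cv2
+    import numpy as np
+
+    # hypothetical inputs: a foreground image, a same-sized background,
+    # and a single-channel alpha matte such as the one this module predicts
+    fg = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)
+    bg = cv2.imread("/PATH/TO/BACKGROUND").astype(np.float32)
+    alpha = cv2.imread("/PATH/TO/ALPHA", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
+    alpha = alpha[:, :, np.newaxis]
+
+    # I = alpha * F + (1 - alpha) * B
+    composite = alpha * fg + (1 - alpha) * bg
+    cv2.imwrite("composite.png", composite.astype(np.uint8))
+    ```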
+
+
+  - For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install dim_vgg16_matting
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+  | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
+ ```
+
+  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="dim_vgg16_matting")
+
+    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+  - Prediction API for matting, used to separate the foreground from the input image. A usage sketch follows the parameter list below.
+
+  - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image paths or BGR image data.
+    - trimap_list (list(str | numpy.ndarray)): Trimap paths or single-channel grayscale images.
+    - visualization (bool): Whether to save the results as image files. Default is False.
+    - save_path (str): Save path of the images when visualization is True, "dim_vgg16_matting_output" by default.
+
+  - **Return**
+
+    - result (list(numpy.ndarray)): The list of matting results.
+
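+  - For example, a minimal call that also saves the visualized mattes to disk (the paths are placeholders; `visualization` and `save_path` are the parameters documented above):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="dim_vgg16_matting")
+    result = model.predict(
+        image_list=["/PATH/TO/IMAGE"],
+        trimap_list=["/PATH/TO/TRIMAP"],
+        visualization=True,                    # write each alpha matte to disk
+        save_path="dim_vgg16_matting_output")  # output directory
+    ```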
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online matting service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m dim_vgg16_matting
+ ```
+
+  - The serving API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+  # send the HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+      image_path = str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/matting/dim_vgg16_matting/README_en.md b/modules/image/matting/dim_vgg16_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..aaffb278a85f8076fd0ed5d536e2d5870bb478ca
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/README_en.md
@@ -0,0 +1,156 @@
+# dim_vgg16_matting
+
+|Module Name|dim_vgg16_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|dim_vgg16|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|164MB|
+|Data Indicators|SAD: 112.73|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by calculating its color and transparency. It is widely used for background replacement, image compositing, and visual effects, especially in the film industry. Each pixel in the image carries a value that represents its foreground transparency, called the alpha value; the set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground. dim_vgg16_matting is a matting model that requires a trimap as input; a trimap sketch follows below.
+
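+  - Since the module needs a trimap, here is a hedged sketch of one common way to derive a trimap from a rough binary foreground mask by erosion/dilation (the mask path and kernel size are illustrative assumptions, not part of this module's API):
+
+  - ```python
+    import cv2
+    import numpy as np
+
+    # hypothetical rough binary mask: 255 = foreground, 0 = background
+    mask = cv2.imread("/PATH/TO/MASK", cv2.IMREAD_GRAYSCALE)
+    kernel = np.ones((25, 25), np.uint8)
+
+    sure_fg = cv2.erode(mask, kernel)    # definitely foreground
+    sure_bg = cv2.dilate(mask, kernel)   # zero outside = definitely background
+
+    # common trimap convention: 0 = background, 128 = unknown, 255 = foreground
+    trimap = np.full_like(mask, 128)
+    trimap[sure_bg == 0] = 0
+    trimap[sure_fg == 255] = 255
+    cv2.imwrite("/PATH/TO/TRIMAP", trimap)
+    ```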
+
+
+ - For more information, please refer to: [dim_vgg16_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install dim_vgg16_matting
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run dim_vgg16_matting --input_path "/PATH/TO/IMAGE" --trimap_path "/PATH/TO/TRIMAP"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="dim_vgg16_matting")
+
+    result = model.predict(image_list=["/PATH/TO/IMAGE"], trimap_list=["/PATH/TO/TRIMAP"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+  - Prediction API for matting. A usage sketch follows the parameter list below.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data; ndarray.shape is in the format \[H, W, C\], BGR.
+    - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data; ndarray.shape is in the format \[H, W\], grayscale.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "dim_vgg16_matting_output" by default.
+
+ - **Return**
+
+    - result (list(numpy.ndarray)): The list of matting results.
+
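+  - Passing in-memory arrays also works, since `image_list` accepts BGR `numpy.ndarray` data and `trimap_list` accepts single-channel grayscale arrays; a minimal sketch (paths are placeholders):
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    model = hub.Module(name="dim_vgg16_matting")
+    img = cv2.imread("/PATH/TO/IMAGE")                            # BGR, [H, W, C]
+    trimap = cv2.imread("/PATH/TO/TRIMAP", cv2.IMREAD_GRAYSCALE)  # gray, [H, W]
+    result = model.predict(image_list=[img], trimap_list=[trimap])
+    ```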
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m dim_vgg16_matting
+ ```
+
+  - The serving API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, use the following lines of code to send the prediction request and obtain the result:
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+      data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))], 'trimaps':[cv2_to_base64(cv2.imread("/PATH/TO/TRIMAP"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/dim_vgg16_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+      image_path = str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/dim_vgg16_matting/module.py b/modules/image/matting/dim_vgg16_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..2ae3c0d36fbdf6a827bb1093a80c1def67de17cd
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/module.py
@@ -0,0 +1,288 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+import ast
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo, runnable, serving
+from paddleseg.models import layers
+
+from dim_vgg16_matting.vgg import VGG16
+import dim_vgg16_matting.processor as P
+
+
+@moduleinfo(
+ name="dim_vgg16_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="dim_vgg16_matting is a matting model",
+ version="1.0.0"
+)
+class DIMVGG16(nn.Layer):
+ """
+ The DIM implementation based on PaddlePaddle.
+
+ The original article refers to
+    Ning Xu, et al. "Deep Image Matting"
+    (https://arxiv.org/pdf/1703.03872.pdf).
+
+ Args:
+        stage (int, optional): The stage of model. Default: 3.
+        decoder_input_channels (int, optional): The channel of decoder input. Default: 512.
+        pretrained (str, optional): The path of the pretrained model. Default: None.
+
+ """
+ def __init__(self,
+ stage: int = 3,
+ decoder_input_channels: int = 512,
+ pretrained: str = None):
+ super(DIMVGG16, self).__init__()
+
+ self.backbone = VGG16()
+ self.pretrained = pretrained
+ self.stage = stage
+
+ decoder_output_channels = [64, 128, 256, 512]
+ self.decoder = Decoder(
+ input_channels=decoder_input_channels,
+ output_channels=decoder_output_channels)
+ if self.stage == 2:
+ for param in self.backbone.parameters():
+ param.stop_gradient = True
+ for param in self.decoder.parameters():
+ param.stop_gradient = True
+ if self.stage >= 2:
+ self.refine = Refine()
+
+        self.transforms = P.Compose([P.LoadImages(), P.LimitLong(max_long=3840), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'dim-vgg16.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None) -> dict:
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ input_shape = paddle.shape(inputs['img'])[-2:]
+ x = paddle.concat([inputs['img'], inputs['trimap'] / 255], axis=1)
+ fea_list = self.backbone(x)
+
+ # decoder stage
+ up_shape = []
+ for i in range(5):
+ up_shape.append(paddle.shape(fea_list[i])[-2:])
+ alpha_raw = self.decoder(fea_list, up_shape)
+ alpha_raw = F.interpolate(
+ alpha_raw, input_shape, mode='bilinear', align_corners=False)
+ logit_dict = {'alpha_raw': alpha_raw}
+ if self.stage < 2:
+ return logit_dict
+
+ if self.stage >= 2:
+ # refine stage
+ refine_input = paddle.concat([inputs['img'], alpha_raw], axis=1)
+ alpha_refine = self.refine(refine_input)
+
+ # finally alpha
+ alpha_pred = alpha_refine + alpha_raw
+ alpha_pred = F.interpolate(
+ alpha_pred, input_shape, mode='bilinear', align_corners=False)
+ if not self.training:
+ alpha_pred = paddle.clip(alpha_pred, min=0, max=1)
+ logit_dict['alpha_pred'] = alpha_pred
+
+ return alpha_pred
+
+    def predict(self, image_list: list, trimap_list: list, visualization: bool = False, save_path: str = "dim_vgg16_matting_output") -> list:
+        self.eval()
+        result = []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+    def serving_method(self, images: list, trimaps: list, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
+
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list) -> list:
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+        self.arg_config_group.add_argument(
+            '--output_dir', type=str, default="dim_vgg16_matting_output", help="The directory to save output images.")
+        self.arg_config_group.add_argument(
+            '--visualization', type=ast.literal_eval, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, help="path to trimap.")
+
+
+class Up(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: int):
+ super().__init__()
+ self.conv = layers.ConvBNReLU(
+ input_channels,
+ output_channels,
+ kernel_size=5,
+ padding=2,
+ bias_attr=False)
+
+ def forward(self, x: paddle.Tensor, skip: paddle.Tensor, output_shape: list) -> paddle.Tensor:
+ x = F.interpolate(
+ x, size=output_shape, mode='bilinear', align_corners=False)
+ x = x + skip
+ x = self.conv(x)
+ x = F.relu(x)
+
+ return x
+
+
+class Decoder(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: list = [64, 128, 256, 512]):
+ super().__init__()
+ self.deconv6 = nn.Conv2D(
+ input_channels, input_channels, kernel_size=1, bias_attr=False)
+ self.deconv5 = Up(input_channels, output_channels[-1])
+ self.deconv4 = Up(output_channels[-1], output_channels[-2])
+ self.deconv3 = Up(output_channels[-2], output_channels[-3])
+ self.deconv2 = Up(output_channels[-3], output_channels[-4])
+ self.deconv1 = Up(output_channels[-4], 64)
+
+ self.alpha_conv = nn.Conv2D(
+ 64, 1, kernel_size=5, padding=2, bias_attr=False)
+
+ def forward(self, fea_list: list, shape_list: list) -> paddle.Tensor:
+ x = fea_list[-1]
+ x = self.deconv6(x)
+ x = self.deconv5(x, fea_list[4], shape_list[4])
+ x = self.deconv4(x, fea_list[3], shape_list[3])
+ x = self.deconv3(x, fea_list[2], shape_list[2])
+ x = self.deconv2(x, fea_list[1], shape_list[1])
+ x = self.deconv1(x, fea_list[0], shape_list[0])
+ alpha = self.alpha_conv(x)
+ alpha = F.sigmoid(alpha)
+
+ return alpha
+
+
+class Refine(nn.Layer):
+ def __init__(self):
+ super().__init__()
+ self.conv1 = layers.ConvBNReLU(
+ 4, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.conv2 = layers.ConvBNReLU(
+ 64, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.conv3 = layers.ConvBNReLU(
+ 64, 64, kernel_size=3, padding=1, bias_attr=False)
+ self.alpha_pred = layers.ConvBNReLU(
+ 64, 1, kernel_size=3, padding=1, bias_attr=False)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv1(x)
+ x = self.conv2(x)
+ x = self.conv3(x)
+ alpha = self.alpha_pred(x)
+
+ return alpha
diff --git a/modules/image/matting/dim_vgg16_matting/processor.py b/modules/image/matting/dim_vgg16_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..87e499c2960bb0e76ba6e498a2f00ca508ee19a6
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/processor.py
@@ -0,0 +1,220 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+    def __init__(self, transforms: List[Callable], to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+ to_rgb (bool, optional): If converting image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class LimitLong:
+ """
+ Limit the long edge of image.
+
+ If the long edge is larger than max_long, resize the long edge
+ to max_long, while scale the short edge proportionally.
+
+ If the long edge is smaller than min_long, resize the long edge
+ to min_long, while scale the short edge proportionally.
+
+ Args:
+        max_long (int, optional): If the long edge of the image is larger than max_long,
+            it will be resized to max_long. Default: None.
+        min_long (int, optional): If the long edge of the image is smaller than min_long,
+            it will be resized to min_long. Default: None.
+ """
+
+ def __init__(self, max_long=None, min_long=None):
+ if max_long is not None:
+ if not isinstance(max_long, int):
+ raise TypeError(
+ "Type of `max_long` is invalid. It should be int, but it is {}"
+ .format(type(max_long)))
+ if min_long is not None:
+ if not isinstance(min_long, int):
+ raise TypeError(
+ "Type of `min_long` is invalid. It should be int, but it is {}"
+ .format(type(min_long)))
+ if (max_long is not None) and (min_long is not None):
+ if min_long > max_long:
+ raise ValueError(
+                    '`max_long` should not be smaller than `min_long`, but they are {} and {}'
+ .format(max_long, min_long))
+ self.max_long = max_long
+ self.min_long = min_long
+
+ def __call__(self, data):
+ h, w = data['img'].shape[:2]
+ long_edge = max(h, w)
+ target = long_edge
+ if (self.max_long is not None) and (long_edge > self.max_long):
+ target = self.max_long
+ elif (self.min_long is not None) and (long_edge < self.min_long):
+ target = self.min_long
+
+ if target != long_edge:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_long(data['img'], target)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_long(data[key], target)
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List) -> paddle.Tensor:
+    """Recover the prediction to the original shape."""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None) -> np.ndarray:
+    """
+    Enforce the trimap constraints on the predicted alpha, whose values are in range [0, 255] and whose shape is [h, w].
+    """
+ if isinstance(trimap, str):
+ trimap = cv2.imread(trimap, 0)
+ alpha[trimap == 0] = 0
+ alpha[trimap == 255] = 255
+    alpha = alpha.astype('uint8')
+ return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/dim_vgg16_matting/requirements.py b/modules/image/matting/dim_vgg16_matting/requirements.py
new file mode 100644
index 0000000000000000000000000000000000000000..7df0ef23928361724c3fadb8d87d6a3be869e58b
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/requirements.py
@@ -0,0 +1 @@
+paddleseg >= 2.3.0
diff --git a/modules/image/matting/dim_vgg16_matting/vgg.py b/modules/image/matting/dim_vgg16_matting/vgg.py
new file mode 100644
index 0000000000000000000000000000000000000000..11cc9ccc51867996d2726522f0e2f1b156895cd7
--- /dev/null
+++ b/modules/image/matting/dim_vgg16_matting/vgg.py
@@ -0,0 +1,142 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Tuple
+
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn import Conv2D, MaxPool2D
+
+
+class ConvBlock(nn.Layer):
+ def __init__(self, input_channels: int, output_channels: int, groups: int, name: str = None):
+ super(ConvBlock, self).__init__()
+
+ self.groups = groups
+ self._conv_1 = Conv2D(
+ in_channels=input_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "1_weights"),
+ bias_attr=False)
+ if groups == 2 or groups == 3 or groups == 4:
+ self._conv_2 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "2_weights"),
+ bias_attr=False)
+ if groups == 3 or groups == 4:
+ self._conv_3 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "3_weights"),
+ bias_attr=False)
+ if groups == 4:
+ self._conv_4 = Conv2D(
+ in_channels=output_channels,
+ out_channels=output_channels,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(name=name + "4_weights"),
+ bias_attr=False)
+
+ self._pool = MaxPool2D(
+ kernel_size=2, stride=2, padding=0, return_mask=True)
+
+    def forward(self, inputs: paddle.Tensor) -> Tuple[paddle.Tensor, paddle.Tensor, paddle.Tensor]:
+ x = self._conv_1(inputs)
+ x = F.relu(x)
+ if self.groups == 2 or self.groups == 3 or self.groups == 4:
+ x = self._conv_2(x)
+ x = F.relu(x)
+ if self.groups == 3 or self.groups == 4:
+ x = self._conv_3(x)
+ x = F.relu(x)
+ if self.groups == 4:
+ x = self._conv_4(x)
+ x = F.relu(x)
+ skip = x
+ x, max_indices = self._pool(x)
+ return x, max_indices, skip
+
+
+class VGGNet(nn.Layer):
+ def __init__(self, input_channels: int = 4, layers: int = 11, pretrained: str = None):
+ super(VGGNet, self).__init__()
+ self.pretrained = pretrained
+
+ self.layers = layers
+ self.vgg_configure = {
+ 11: [1, 1, 2, 2, 2],
+ 13: [2, 2, 2, 2, 2],
+ 16: [2, 2, 3, 3, 3],
+ 19: [2, 2, 4, 4, 4]
+ }
+ assert self.layers in self.vgg_configure.keys(), \
+ "supported layers are {} but input layer is {}".format(
+ self.vgg_configure.keys(), layers)
+ self.groups = self.vgg_configure[self.layers]
+
+        # The first conv layer of the matting network takes a 4-channel input; the extra channel is simply zero-initialized.
+ self._conv_block_1 = ConvBlock(
+ input_channels, 64, self.groups[0], name="conv1_")
+ self._conv_block_2 = ConvBlock(64, 128, self.groups[1], name="conv2_")
+ self._conv_block_3 = ConvBlock(128, 256, self.groups[2], name="conv3_")
+ self._conv_block_4 = ConvBlock(256, 512, self.groups[3], name="conv4_")
+ self._conv_block_5 = ConvBlock(512, 512, self.groups[4], name="conv5_")
+
+        # This layer should be initialized from the converted parameters of the VGG fc6 layer; initialization can be ignored for now.
+ self._conv_6 = Conv2D(
+ 512, 512, kernel_size=3, padding=1, bias_attr=False)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ fea_list = []
+ ids_list = []
+ x, ids, skip = self._conv_block_1(inputs)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_2(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_3(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_4(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x, ids, skip = self._conv_block_5(x)
+ fea_list.append(skip)
+ ids_list.append(ids)
+ x = F.relu(self._conv_6(x))
+ fea_list.append(x)
+ return fea_list
+
+
+def VGG16(**args):
+ model = VGGNet(layers=16, **args)
+ return model
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/README.md b/modules/image/matting/gfm_resnet34_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7787fddc230c59995b48f4f1bc8065517d70069b
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/README.md
@@ -0,0 +1,153 @@
+# gfm_resnet34_matting
+
+|Module Name|gfm_resnet34_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|gfm_resnet34|
+|Dataset|AM-2k|
+|Support Fine-tuning|No|
+|Module Size|562MB|
+|Data Indicators|SAD: 10.89|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+  - Sample results (left: original image; right: result):
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It is widely used for background replacement, image compositing, and visual effects, especially in the film industry. Each pixel in an image carries a value representing the transparency of its foreground, called the alpha value; the set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground. gfm_resnet34_matting produces the matting result directly from a single input image. A cutout sketch follows below.
+
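+  - As a hedged illustration (assuming the returned matte is a single-channel uint8 array, as when the command-line output is saved), the predicted alpha can be attached to the original image as an alpha channel to obtain a cutout with a transparent background:
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    model = hub.Module(name="gfm_resnet34_matting")
+    img = cv2.imread("/PATH/TO/IMAGE")  # BGR, [H, W, C]
+    matte = model.predict([img])[0]     # single-channel alpha matte
+
+    # resize in case the matte shape differs from the input, then stack BGR + alpha
+    matte = cv2.resize(matte, (img.shape[1], img.shape[0]))
+    bgra = cv2.merge((img[:, :, 0], img[:, :, 1], img[:, :, 2], matte))
+    cv2.imwrite("cutout.png", bgra)
+    ```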
+
+
+  - For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install gfm_resnet34_matting
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+  | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="gfm_resnet34_matting")
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+  - Prediction API for animal matting, used to separate the animal foreground from the input image. A usage sketch follows the parameter list below.
+
+  - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image paths or BGR image data.
+    - visualization (bool): Whether to save the results as image files. Default is False.
+    - save_path (str): Save path of the images when visualization is True, "gfm_resnet34_matting_output" by default.
+
+  - **Return**
+
+    - result (list(numpy.ndarray)): The list of matting results.
+
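+  - For example, a minimal call that also saves the visualized mattes to disk (the path is a placeholder; `visualization` and `save_path` are the parameters documented above):
+
+  - ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="gfm_resnet34_matting")
+    result = model.predict(
+        ["/PATH/TO/IMAGE"],
+        visualization=True,                       # write each matte to disk
+        save_path="gfm_resnet34_matting_output")  # output directory
+    ```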
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online animal matting service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m gfm_resnet34_matting
+ ```
+
+  - The serving API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+  # send the HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+      image_path = str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
diff --git a/modules/image/matting/gfm_resnet34_matting/README_en.md b/modules/image/matting/gfm_resnet34_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..c16a3657b47489845ac44fcadaf99baec55b676e
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/README_en.md
@@ -0,0 +1,154 @@
+# gfm_resnet34_matting
+
+|Module Name|gfm_resnet34_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|gfm_resnet34|
+|Dataset|AM-2k|
+|Support Fine-tuning|No|
+|Module Size|562MB|
+|Data Indicators|SAD: 10.89|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by calculating its color and transparency. It is widely used for background replacement, image compositing, and visual effects, especially in the film industry. Each pixel in the image carries a value that represents its foreground transparency, called the alpha value; the set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground. gfm_resnet34_matting produces the matting result directly from a single input image; a batch-prediction sketch follows below.
+
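+  - A hedged batch-prediction sketch built only on the `predict` API documented below (the paths are placeholders):
+
+  - ```python
+    import cv2
+    import paddlehub as hub
+
+    model = hub.Module(name="gfm_resnet34_matting")
+    paths = ["/PATH/TO/IMAGE1", "/PATH/TO/IMAGE2"]
+    mattes = model.predict(paths)
+    for path, matte in zip(paths, mattes):
+        # save each matte in the working directory, named after the input file
+        cv2.imwrite(path.rsplit("/", 1)[-1] + "_alpha.png", matte)
+    ```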
+
+
+ - For more information, please refer to: [gfm_resnet34_matting](https://github.com/JizhiziLi/GFM)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install gfm_resnet34_matting
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run gfm_resnet34_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="gfm_resnet34_matting")
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+    ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data; ndarray.shape is in the format \[H, W, C\], BGR.
+    - visualization (bool): Whether to save the recognition results as picture files, default is False.
+    - save_path (str): Save path of images, "gfm_resnet34_matting_output" by default.
+
+ - **Return**
+
+    - result (list(numpy.ndarray)): The list of matting results.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m gfm_resnet34_matting
+ ```
+
+  - The serving API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, use the following lines of code to send the prediction request and obtain the result:
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+      return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+      data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/gfm_resnet34_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+      image_path = str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/gfm_resnet34_matting/gfm.py b/modules/image/matting/gfm_resnet34_matting/gfm.py
new file mode 100644
index 0000000000000000000000000000000000000000..4b7306c2282467ec80bbf8f1c7540afb25a1b72f
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/gfm.py
@@ -0,0 +1,447 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Callable, Union, List, Tuple
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from gfm_resnet34_matting.resnet import resnet34
+
+
+def conv3x3(in_planes: int, out_planes: int, stride: int = 1) -> nn.Layer:
+    """3x3 convolution with padding"""
+    return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
+                     padding=1, bias_attr=False)
+
+
+def conv_up_psp(in_channels: int, out_channels: int, up_sample: float) -> nn.Layer:
+    return nn.Sequential(
+        nn.Conv2D(in_channels, out_channels, 3, padding=1),
+        nn.BatchNorm2D(out_channels),
+        nn.ReLU(),
+        nn.Upsample(scale_factor=up_sample, mode='bilinear', align_corners=False))
+
+
+def build_bb(in_channels: int, mid_channels: int, out_channels: int) -> nn.Layer:
+    return nn.Sequential(
+        nn.Conv2D(in_channels, mid_channels, 3, dilation=2, padding=2),
+        nn.BatchNorm2D(mid_channels),
+        nn.ReLU(),
+        nn.Conv2D(mid_channels, out_channels, 3, dilation=2, padding=2),
+        nn.BatchNorm2D(out_channels),
+        nn.ReLU(),
+        nn.Conv2D(out_channels, out_channels, 3, dilation=2, padding=2),
+        nn.BatchNorm2D(out_channels),
+        nn.ReLU())
+
+
+def build_decoder(in_channels: int, mid_channels_1: int, mid_channels_2: int, out_channels: int,
+                  last_bnrelu: bool, upsample_flag: bool) -> nn.Layer:
+    layers = []
+    layers += [
+        nn.Conv2D(in_channels, mid_channels_1, 3, padding=1),
+        nn.BatchNorm2D(mid_channels_1),
+        nn.ReLU(),
+        nn.Conv2D(mid_channels_1, mid_channels_2, 3, padding=1),
+        nn.BatchNorm2D(mid_channels_2),
+        nn.ReLU(),
+        nn.Conv2D(mid_channels_2, out_channels, 3, padding=1)
+    ]
+    if last_bnrelu:
+        layers += [nn.BatchNorm2D(out_channels), nn.ReLU()]
+
+    if upsample_flag:
+        layers += [nn.Upsample(scale_factor=2, mode='bilinear')]
+
+    sequential = nn.Sequential(*layers)
+    return sequential
+
+
+class BasicBlock(nn.Layer):
+    expansion = 1
+
+    def __init__(self, inplanes: int, planes: int, stride: int = 1, downsample: nn.Layer = None):
+ super(BasicBlock, self).__init__()
+ self.conv1 = conv3x3(inplanes, planes, stride)
+ self.bn1 = nn.BatchNorm2D(planes)
+ self.relu = nn.ReLU()
+ self.conv2 = conv3x3(planes, planes)
+ self.bn2 = nn.BatchNorm2D(planes)
+ self.downsample = downsample
+ self.stride = stride
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ residual = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ if self.downsample is not None:
+ residual = self.downsample(x)
+ out += residual
+ out = self.relu(out)
+ return out
+
+
+class PSPModule(nn.Layer):
+
+    def __init__(self, features: int, out_features: int = 1024, sizes: Tuple[int] = (1, 2, 3, 6)):
+        super().__init__()
+        self.stages = nn.LayerList([self._make_stage(features, size) for size in sizes])
+        self.bottleneck = nn.Conv2D(features * (len(sizes) + 1), out_features, kernel_size=1)
+        self.relu = nn.ReLU()
+
+    def _make_stage(self, features: int, size: int) -> nn.Layer:
+        prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
+        conv = nn.Conv2D(features, features, kernel_size=1, bias_attr=False)
+        return nn.Sequential(prior, conv)
+
+    def forward(self, feats: paddle.Tensor) -> paddle.Tensor:
+        h, w = feats.shape[2], feats.shape[3]
+        priors = [F.interpolate(stage(feats), size=(h, w), mode='bilinear', align_corners=True) for stage in self.stages] + [feats]
+        bottle = self.bottleneck(paddle.concat(priors, 1))
+        return self.relu(bottle)
+
+
+class SELayer(nn.Layer):
+
+    def __init__(self, channel: int, reduction: int = 4):
+        super(SELayer, self).__init__()
+        self.avg_pool = nn.AdaptiveAvgPool2D(1)
+        self.fc = nn.Sequential(
+            nn.Linear(channel, channel // reduction, bias_attr=False),
+            nn.ReLU(),
+            nn.Linear(channel // reduction, channel, bias_attr=False),
+            nn.Sigmoid())
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        # paddle tensors use .shape / .reshape rather than torch's .size() / .view()
+        b, c, _, _ = x.shape
+        y = self.avg_pool(x).reshape([b, c])
+        y = self.fc(y).reshape([b, c, 1, 1])
+        return x * y.expand_as(x)
+
+
+class GFM(nn.Layer):
+ """
+ The GFM implementation based on PaddlePaddle.
+
+ The original article refers to:
+ Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
+ Main network file (GFM).
+
+ Copyright (c) 2021, Jizhizi Li (jili8515@uni.sydney.edu.au)
+ Licensed under the MIT License (see LICENSE for details)
+ Github repo: https://github.com/JizhiziLi/GFM
+ Paper link (Arxiv): https://arxiv.org/abs/2010.16188
+
+ """
+
+ def __init__(self):
+ super().__init__()
+ self.backbone = 'r34_2b'
+ self.rosta = 'TT'
+ if self.rosta == 'TT':
+ self.gd_channel = 3
+ else:
+ self.gd_channel = 2
+ if self.backbone == 'r34_2b':
+ self.resnet = resnet34()
+ self.encoder0 = nn.Sequential(nn.Conv2D(3, 64, 3, padding=1),
+ nn.BatchNorm2D(64), nn.ReLU())
+ self.encoder1 = self.resnet.layer1
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+            self.encoder5 = nn.Sequential(
+                nn.MaxPool2D(2, 2, ceil_mode=True),
+                BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(512, 512))
+            self.encoder6 = nn.Sequential(
+                nn.MaxPool2D(2, 2, ceil_mode=True),
+                BasicBlock(512, 512), BasicBlock(512, 512), BasicBlock(512, 512))
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp6 = conv_up_psp(512, 512, 2)
+ self.psp5 = conv_up_psp(512, 512, 4)
+ self.psp4 = conv_up_psp(512, 256, 8)
+ self.psp3 = conv_up_psp(512, 128, 16)
+ self.psp2 = conv_up_psp(512, 64, 32)
+ self.psp1 = conv_up_psp(512, 64, 32)
+ self.decoder6_g = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder5_g = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, False)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder6_f = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder5_f = build_decoder(1024, 512, 512, 512, True, True)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(128, 64, 64, 64, True, False)
+        if self.rosta == 'RIM':
+            self.decoder0_g_tt = nn.Sequential(nn.Conv2D(64, 3, 3, padding=1))
+            self.decoder0_g_ft = nn.Sequential(nn.Conv2D(64, 2, 3, padding=1))
+            self.decoder0_g_bt = nn.Sequential(nn.Conv2D(64, 2, 3, padding=1))
+            self.decoder0_f_tt = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
+            self.decoder0_f_ft = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
+            self.decoder0_f_bt = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
+        else:
+            self.decoder0_g = nn.Sequential(nn.Conv2D(64, self.gd_channel, 3, padding=1))
+            self.decoder0_f = nn.Sequential(nn.Conv2D(64, 1, 3, padding=1))
+ if self.backbone == 'r34':
+            self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
+            self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.layer1)
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp4 = conv_up_psp(512, 256, 2)
+ self.psp3 = conv_up_psp(512, 128, 4)
+ self.psp2 = conv_up_psp(512, 64, 8)
+ self.psp1 = conv_up_psp(512, 64, 16)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(128, 64, 64, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ elif self.backbone == 'r101':
+            self.encoder0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
+            self.encoder1 = nn.Sequential(self.resnet.maxpool, self.resnet.layer1)
+ self.encoder2 = self.resnet.layer2
+ self.encoder3 = self.resnet.layer3
+ self.encoder4 = self.resnet.layer4
+ self.psp_module = PSPModule(2048, 2048, (1, 3, 5))
+ self.bridge_block = build_bb(2048, 2048, 2048)
+ self.psp4 = conv_up_psp(2048, 1024, 2)
+ self.psp3 = conv_up_psp(2048, 512, 4)
+ self.psp2 = conv_up_psp(2048, 256, 8)
+ self.psp1 = conv_up_psp(2048, 64, 16)
+ self.decoder4_g = build_decoder(4096, 2048, 1024, 1024, True, True)
+ self.decoder3_g = build_decoder(2048, 1024, 512, 512, True, True)
+ self.decoder2_g = build_decoder(1024, 512, 256, 256, True, True)
+ self.decoder1_g = build_decoder(512, 256, 128, 64, True, True)
+ self.decoder4_f = build_decoder(4096, 2048, 1024, 1024, True, True)
+ self.decoder3_f = build_decoder(2048, 1024, 512, 512, True, True)
+ self.decoder2_f = build_decoder(1024, 512, 256, 256, True, True)
+ self.decoder1_f = build_decoder(512, 256, 128, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ elif self.backbone == 'd121':
+            self.encoder0 = nn.Sequential(self.densenet.features.conv0,
+                                          self.densenet.features.norm0,
+                                          self.densenet.features.relu0)
+            self.encoder1 = nn.Sequential(self.densenet.features.denseblock1,
+                                          self.densenet.features.transition1)
+            self.encoder2 = nn.Sequential(self.densenet.features.denseblock2,
+                                          self.densenet.features.transition2)
+            self.encoder3 = nn.Sequential(self.densenet.features.denseblock3,
+                                          self.densenet.features.transition3)
+            self.encoder4 = nn.Sequential(self.densenet.features.denseblock4,
+                                          nn.Conv2D(1024, 512, 3, padding=1),
+                                          nn.BatchNorm2D(512), nn.ReLU(),
+                                          nn.MaxPool2D(2, 2, ceil_mode=True))
+ self.psp_module = PSPModule(512, 512, (1, 3, 5))
+ self.psp4 = conv_up_psp(512, 256, 2)
+ self.psp3 = conv_up_psp(512, 128, 4)
+ self.psp2 = conv_up_psp(512, 64, 8)
+ self.psp1 = conv_up_psp(512, 64, 16)
+ self.decoder4_g = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_g = build_decoder(512, 256, 256, 128, True, True)
+ self.decoder2_g = build_decoder(256, 128, 128, 64, True, True)
+ self.decoder1_g = build_decoder(128, 64, 64, 64, True, True)
+ self.bridge_block = build_bb(512, 512, 512)
+ self.decoder4_f = build_decoder(1024, 512, 512, 256, True, True)
+ self.decoder3_f = build_decoder(768, 256, 256, 128, True, True)
+ self.decoder2_f = build_decoder(384, 128, 128, 64, True, True)
+ self.decoder1_f = build_decoder(192, 64, 64, 64, True, True)
+ if self.rosta == 'RIM':
+ self.decoder0_g_tt = build_decoder(128, 64, 64, 3, False, True)
+ self.decoder0_g_ft = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_g_bt = build_decoder(128, 64, 64, 2, False, True)
+ self.decoder0_f_tt = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_ft = build_decoder(128, 64, 64, 1, False, True)
+ self.decoder0_f_bt = build_decoder(128, 64, 64, 1, False, True)
+ else:
+                self.decoder0_g = build_decoder(128, 64, 64, self.gd_channel, False, True)
+ self.decoder0_f = build_decoder(128, 64, 64, 1, False, True)
+ if self.rosta == 'RIM':
+            self.rim = nn.Sequential(nn.Conv2D(3, 16, 1), SELayer(16), nn.Conv2D(16, 1, 1))
+
+ def forward(self, input: paddle.Tensor) -> List[paddle.Tensor]:
+ glance_sigmoid = paddle.zeros(input.shape)
+ glance_sigmoid.stop_gradient = True
+ focus_sigmoid = paddle.zeros(input.shape)
+ focus_sigmoid.stop_gradient = True
+ fusion_sigmoid = paddle.zeros(input.shape)
+ fusion_sigmoid.stop_gradient = True
+ e0 = self.encoder0(input)
+ e1 = self.encoder1(e0)
+ e2 = self.encoder2(e1)
+ e3 = self.encoder3(e2)
+ e4 = self.encoder4(e3)
+ if self.backbone == 'r34_2b':
+ e5 = self.encoder5(e4)
+ e6 = self.encoder6(e5)
+ psp = self.psp_module(e6)
+ d6_g = self.decoder6_g(paddle.concat((psp, e6), 1))
+            d5_g = self.decoder5_g(paddle.concat((self.psp6(psp), d6_g), 1))
+            d4_g = self.decoder4_g(paddle.concat((self.psp5(psp), d5_g), 1))
+ else:
+ psp = self.psp_module(e4)
+ d4_g = self.decoder4_g(paddle.concat((psp, e4), 1))
+ d3_g = self.decoder3_g(paddle.concat((self.psp4(psp), d4_g), 1))
+ d2_g = self.decoder2_g(paddle.concat((self.psp3(psp), d3_g), 1))
+ d1_g = self.decoder1_g(paddle.concat((self.psp2(psp), d2_g), 1))
+ if self.backbone == 'r34_2b':
+ if self.rosta == 'RIM':
+ d0_g_tt = self.decoder0_g_tt(d1_g)
+ d0_g_ft = self.decoder0_g_ft(d1_g)
+ d0_g_bt = self.decoder0_g_bt(d1_g)
+ else:
+ d0_g = self.decoder0_g(d1_g)
+ elif self.rosta == 'RIM':
+            d0_g_tt = self.decoder0_g_tt(paddle.concat((self.psp1(psp), d1_g), 1))
+            d0_g_ft = self.decoder0_g_ft(paddle.concat((self.psp1(psp), d1_g), 1))
+            d0_g_bt = self.decoder0_g_bt(paddle.concat((self.psp1(psp), d1_g), 1))
+        else:
+            d0_g = self.decoder0_g(paddle.concat((self.psp1(psp), d1_g), 1))
+ if self.rosta == 'RIM':
+ glance_sigmoid_tt = F.sigmoid(d0_g_tt)
+ glance_sigmoid_ft = F.sigmoid(d0_g_ft)
+ glance_sigmoid_bt = F.sigmoid(d0_g_bt)
+ else:
+ glance_sigmoid = F.sigmoid(d0_g)
+ if self.backbone == 'r34_2b':
+ bb = self.bridge_block(e6)
+ d6_f = self.decoder6_f(paddle.concat((bb, e6), 1))
+ d5_f = self.decoder5_f(paddle.concat((d6_f, e5), 1))
+ d4_f = self.decoder4_f(paddle.concat((d5_f, e4), 1))
+ else:
+ bb = self.bridge_block(e4)
+ d4_f = self.decoder4_f(paddle.concat((bb, e4), 1))
+ d3_f = self.decoder3_f(paddle.concat((d4_f, e3), 1))
+ d2_f = self.decoder2_f(paddle.concat((d3_f, e2), 1))
+ d1_f = self.decoder1_f(paddle.concat((d2_f, e1), 1))
+ if self.backbone == 'r34_2b':
+ if self.rosta == 'RIM':
+ d0_f_tt = self.decoder0_f_tt(d1_f)
+ d0_f_ft = self.decoder0_f_ft(d1_f)
+ d0_f_bt = self.decoder0_f_bt(d1_f)
+ else:
+ d0_f = self.decoder0_f(d1_f)
+ elif self.rosta == 'RIM':
+ d0_f_tt = self.decoder0_f_tt(paddle.concat((d1_f, e0), 1))
+ d0_f_ft = self.decoder0_f_ft(paddle.concat((d1_f, e0), 1))
+ d0_f_bt = self.decoder0_f_bt(paddle.concat((d1_f, e0), 1))
+ else:
+ d0_f = self.decoder0_f(paddle.concat((d1_f, e0), 1))
+ if self.rosta == 'RIM':
+ focus_sigmoid_tt = F.sigmoid(d0_f_tt)
+ focus_sigmoid_ft = F.sigmoid(d0_f_ft)
+ focus_sigmoid_bt = F.sigmoid(d0_f_bt)
+ else:
+ focus_sigmoid = F.sigmoid(d0_f)
+ if self.rosta == 'RIM':
+            fusion_sigmoid_tt = collaborative_matting('TT', glance_sigmoid_tt, focus_sigmoid_tt)
+            fusion_sigmoid_ft = collaborative_matting('FT', glance_sigmoid_ft, focus_sigmoid_ft)
+            fusion_sigmoid_bt = collaborative_matting('BT', glance_sigmoid_bt, focus_sigmoid_bt)
+            fusion_sigmoid = paddle.concat((fusion_sigmoid_tt, fusion_sigmoid_ft, fusion_sigmoid_bt), 1)
+            fusion_sigmoid = self.rim(fusion_sigmoid)
+            return [[glance_sigmoid_tt, focus_sigmoid_tt, fusion_sigmoid_tt],
+                    [glance_sigmoid_ft, focus_sigmoid_ft, fusion_sigmoid_ft],
+                    [glance_sigmoid_bt, focus_sigmoid_bt, fusion_sigmoid_bt],
+                    fusion_sigmoid]
+        else:
+            fusion_sigmoid = collaborative_matting(self.rosta, glance_sigmoid, focus_sigmoid)
+            return glance_sigmoid, focus_sigmoid, fusion_sigmoid
+
+
+def collaborative_matting(rosta, glance_sigmoid, focus_sigmoid):
+    """Merge the glance (semantic) and focus (detail) predictions into the final alpha."""
+    if rosta == 'TT':
+        # glance prediction is a 3-class map: 0 = background, 1 = transition, 2 = foreground
+        index = paddle.argmax(glance_sigmoid, axis=1)
+        index = paddle.unsqueeze(index, 1).astype('float32')
+        bg_mask = index.clone()
+        bg_mask[bg_mask == 2] = 1
+        bg_mask = 1 - bg_mask
+        # transition area: keep the focus (detail) prediction there
+        trimap_mask = index.clone()
+        trimap_mask[trimap_mask == 2] = 0
+        # foreground area: alpha is 1
+        fg_mask = index.clone()
+        fg_mask[fg_mask == 1] = 0
+        fg_mask[fg_mask == 2] = 1
+        focus_sigmoid = focus_sigmoid.cpu()
+        trimap_mask = trimap_mask.cpu()
+        fg_mask = fg_mask.cpu()
+        fusion_sigmoid = focus_sigmoid * trimap_mask + fg_mask
+    elif rosta == 'BT':
+        index = paddle.argmax(glance_sigmoid, axis=1)
+        index = paddle.unsqueeze(index, 1).astype('float32')
+        fusion_sigmoid = index - focus_sigmoid
+        fusion_sigmoid[fusion_sigmoid < 0] = 0
+    else:
+        index = paddle.argmax(glance_sigmoid, axis=1)
+        index = paddle.unsqueeze(index, 1).astype('float32')
+        fusion_sigmoid = index + focus_sigmoid
+        fusion_sigmoid[fusion_sigmoid > 1] = 1
+    return fusion_sigmoid
+
+
+if __name__ == "__main__":
+    model = GFM()
+    x = paddle.ones([1, 3, 256, 256])
+    result = model(x)
+    print(result)
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/module.py b/modules/image/matting/gfm_resnet34_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..f78082fc46da8dadc569ab1db0b78011e4b80bc7
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/module.py
@@ -0,0 +1,176 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+from PIL import Image
+import numpy as np
+import cv2
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo, runnable, serving
+from skimage.transform import resize
+
+from gfm_resnet34_matting.gfm import GFM
+import gfm_resnet34_matting.processor as P
+
+
+@moduleinfo(
+ name="gfm_resnet34_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ author_email="",
+ summary="gfm_resnet34_matting is an animal matting model.",
+ version="1.0.0")
+class GFMResNet34(nn.Layer):
+ """
+ The GFM implementation based on PaddlePaddle.
+
+ The original article refers to:
+ Bridging Composite and Real: Towards End-to-end Deep Image Matting [IJCV-2021]
+ Main network file (GFM).
+
+ Github repo: https://github.com/JizhiziLi/GFM
+ Paper link (Arxiv): https://arxiv.org/abs/2010.16188
+ """
+
+ def __init__(self, pretrained: str=None):
+ super(GFMResNet34, self).__init__()
+
+ self.model = GFM()
+ self.resize_by_short = P.ResizeByShort(1080)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.model.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.model.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def preprocess(self, img: Union[str, np.ndarray], h: int, w: int) -> paddle.Tensor:
+ if min(h, w) > 1080:
+ img = self.resize_by_short(img)
+ tensor_img = self.scale_image(img, h, w)
+ return tensor_img
+
+    def scale_image(self, img: np.ndarray, h: int, w: int, ratio: float = 1 / 3):
+        """Downscale the image by `ratio` and snap both sides to multiples of 32
+        (capped at 1600), as required by the network."""
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+        new_h = min(1600, resize_h - (resize_h % 32))
+        new_w = min(1600, resize_w - (resize_w % 32))
+
+        # skimage.transform.resize returns floats in [0, 1]; restore 0-255 range.
+        scale_img = resize(img, (new_h, new_w)) * 255
+        tensor_img = paddle.to_tensor(scale_img.astype(np.float32)[np.newaxis, :, :, :])
+        tensor_img = tensor_img.transpose([0, 3, 1, 2])
+        return tensor_img
+
+
+ def inference_img_scale(self, input: paddle.Tensor) -> List[paddle.Tensor]:
+ pred_global, pred_local, pred_fusion = self.model(input)
+ pred_global = P.gen_trimap_from_segmap_e2e(pred_global)
+        pred_local = pred_local.numpy()[0, 0, :, :]
+        pred_fusion = pred_fusion.numpy()[0, 0, :, :]
+ return pred_global, pred_local, pred_fusion
+
+
+    def predict(self, image_list: list, visualization: bool = True, save_path: str = "gfm_resnet34_matting_output"):
+        self.model.eval()
+        result = []
+        with paddle.no_grad():
+            for img in image_list:
+                if isinstance(img, str):
+                    # Paths are loaded via PIL (RGB); drop any alpha channel.
+                    img = np.array(Image.open(img))[:, :, :3]
+                else:
+                    # ndarray inputs are assumed BGR; flip to RGB.
+                    img = img[:, :, ::-1]
+                h, w, _ = img.shape
+                # Hybrid multi-scale inference: the glance (semantic) map comes
+                # from the 1/3-scale pass, the focus (detail) map from the
+                # 1/2-scale pass; both are fused at the original resolution.
+                tensor_img = self.preprocess(img, h, w)
+                pred_glance_1, pred_focus_1, pred_fusion_1 = self.inference_img_scale(tensor_img)
+                pred_glance_1 = resize(pred_glance_1, (h, w)) * 255.0
+                tensor_img = self.scale_image(img, h, w, 1 / 2)
+                pred_glance_2, pred_focus_2, pred_fusion_2 = self.inference_img_scale(tensor_img)
+                pred_focus_2 = resize(pred_focus_2, (h, w))
+                pred_fusion = P.get_masked_local_from_global_test(pred_glance_1, pred_focus_2)
+                pred_fusion = (pred_fusion * 255).astype(np.uint8)
+                if visualization:
+                    if not os.path.exists(save_path):
+                        os.makedirs(save_path)
+                    img_name = str(time.time()) + '.png'
+                    image_save_path = os.path.join(save_path, img_name)
+                    cv2.imwrite(image_save_path, pred_fusion)
+                result.append(pred_fusion)
+        return result
+
+ @serving
+    def serving_method(self, images: list, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ outputs = self.predict(image_list=images_decode, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+
+ results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="gfm_resnet34_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+
diff --git a/modules/image/matting/gfm_resnet34_matting/processor.py b/modules/image/matting/gfm_resnet34_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..52969d0229111d4cc60ccc02d0d6e39a09231e95
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/processor.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import base64
+
+import cv2
+import numpy as np
+from paddleseg.transforms import functional
+
+
+class ResizeByLong:
+ """
+ Resize the long side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ long_size (int): The target size of long side.
+ """
+
+ def __init__(self, long_size):
+ self.long_size = long_size
+
+ def __call__(self, data):
+ data = functional.resize_long(data, self.long_size)
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+ def __init__(self, short_size):
+ self.short_size = short_size
+
+    def __call__(self, data):
+        data = functional.resize_short(data, self.short_size)
+        return data
+
+def gen_trimap_from_segmap_e2e(segmap):
+    # Collapse the 3-class segmentation map (0 bg, 1 transition, 2 fg) into
+    # the conventional trimap gray levels 0 / 128 / 255.
+    trimap = np.argmax(segmap, axis=1)[0]
+    trimap = trimap.astype(np.int64)
+    trimap[trimap == 1] = 128
+    trimap[trimap == 2] = 255
+    return trimap.astype(np.uint8)
+
+
+def get_masked_local_from_global_test(global_result, local_result):
+    # Trust the global trimap where it is certain (0 or 255) and fall back to
+    # the local (detail) prediction inside the transition region (128).
+    weighted_global = np.ones(global_result.shape)
+    weighted_global[global_result == 255] = 0
+    weighted_global[global_result == 0] = 0
+    fusion_result = global_result * (1. - weighted_global) / 255 + local_result * weighted_global
+    return fusion_result
+
+def cv2_to_base64(image: np.ndarray):
+    """
+    Convert data from BGR to base64 format.
+    """
+    data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+    """
+    Convert data from base64 to BGR format.
+    """
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
\ No newline at end of file
diff --git a/modules/image/matting/gfm_resnet34_matting/resnet.py b/modules/image/matting/gfm_resnet34_matting/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d2ec70cb6ccd419cdc7725cf35eb267df25dca9
--- /dev/null
+++ b/modules/image/matting/gfm_resnet34_matting/resnet.py
@@ -0,0 +1,201 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+from typing import Type, Any, Callable, Union, List, Optional
+
+
+def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1,
+            dilation: int = 1) -> nn.Conv2D:
+    """3x3 convolution with padding"""
+    return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
+                     padding=dilation, groups=groups, dilation=dilation,
+                     bias_attr=False)
+
+
+def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2D:
+    """1x1 convolution"""
+    return nn.Conv2D(in_planes, out_planes, kernel_size=1, stride=stride,
+                     bias_attr=False)
+
+
+class BasicBlock(nn.Layer):
+ expansion: int = 1
+
+    def __init__(self, inplanes: int, planes: int, stride: int = 1,
+                 downsample: Optional[nn.Layer] = None, groups: int = 1,
+                 base_width: int = 64, dilation: int = 1,
+                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
+ super(BasicBlock, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ if groups != 1 or base_width != 64:
+ raise ValueError(
+ 'BasicBlock only supports groups=1 and base_width=64')
+ if dilation > 1:
+ raise NotImplementedError(
+ 'Dilation > 1 not supported in BasicBlock')
+ self.conv1 = conv3x3(inplanes, planes, stride)
+ self.bn1 = norm_layer(planes)
+ self.relu = paddle.nn.ReLU()
+ self.conv2 = conv3x3(planes, planes)
+ self.bn2 = norm_layer(planes)
+ self.downsample = downsample
+ self.stride = stride
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ identity = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ if self.downsample is not None:
+ identity = self.downsample(x)
+ out += identity
+ out = self.relu(out)
+ return out
+
+
+class Bottleneck(nn.Layer):
+ expansion: int = 4
+
+    def __init__(self, inplanes: int, planes: int, stride: int = 1,
+                 downsample: Optional[nn.Layer] = None, groups: int = 1,
+                 base_width: int = 64, dilation: int = 1,
+                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
+ super(Bottleneck, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ width = int(planes * (base_width / 64.0)) * groups
+ self.conv1 = conv1x1(inplanes, width)
+ self.bn1 = norm_layer(width)
+ self.conv2 = conv3x3(width, width, stride, groups, dilation)
+ self.bn2 = norm_layer(width)
+ self.conv3 = conv1x1(width, planes * self.expansion)
+ self.bn3 = norm_layer(planes * self.expansion)
+ self.relu = paddle.nn.ReLU()
+ self.downsample = downsample
+ self.stride = stride
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ identity = x
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out = self.conv2(out)
+ out = self.bn2(out)
+ out = self.relu(out)
+ out = self.conv3(out)
+ out = self.bn3(out)
+ if self.downsample is not None:
+ identity = self.downsample(x)
+ out += identity
+ out = self.relu(out)
+ return out
+
+
+class ResNet(nn.Layer):
+
+    def __init__(self, block: Type[Union[BasicBlock, Bottleneck]],
+                 layers: List[int], num_classes: int = 1000,
+                 zero_init_residual: bool = False, groups: int = 1,
+                 width_per_group: int = 64,
+                 replace_stride_with_dilation: Optional[List[bool]] = None,
+                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
+ super(ResNet, self).__init__()
+ if norm_layer is None:
+ norm_layer = nn.BatchNorm2D
+ self._norm_layer = norm_layer
+ self.inplanes = 64
+ self.dilation = 1
+ if replace_stride_with_dilation is None:
+ replace_stride_with_dilation = [False, False, False]
+ if len(replace_stride_with_dilation) != 3:
+ raise ValueError(
+ 'replace_stride_with_dilation should be None or a 3-element tuple, got {}'
+ .format(replace_stride_with_dilation))
+ self.groups = groups
+ self.base_width = width_per_group
+ self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
+ padding=3, bias_attr=False)
+ self.bn1 = norm_layer(self.inplanes)
+ self.relu = paddle.nn.ReLU()
+ self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.layer1 = self._make_layer(block, 64, layers[0])
+ self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
+ dilate=replace_stride_with_dilation[0])
+ self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
+ dilate=replace_stride_with_dilation[1])
+ self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
+ dilate=replace_stride_with_dilation[2])
+ self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
+ self.fc = nn.Linear(512 * block.expansion, num_classes)
+
+    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]],
+                    planes: int, blocks: int, stride: int = 1,
+                    dilate: bool = False) -> nn.Sequential:
+ norm_layer = self._norm_layer
+ downsample = None
+ previous_dilation = self.dilation
+ if dilate:
+ self.dilation *= stride
+ stride = 1
+        if stride != 1 or self.inplanes != planes * block.expansion:
+            downsample = nn.Sequential(
+                conv1x1(self.inplanes, planes * block.expansion, stride),
+                norm_layer(planes * block.expansion))
+        layers = []
+        layers.append(block(self.inplanes, planes, stride, downsample,
+                            self.groups, self.base_width, previous_dilation,
+                            norm_layer))
+        self.inplanes = planes * block.expansion
+        for _ in range(1, blocks):
+            layers.append(block(self.inplanes, planes, groups=self.groups,
+                                base_width=self.base_width, dilation=self.dilation,
+                                norm_layer=norm_layer))
+        return nn.Sequential(*layers)
+
+    def _forward_impl(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ x = self.maxpool(x)
+ x = self.layer1(x)
+ x = self.layer2(x)
+ x = self.layer3(x)
+ x = self.layer4(x)
+ x = self.avgpool(x)
+        x = paddle.flatten(x, 1)
+ x = self.fc(x)
+ return x
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ return self._forward_impl(x)
+
+
+def _resnet(arch: str, block: Type[Union[BasicBlock, Bottleneck]],
+            layers: List[int], pretrained: bool, progress: bool,
+            **kwargs: Any) -> ResNet:
+    # The pretrained/progress flags are kept for API compatibility; weights
+    # are loaded by the PaddleHub module wrapper instead.
+    model = ResNet(block, layers, **kwargs)
+    return model
+
+
+def resnet34(pretrained: bool = False, progress: bool = True,
+             **kwargs: Any) -> ResNet:
+    """ResNet-34 model from
+    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.
+
+    Args:
+        pretrained (bool): If True, returns a model pre-trained on ImageNet.
+        progress (bool): If True, displays a progress bar of the download to stderr.
+    """
+    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained,
+                   progress, **kwargs)
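+
+
+if __name__ == "__main__":
+    # Minimal smoke test (illustrative only, not part of the module's API):
+    # build the backbone and push a dummy batch through it, mirroring the
+    # __main__ check in gfm.py.
+    net = resnet34()
+    logits = net(paddle.ones([1, 3, 224, 224]))
+    print(logits.shape)  # expected: [1, 1000]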
diff --git a/modules/image/matting/modnet_hrnet18_matting/README.md b/modules/image/matting/modnet_hrnet18_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..704635055d6b00a81806987bbd9cd487f09e50b0
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/README.md
@@ -0,0 +1,155 @@
+# modnet_hrnet18_matting
+
+|Module Name|modnet_hrnet18_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_hrnet18|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|60MB|
+|Data Indicators|SAD 77.96|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+  - Sample results (left: original image; right: matting result):
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It can be used for background replacement, image composition, and visual effects, and is widely used in the film industry. Each pixel in an image carries an alpha value representing its foreground transparency; the set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground. modnet_hrnet18_matting generates such matting results.
+
+  - For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.2.0
+
+  - paddlehub >= 2.1.0
+
+  - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install modnet_hrnet18_matting
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+  - ```shell
+    $ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
+    ```
+
+  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    model = hub.Module(name="modnet_hrnet18_matting")
+
+    result = model.predict(["/PATH/TO/IMAGE"])
+    print(result)
+    ```
+- ### 3、API
+
+  - ```python
+    def predict(self,
+                image_list,
+                trimap_list,
+                visualization,
+                save_path):
+    ```
+
+  - Prediction API for human matting, used to separate the portrait from the input image.
+
+  - Parameters
+
+    - image_list (list(str | numpy.ndarray)): image paths, or image data in BGR format.
+    - trimap_list (list(str | numpy.ndarray)): trimap paths, or single-channel grayscale images.
+    - visualization (bool): whether to save the visualized results. Default: False.
+    - save_path (str): the path for saving images when visualization is True. Default: "modnet_hrnet18_matting_output".
+
+  - Return
+
+    - result (list(numpy.ndarray)): the model's matting results (see the compositing sketch below).
+
+
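+  - A minimal sketch (file names are placeholders) of compositing a returned alpha matte onto a new background:
+
+    ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_hrnet18_matting")
+    # Each result is an HxW uint8 alpha matte aligned with the input image.
+    alpha = model.predict(["/PATH/TO/IMAGE"])[0].astype(np.float32) / 255.0
+
+    fg = cv2.imread("/PATH/TO/IMAGE").astype(np.float32)
+    bg = np.full_like(fg, 255.0)  # plain white background, for illustration
+    comp = alpha[:, :, None] * fg + (1.0 - alpha[:, :, None]) * bg
+    cv2.imwrite("composite.png", comp.astype(np.uint8))
+    ```
+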
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of human matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m modnet_hrnet18_matting
+    ```
+
+  - The human matting online service API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+    # Send an HTTP request
+    data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/matting/modnet_hrnet18_matting/README_en.md b/modules/image/matting/modnet_hrnet18_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..17524b51b31174b66a01fd13fdb0165d97f46223
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_hrnet18_matting
+
+|Module Name|modnet_hrnet18_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_hrnet18|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|60MB|
+|Data Indicators|SAD 77.96|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by computing its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. The foreground can be separated by extracting the part of the image covered by the matte.
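+
+    Formally, matting assumes the standard compositing equation I = alpha * F + (1 - alpha) * B, where F and B are the per-pixel foreground and background colors; the model's job is to estimate alpha at every pixel.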
+
+
+
+ - For more information, please refer to: [modnet_hrnet18_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_hrnet18_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_hrnet18_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_hrnet18_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+    - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], gray. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_hrnet18_matting_output" by default.
+
+  - **Return**
+
+    - result (list(numpy.ndarray)): The list of model results; an optional-trimap call is sketched below.
+
+
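+  - A minimal sketch (paths are placeholders) of supplying an optional trimap to guide prediction; `trimap_list` may be omitted, in which case the model predicts without one:
+
+    ```python
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_hrnet18_matting")
+    # Single-channel trimap convention: 0 background, 128 unknown, 255 foreground.
+    alphas = model.predict(image_list=["/PATH/TO/IMAGE"],
+                           trimap_list=["/PATH/TO/TRIMAP"],
+                           visualization=True,
+                           save_path="modnet_hrnet18_matting_output")
+    ```
+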
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_hrnet18_matting
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+        data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_hrnet18_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_hrnet18_matting/hrnet.py b/modules/image/matting/modnet_hrnet18_matting/hrnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..22cbd377bfd2c5c789f42c273de603d89fd8a24a
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/hrnet.py
@@ -0,0 +1,652 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddleseg.cvlibs import manager, param_init
+from paddleseg.models import layers
+from paddleseg.utils import utils
+
+__all__ = ["HRNet_W18"]
+
+
+class HRNet(nn.Layer):
+ """
+ The HRNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Jingdong Wang, et al. "HRNet: Deep High-Resolution Representation Learning for Visual Recognition"
+ (https://arxiv.org/pdf/1908.07919.pdf).
+
+ Args:
+ pretrained (str, optional): The path of pretrained model.
+ stage1_num_modules (int, optional): Number of modules for stage1. Default 1.
+ stage1_num_blocks (list, optional): Number of blocks per module for stage1. Default (4).
+ stage1_num_channels (list, optional): Number of channels per branch for stage1. Default (64).
+ stage2_num_modules (int, optional): Number of modules for stage2. Default 1.
+ stage2_num_blocks (list, optional): Number of blocks per module for stage2. Default (4, 4).
+ stage2_num_channels (list, optional): Number of channels per branch for stage2. Default (18, 36).
+ stage3_num_modules (int, optional): Number of modules for stage3. Default 4.
+ stage3_num_blocks (list, optional): Number of blocks per module for stage3. Default (4, 4, 4).
+        stage3_num_channels (list, optional): Number of channels per branch for stage3. Default (18, 36, 72).
+ stage4_num_modules (int, optional): Number of modules for stage4. Default 3.
+ stage4_num_blocks (list, optional): Number of blocks per module for stage4. Default (4, 4, 4, 4).
+        stage4_num_channels (list, optional): Number of channels per branch for stage4. Default (18, 36, 72, 144).
+ has_se (bool, optional): Whether to use Squeeze-and-Excitation module. Default False.
+ align_corners (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even,
+ e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False.
+ """
+
+ def __init__(self,
+ input_channels: int=3,
+                 pretrained: str = None,
+ stage1_num_modules: int = 1,
+ stage1_num_blocks: list = (4, ),
+ stage1_num_channels: list = (64, ),
+ stage2_num_modules: int = 1,
+ stage2_num_blocks: list = (4, 4),
+ stage2_num_channels: list = (18, 36),
+ stage3_num_modules: int = 4,
+ stage3_num_blocks: list = (4, 4, 4),
+ stage3_num_channels: list = (18, 36, 72),
+ stage4_num_modules: int = 3,
+ stage4_num_blocks: list = (4, 4, 4, 4),
+ stage4_num_channels: list = (18, 36, 72, 144),
+ has_se: bool = False,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(HRNet, self).__init__()
+ self.pretrained = pretrained
+ self.stage1_num_modules = stage1_num_modules
+ self.stage1_num_blocks = stage1_num_blocks
+ self.stage1_num_channels = stage1_num_channels
+ self.stage2_num_modules = stage2_num_modules
+ self.stage2_num_blocks = stage2_num_blocks
+ self.stage2_num_channels = stage2_num_channels
+ self.stage3_num_modules = stage3_num_modules
+ self.stage3_num_blocks = stage3_num_blocks
+ self.stage3_num_channels = stage3_num_channels
+ self.stage4_num_modules = stage4_num_modules
+ self.stage4_num_blocks = stage4_num_blocks
+ self.stage4_num_channels = stage4_num_channels
+ self.has_se = has_se
+ self.align_corners = align_corners
+
+        self.feat_channels = [64] + list(stage4_num_channels)
+
+ self.conv_layer1_1 = layers.ConvBNReLU(
+ in_channels=input_channels,
+ out_channels=64,
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.conv_layer1_2 = layers.ConvBNReLU(
+ in_channels=64,
+ out_channels=64,
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.la1 = Layer1(
+ num_channels=64,
+ num_blocks=self.stage1_num_blocks[0],
+ num_filters=self.stage1_num_channels[0],
+ has_se=has_se,
+ name="layer2",
+ padding_same=padding_same)
+
+ self.tr1 = TransitionLayer(
+ in_channels=[self.stage1_num_channels[0] * 4],
+ out_channels=self.stage2_num_channels,
+ name="tr1",
+ padding_same=padding_same)
+
+ self.st2 = Stage(
+ num_channels=self.stage2_num_channels,
+ num_modules=self.stage2_num_modules,
+ num_blocks=self.stage2_num_blocks,
+ num_filters=self.stage2_num_channels,
+ has_se=self.has_se,
+ name="st2",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ self.tr2 = TransitionLayer(
+ in_channels=self.stage2_num_channels,
+ out_channels=self.stage3_num_channels,
+ name="tr2",
+ padding_same=padding_same)
+ self.st3 = Stage(
+ num_channels=self.stage3_num_channels,
+ num_modules=self.stage3_num_modules,
+ num_blocks=self.stage3_num_blocks,
+ num_filters=self.stage3_num_channels,
+ has_se=self.has_se,
+ name="st3",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+ self.tr3 = TransitionLayer(
+ in_channels=self.stage3_num_channels,
+ out_channels=self.stage4_num_channels,
+ name="tr3",
+ padding_same=padding_same)
+ self.st4 = Stage(
+ num_channels=self.stage4_num_channels,
+ num_modules=self.stage4_num_modules,
+ num_blocks=self.stage4_num_blocks,
+ num_filters=self.stage4_num_channels,
+ has_se=self.has_se,
+ name="st4",
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+    def forward(self, x: paddle.Tensor) -> list:
+ feat_list = []
+ conv1 = self.conv_layer1_1(x)
+ feat_list.append(conv1)
+ conv2 = self.conv_layer1_2(conv1)
+
+ la1 = self.la1(conv2)
+
+ tr1 = self.tr1([la1])
+ st2 = self.st2(tr1)
+
+ tr2 = self.tr2(st2)
+ st3 = self.st3(tr2)
+
+ tr3 = self.tr3(st3)
+ st4 = self.st4(tr3)
+
+ feat_list = feat_list + st4
+
+ return feat_list
+
+
+class Layer1(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ num_blocks: int,
+ has_se: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(Layer1, self).__init__()
+
+ self.bottleneck_block_list = []
+
+ for i in range(num_blocks):
+ bottleneck_block = self.add_sublayer(
+ "bb_{}_{}".format(name, i + 1),
+ BottleneckBlock(
+ num_channels=num_channels if i == 0 else num_filters * 4,
+ num_filters=num_filters,
+ has_se=has_se,
+ stride=1,
+ downsample=True if i == 0 else False,
+ name=name + '_' + str(i + 1),
+ padding_same=padding_same))
+ self.bottleneck_block_list.append(bottleneck_block)
+
+ def forward(self, x: paddle.Tensor):
+ conv = x
+ for block_func in self.bottleneck_block_list:
+ conv = block_func(conv)
+ return conv
+
+
+class TransitionLayer(nn.Layer):
+ def __init__(self,
+                 in_channels: list,
+                 out_channels: list,
+ name: str = None,
+ padding_same: bool = True):
+ super(TransitionLayer, self).__init__()
+
+ num_in = len(in_channels)
+ num_out = len(out_channels)
+ self.conv_bn_func_list = []
+ for i in range(num_out):
+ residual = None
+ if i < num_in:
+ if in_channels[i] != out_channels[i]:
+ residual = self.add_sublayer(
+ "transition_{}_layer_{}".format(name, i + 1),
+ layers.ConvBNReLU(
+ in_channels=in_channels[i],
+ out_channels=out_channels[i],
+ kernel_size=3,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ else:
+ residual = self.add_sublayer(
+ "transition_{}_layer_{}".format(name, i + 1),
+ layers.ConvBNReLU(
+ in_channels=in_channels[-1],
+ out_channels=out_channels[i],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ self.conv_bn_func_list.append(residual)
+
+    def forward(self, x: list) -> list:
+ outs = []
+ for idx, conv_bn_func in enumerate(self.conv_bn_func_list):
+ if conv_bn_func is None:
+ outs.append(x[idx])
+ else:
+ if idx < len(x):
+ outs.append(conv_bn_func(x[idx]))
+ else:
+ outs.append(conv_bn_func(x[-1]))
+ return outs
+
+
+class Branches(nn.Layer):
+ def __init__(self,
+                 num_blocks: list,
+                 in_channels: list,
+                 out_channels: list,
+ has_se: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(Branches, self).__init__()
+
+ self.basic_block_list = []
+
+ for i in range(len(out_channels)):
+ self.basic_block_list.append([])
+ for j in range(num_blocks[i]):
+ in_ch = in_channels[i] if j == 0 else out_channels[i]
+ basic_block_func = self.add_sublayer(
+ "bb_{}_branch_layer_{}_{}".format(name, i + 1, j + 1),
+ BasicBlock(
+ num_channels=in_ch,
+ num_filters=out_channels[i],
+ has_se=has_se,
+ name=name + '_branch_layer_' + str(i + 1) + '_' +
+ str(j + 1),
+ padding_same=padding_same))
+ self.basic_block_list[i].append(basic_block_func)
+
+    def forward(self, x: list) -> list:
+ outs = []
+ for idx, input in enumerate(x):
+ conv = input
+ for basic_block_func in self.basic_block_list[idx]:
+ conv = basic_block_func(conv)
+ outs.append(conv)
+ return outs
+
+
+class BottleneckBlock(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ has_se: bool,
+ stride: int = 1,
+ downsample: bool = False,
+                 name: str = None,
+ padding_same: bool = True):
+ super(BottleneckBlock, self).__init__()
+
+ self.has_se = has_se
+ self.downsample = downsample
+
+ self.conv1 = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=1,
+ bias_attr=False)
+
+ self.conv2 = layers.ConvBNReLU(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ kernel_size=3,
+ stride=stride,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ self.conv3 = layers.ConvBN(
+ in_channels=num_filters,
+ out_channels=num_filters * 4,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.downsample:
+ self.conv_down = layers.ConvBN(
+ in_channels=num_channels,
+ out_channels=num_filters * 4,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.has_se:
+ self.se = SELayer(
+ num_channels=num_filters * 4,
+ num_filters=num_filters * 4,
+ reduction_ratio=16,
+ name=name + '_fc')
+
+ self.add = layers.Add()
+ self.relu = layers.Activation("relu")
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ residual = x
+ conv1 = self.conv1(x)
+ conv2 = self.conv2(conv1)
+ conv3 = self.conv3(conv2)
+
+ if self.downsample:
+ residual = self.conv_down(x)
+
+ if self.has_se:
+ conv3 = self.se(conv3)
+
+ y = self.add(conv3, residual)
+ y = self.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ num_channels: int,
+ num_filters: int,
+ stride: int = 1,
+ has_se: bool = False,
+ downsample: bool = False,
+ name: str = None,
+ padding_same: bool = True):
+ super(BasicBlock, self).__init__()
+
+ self.has_se = has_se
+ self.downsample = downsample
+
+ self.conv1 = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=3,
+ stride=stride,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+ self.conv2 = layers.ConvBN(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ kernel_size=3,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False)
+
+ if self.downsample:
+ self.conv_down = layers.ConvBNReLU(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=1,
+ bias_attr=False)
+
+ if self.has_se:
+ self.se = SELayer(
+ num_channels=num_filters,
+ num_filters=num_filters,
+ reduction_ratio=16,
+ name=name + '_fc')
+
+ self.add = layers.Add()
+ self.relu = layers.Activation("relu")
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ residual = x
+ conv1 = self.conv1(x)
+ conv2 = self.conv2(conv1)
+
+ if self.downsample:
+ residual = self.conv_down(x)
+
+ if self.has_se:
+ conv2 = self.se(conv2)
+
+ y = self.add(conv2, residual)
+ y = self.relu(y)
+ return y
+
+
+class SELayer(nn.Layer):
+ def __init__(self, num_channels: int, num_filters: int, reduction_ratio: int, name: str = None):
+ super(SELayer, self).__init__()
+
+ self.pool2d_gap = nn.AdaptiveAvgPool2D(1)
+
+ self._num_channels = num_channels
+
+ med_ch = int(num_channels / reduction_ratio)
+ stdv = 1.0 / math.sqrt(num_channels * 1.0)
+ self.squeeze = nn.Linear(
+ num_channels,
+ med_ch,
+ weight_attr=paddle.ParamAttr(
+ initializer=nn.initializer.Uniform(-stdv, stdv)))
+
+ stdv = 1.0 / math.sqrt(med_ch * 1.0)
+ self.excitation = nn.Linear(
+ med_ch,
+ num_filters,
+ weight_attr=paddle.ParamAttr(
+ initializer=nn.initializer.Uniform(-stdv, stdv)))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ pool = self.pool2d_gap(x)
+ pool = paddle.reshape(pool, shape=[-1, self._num_channels])
+ squeeze = self.squeeze(pool)
+ squeeze = F.relu(squeeze)
+ excitation = self.excitation(squeeze)
+ excitation = F.sigmoid(excitation)
+ excitation = paddle.reshape(
+ excitation, shape=[-1, self._num_channels, 1, 1])
+ out = x * excitation
+ return out
+
+
+class Stage(nn.Layer):
+ def __init__(self,
+                 num_channels: list,
+                 num_modules: int,
+                 num_blocks: list,
+                 num_filters: list,
+ has_se: bool = False,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(Stage, self).__init__()
+
+ self._num_modules = num_modules
+
+ self.stage_func_list = []
+ for i in range(num_modules):
+ if i == num_modules - 1 and not multi_scale_output:
+ stage_func = self.add_sublayer(
+ "stage_{}_{}".format(name, i + 1),
+ HighResolutionModule(
+ num_channels=num_channels,
+ num_blocks=num_blocks,
+ num_filters=num_filters,
+ has_se=has_se,
+ multi_scale_output=False,
+ name=name + '_' + str(i + 1),
+ align_corners=align_corners,
+ padding_same=padding_same))
+ else:
+ stage_func = self.add_sublayer(
+ "stage_{}_{}".format(name, i + 1),
+ HighResolutionModule(
+ num_channels=num_channels,
+ num_blocks=num_blocks,
+ num_filters=num_filters,
+ has_se=has_se,
+ name=name + '_' + str(i + 1),
+ align_corners=align_corners,
+ padding_same=padding_same))
+
+ self.stage_func_list.append(stage_func)
+
+    def forward(self, x: list) -> list:
+ out = x
+ for idx in range(self._num_modules):
+ out = self.stage_func_list[idx](out)
+ return out
+
+
+class HighResolutionModule(nn.Layer):
+ def __init__(self,
+                 num_channels: list,
+                 num_blocks: list,
+                 num_filters: list,
+ has_se: bool = False,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(HighResolutionModule, self).__init__()
+
+ self.branches_func = Branches(
+ num_blocks=num_blocks,
+ in_channels=num_channels,
+ out_channels=num_filters,
+ has_se=has_se,
+ name=name,
+ padding_same=padding_same)
+
+ self.fuse_func = FuseLayers(
+ in_channels=num_filters,
+ out_channels=num_filters,
+ multi_scale_output=multi_scale_output,
+ name=name,
+ align_corners=align_corners,
+ padding_same=padding_same)
+
+    def forward(self, x: list) -> list:
+ out = self.branches_func(x)
+ out = self.fuse_func(out)
+ return out
+
+
+class FuseLayers(nn.Layer):
+ def __init__(self,
+                 in_channels: list,
+                 out_channels: list,
+ multi_scale_output: bool = True,
+ name: str = None,
+ align_corners: bool = False,
+ padding_same: bool = True):
+ super(FuseLayers, self).__init__()
+
+ self._actual_ch = len(in_channels) if multi_scale_output else 1
+ self._in_channels = in_channels
+ self.align_corners = align_corners
+
+ self.residual_func_list = []
+ for i in range(self._actual_ch):
+ for j in range(len(in_channels)):
+ if j > i:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}".format(name, i + 1, j + 1),
+ layers.ConvBN(
+ in_channels=in_channels[j],
+ out_channels=out_channels[i],
+ kernel_size=1,
+ bias_attr=False))
+ self.residual_func_list.append(residual_func)
+ elif j < i:
+ pre_num_filters = in_channels[j]
+ for k in range(i - j):
+ if k == i - j - 1:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}_{}".format(
+ name, i + 1, j + 1, k + 1),
+ layers.ConvBN(
+ in_channels=pre_num_filters,
+ out_channels=out_channels[i],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ pre_num_filters = out_channels[i]
+ else:
+ residual_func = self.add_sublayer(
+ "residual_{}_layer_{}_{}_{}".format(
+ name, i + 1, j + 1, k + 1),
+ layers.ConvBNReLU(
+ in_channels=pre_num_filters,
+ out_channels=out_channels[j],
+ kernel_size=3,
+ stride=2,
+ padding=1 if not padding_same else 'same',
+ bias_attr=False))
+ pre_num_filters = out_channels[j]
+ self.residual_func_list.append(residual_func)
+
+    def forward(self, x: list) -> list:
+ outs = []
+ residual_func_idx = 0
+ for i in range(self._actual_ch):
+ residual = x[i]
+ residual_shape = paddle.shape(residual)[-2:]
+ for j in range(len(self._in_channels)):
+ if j > i:
+ y = self.residual_func_list[residual_func_idx](x[j])
+ residual_func_idx += 1
+
+ y = F.interpolate(
+ y,
+ residual_shape,
+ mode='bilinear',
+ align_corners=self.align_corners)
+ residual = residual + y
+ elif j < i:
+ y = x[j]
+ for k in range(i - j):
+ y = self.residual_func_list[residual_func_idx](y)
+ residual_func_idx += 1
+
+ residual = residual + y
+
+ residual = F.relu(residual)
+ outs.append(residual)
+
+ return outs
+
+
+def HRNet_W18(**kwargs):
+ model = HRNet(
+ stage1_num_modules=1,
+ stage1_num_blocks=[4],
+ stage1_num_channels=[64],
+ stage2_num_modules=1,
+ stage2_num_blocks=[4, 4],
+ stage2_num_channels=[18, 36],
+ stage3_num_modules=4,
+ stage3_num_blocks=[4, 4, 4],
+ stage3_num_channels=[18, 36, 72],
+ stage4_num_modules=3,
+ stage4_num_blocks=[4, 4, 4, 4],
+ stage4_num_channels=[18, 36, 72, 144],
+ **kwargs)
+ return model
\ No newline at end of file
diff --git a/modules/image/matting/modnet_hrnet18_matting/module.py b/modules/image/matting/modnet_hrnet18_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..dd1edbbf7931a92f2ffc03aaf51a35df8b5f2f58
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/module.py
@@ -0,0 +1,513 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy.ndimage
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_hrnet18_matting.hrnet import HRNet_W18
+import modnet_hrnet18_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_hrnet18_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="modnet_hrnet18_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetHRNet18(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
+        pretrained(str, optional): The path of the pretrained model. Default: None.
+ """
+
+    def __init__(self, hr_channels: int = 32, pretrained: str = None):
+ super(MODNetHRNet18, self).__init__()
+
+ self.backbone = HRNet_W18()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-hrnet_w18.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def preprocess(self, img: Union[str, np.ndarray] , transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict) -> paddle.Tensor:
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+    def predict(self, image_list: list, trimap_list: list = None, visualization: bool = False, save_path: str = "modnet_hrnet18_matting_output") -> list:
+        self.eval()
+        result = []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+    def serving_method(self, images: list, trimaps: list = None, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_hrnet18_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to image.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+ Segmentation head.
+ """
+    def __init__(self, hr_channels: int, backbone_channels: list):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+    def forward(self, inputs: dict, feat_list: list):
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+
+ if self.training:
+ logit_dict = {
+ 'semantic': pred_semantic,
+ 'detail': pred_detail,
+ 'matte': pred_matte
+ }
+ return logit_dict
+ else:
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+
+ pred_detail = None
+ if self.training:
+ hr = F.interpolate(
+ hr2x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr = self.conv_hr(paddle.concat((hr, img), axis=1))
+ pred_detail = F.sigmoid(hr)
+
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+    def __init__(self, backbone_channels: list):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list):
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x):
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+                 dilation: int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor):
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+    def __init__(self, num_channels: int, reduction: int = 1):
+ super().__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor):
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+ """ Add Gaussian Blur to a 4D tensors
+ This layer takes a 4D tensor of {N, C, H, W} as input.
+ The Gaussian blur will be performed in given channel number (C) splitly.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+ assert self.kernel_size % 2 != 0
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor):
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+        if len(x.shape) != 4:
+            raise ValueError('\'GaussianBlurLayer\' requires a 4D tensor as input')
+        elif x.shape[1] != self.channels:
+            raise ValueError('In \'GaussianBlurLayer\', the required channel ({0}) is '
+                             'not the same as input ({1})'.format(self.channels, x.shape[1]))
+
+ return self.op(x)
+
+ def _init_kernel(self):
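+        # sigma follows OpenCV's rule of thumb for deriving it from an odd kernel size (cf. cv2.getGaussianKernel)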
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_hrnet18_matting/processor.py b/modules/image/matting/modnet_hrnet18_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..361c955390589469625aa985f6b75d5c95ed2e33
--- /dev/null
+++ b/modules/image/matting/modnet_hrnet18_matting/processor.py
@@ -0,0 +1,208 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+    def __init__(self, transforms: list, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+        to_rgb (bool, optional): Whether to convert the image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+            # if the alpha or trimap image has 3 channels, keep only the first one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+    Resize the short side of an image to a given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+    def __init__(self, short_size: int = 512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize so that the height and width are integer multiples of a given value, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: list):
+    """Recover the prediction to its original shape."""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
+ """
+ The value of alpha is range [0, 1], shape should be [h,w]
+ """
+ if isinstance(trimap, str):
+ trimap = cv2.imread(trimap, 0)
+
+ alpha[trimap == 0] = 0
+ alpha[trimap == 255] = 255
+ alpha = (alpha).astype('uint8')
+ return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/README.md b/modules/image/matting/modnet_mobilenetv2_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..51b8691624e36da0648a1c5fc4f5c670b81a4cde
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/README.md
@@ -0,0 +1,155 @@
+# modnet_mobilenetv2_matting
+
+|模型名称|modnet_mobilenetv2_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|modnet_mobilenetv2|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|38MB|
+|指标|SAD112.73|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_mobilenetv2_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install modnet_mobilenetv2_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_mobilenetv2_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+    - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者单通道灰度图片,默认为None。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_mobilenetv2_matting_output"。
+
+ - 返回
+
+    - result (list(numpy.ndarray)):模型分割结果。
+
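+  - The returned alpha mattes can be composited with the original image to cut out the foreground; a minimal sketch (the output filename "cutout.png" is only a placeholder):
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_mobilenetv2_matting")
+    image = cv2.imread("/PATH/TO/IMAGE")
+    # predict returns one uint8 alpha matte of shape [H, W] per input image
+    alpha = model.predict([image])[0]
+    # stack BGR + alpha into a BGRA image: fully transparent where alpha == 0
+    rgba = np.concatenate([image, alpha[:, :, None]], axis=-1)
+    cv2.imwrite("cutout.png", rgba)
+    ```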
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m modnet_mobilenetv2_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/README_en.md b/modules/image/matting/modnet_mobilenetv2_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..a85aa07e9200e7d80756c0c67958a7f42215cf85
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_mobilenetv2_matting
+
+|Module Name|modnet_mobilenetv2_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_mobilenetv2|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|38MB|
+|Data Indicators|SAD112.73|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation.
+
+
+
+ - For more information, please refer to: [modnet_mobilenetv2_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_mobilenetv2_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_mobilenetv2_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_mobilenetv2_matting")
+
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+    - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], gray. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_mobilenetv2_matting_output" by default.
+
+ - **Return**
+
+    - result (list(numpy.ndarray)): The list of model results.
+
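+  - The returned alpha mattes can be composited with the original image to cut out the foreground; a minimal sketch (the output filename "cutout.png" is only a placeholder):
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_mobilenetv2_matting")
+    image = cv2.imread("/PATH/TO/IMAGE")
+    # predict returns one uint8 alpha matte of shape [H, W] per input image
+    alpha = model.predict([image])[0]
+    # stack BGR + alpha into a BGRA image: fully transparent where alpha == 0
+    rgba = np.concatenate([image, alpha[:, :, None]], axis=-1)
+    cv2.imwrite("cutout.png", rgba)
+    ```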
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_mobilenetv2_matting
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_mobilenetv2_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py b/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py
new file mode 100644
index 0000000000000000000000000000000000000000..8895104a34073143ae17c1021519650dad022aeb
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/mobilenetv2.py
@@ -0,0 +1,224 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import numpy as np
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
+from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
+
+from paddleseg import utils
+from paddleseg.cvlibs import manager
+
+
+__all__ = ["MobileNetV2"]
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+ def __init__(self,
+ num_channels: int,
+ filter_size: int,
+ num_filters: int,
+ stride: int,
+ padding: int,
+ num_groups: int=1,
+ name: str = None,
+ use_cudnn: bool = True):
+ super(ConvBNLayer, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=num_groups,
+ weight_attr=ParamAttr(name=name + "_weights"),
+ bias_attr=False)
+
+ self._batch_norm = BatchNorm(
+ num_filters,
+ param_attr=ParamAttr(name=name + "_bn_scale"),
+ bias_attr=ParamAttr(name=name + "_bn_offset"),
+ moving_mean_name=name + "_bn_mean",
+ moving_variance_name=name + "_bn_variance")
+
+ def forward(self, inputs: paddle.Tensor, if_act: bool = True) -> paddle.Tensor:
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ if if_act:
+ y = F.relu6(y)
+ return y
+
+
+class InvertedResidualUnit(nn.Layer):
+ """Inverted residual block"""
+ def __init__(self, num_channels: int, num_in_filter: int, num_filters: int, stride: int,
+ filter_size: int, padding: int, expansion_factor: int, name: str):
+ super(InvertedResidualUnit, self).__init__()
+ num_expfilter = int(round(num_in_filter * expansion_factor))
+ self._expand_conv = ConvBNLayer(
+ num_channels=num_channels,
+ num_filters=num_expfilter,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ name=name + "_expand")
+
+ self._bottleneck_conv = ConvBNLayer(
+ num_channels=num_expfilter,
+ num_filters=num_expfilter,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ num_groups=num_expfilter,
+ use_cudnn=False,
+ name=name + "_dwise")
+
+ self._linear_conv = ConvBNLayer(
+ num_channels=num_expfilter,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ num_groups=1,
+ name=name + "_linear")
+
+ def forward(self, inputs: paddle.Tensor, ifshortcut: bool) -> paddle.Tensor:
+ y = self._expand_conv(inputs, if_act=True)
+ y = self._bottleneck_conv(y, if_act=True)
+ y = self._linear_conv(y, if_act=False)
+ if ifshortcut:
+ y = paddle.add(inputs, y)
+ return y
+
+
+class InvresiBlocks(nn.Layer):
+ def __init__(self, in_c: int, t: int, c: int, n: int, s: int, name: str):
+ super(InvresiBlocks, self).__init__()
+
+ self._first_block = InvertedResidualUnit(
+ num_channels=in_c,
+ num_in_filter=in_c,
+ num_filters=c,
+ stride=s,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + "_1")
+
+ self._block_list = []
+ for i in range(1, n):
+ block = self.add_sublayer(
+ name + "_" + str(i + 1),
+ sublayer=InvertedResidualUnit(
+ num_channels=c,
+ num_in_filter=c,
+ num_filters=c,
+ stride=1,
+ filter_size=3,
+ padding=1,
+ expansion_factor=t,
+ name=name + "_" + str(i + 1)))
+ self._block_list.append(block)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self._first_block(inputs, ifshortcut=False)
+ for block in self._block_list:
+ y = block(y, ifshortcut=True)
+ return y
+
+
+class MobileNet(nn.Layer):
+ """Networj of MobileNet"""
+ def __init__(self,
+ input_channels: int = 3,
+ scale: float = 1.0,
+ pretrained: str = None,
+ prefix_name: str = ""):
+ super(MobileNet, self).__init__()
+ self.scale = scale
+
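+        # inverted-residual settings from the MobileNetV2 paper: (expansion factor t, output channels c, repeats n, stride s)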
+ bottleneck_params_list = [
+ (1, 16, 1, 1),
+ (6, 24, 2, 2),
+ (6, 32, 3, 2),
+ (6, 64, 4, 2),
+ (6, 96, 3, 1),
+ (6, 160, 3, 2),
+ (6, 320, 1, 1),
+ ]
+
+ self.conv1 = ConvBNLayer(
+ num_channels=input_channels,
+ num_filters=int(32 * scale),
+ filter_size=3,
+ stride=2,
+ padding=1,
+ name=prefix_name + "conv1_1")
+
+ self.block_list = []
+ i = 1
+ in_c = int(32 * scale)
+ for layer_setting in bottleneck_params_list:
+ t, c, n, s = layer_setting
+ i += 1
+ block = self.add_sublayer(
+ prefix_name + "conv" + str(i),
+ sublayer=InvresiBlocks(
+ in_c=in_c,
+ t=t,
+ c=int(c * scale),
+ n=n,
+ s=s,
+ name=prefix_name + "conv" + str(i)))
+ self.block_list.append(block)
+ in_c = int(c * scale)
+
+ self.out_c = int(1280 * scale) if scale > 1.0 else 1280
+ self.conv9 = ConvBNLayer(
+ num_channels=in_c,
+ num_filters=self.out_c,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name=prefix_name + "conv9")
+
+ self.feat_channels = [int(i * scale) for i in [16, 24, 32, 96, 1280]]
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ feat_list = []
+ y = self.conv1(inputs, if_act=True)
+
+ block_index = 0
+ for block in self.block_list:
+ y = block(y)
+ if block_index in [0, 1, 2, 4]:
+ feat_list.append(y)
+ block_index += 1
+ y = self.conv9(y, if_act=True)
+ feat_list.append(y)
+ return feat_list
+
+
+def MobileNetV2(**kwargs):
+ model = MobileNet(scale=1.0, **kwargs)
+ return model
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/module.py b/modules/image/matting/modnet_mobilenetv2_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6a0e6cbeb4c7c60f069e2642c4593fc6a4cde93
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/module.py
@@ -0,0 +1,514 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+import ast
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_mobilenetv2_matting.mobilenetv2 import MobileNetV2
+import modnet_mobilenetv2_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_mobilenetv2_matting",
+ type="CV",
+ author="paddlepaddle",
+ summary="modnet_mobilenetv2_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetMobilenetV2(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
+        pretrained(str, optional): The path of the pretrained model. Default: None.
+
+ """
+
+    def __init__(self, hr_channels: int = 32, pretrained=None):
+ super(MODNetMobilenetV2, self).__init__()
+
+ self.backbone = MobileNetV2()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-mobilenetv2.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
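+            # unsqueeze to [1, 1, H, W]: add the batch and channel dimensions the network expects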
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict):
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+    def predict(self, image_list: list, trimap_list: list = None, visualization: bool = False, save_path: str = "modnet_mobilenetv2_matting_output"):
+ self.eval()
+ result = []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+    def serving_method(self, images: list, trimaps: list = None, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_mobilenetv2_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+            '--visualization', type=ast.literal_eval, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to image.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+ Segmentation head.
+ """
+    def __init__(self, hr_channels: int, backbone_channels: list):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+    def forward(self, inputs: dict, feat_list: list):
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+
+ if self.training:
+ logit_dict = {
+ 'semantic': pred_semantic,
+ 'detail': pred_detail,
+ 'matte': pred_matte
+ }
+ return logit_dict
+ else:
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor):
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor):
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+
+ pred_detail = None
+ if self.training:
+ hr = F.interpolate(
+ hr2x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr = self.conv_hr(paddle.concat((hr, img), axis=1))
+ pred_detail = F.sigmoid(hr)
+
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+ def __init__(self, backbone_channels: int):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list):
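+        # encoder features at 1/2, 1/4 and 1/32 of the input resolution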
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x):
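+        # BatchNorm on the first half of the channels, InstanceNorm on the rest, then re-concatenate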
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+                 dilation: int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor):
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+    def __init__(self, num_channels: int, reduction: int = 1):
+ super().__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor):
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+ """ Add Gaussian Blur to a 4D tensors
+ This layer takes a 4D tensor of {N, C, H, W} as input.
+ The Gaussian blur will be performed in given channel number (C) splitly.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+        assert self.kernel_size % 2 != 0, 'kernel_size must be odd'
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor):
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+        if len(x.shape) != 4:
+            raise ValueError("'GaussianBlurLayer' requires a 4D tensor as input")
+        elif x.shape[1] != self.channels:
+            raise ValueError(
+                "In 'GaussianBlurLayer', the required channel ({0}) is "
+                "not the same as input ({1})".format(self.channels, x.shape[1]))
+
+ return self.op(x)
+
+ def _init_kernel(self):
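+        # sigma follows OpenCV's rule of thumb for deriving it from an odd kernel size (cf. cv2.getGaussianKernel)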
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/processor.py b/modules/image/matting/modnet_mobilenetv2_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..3ae79593f0d3dab19520c3c666ae4a06b81960dd
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/processor.py
@@ -0,0 +1,207 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+    def __init__(self, transforms: list, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+        to_rgb (bool, optional): Whether to convert the image to RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+            # if the alpha or trimap image has 3 channels, keep only the first one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+    Resize the short side of an image to a given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+    def __init__(self, short_size: int = 512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize so that the height and width are integer multiples of a given value, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: list):
+    """Recover the prediction to its original shape."""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: Union[np.ndarray, str] = None):
+    """
+    Apply trimap constraints to an alpha prediction; `alpha` is a [h, w] array with values in [0, 255].
+    """
+    if isinstance(trimap, str):
+        trimap = cv2.imread(trimap, 0)
+
+    if trimap is not None:
+        alpha[trimap == 0] = 0
+        alpha[trimap == 255] = 255
+    alpha = alpha.astype('uint8')
+    return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_mobilenetv2_matting/requirements.py b/modules/image/matting/modnet_mobilenetv2_matting/requirements.py
new file mode 100644
index 0000000000000000000000000000000000000000..7df0ef23928361724c3fadb8d87d6a3be869e58b
--- /dev/null
+++ b/modules/image/matting/modnet_mobilenetv2_matting/requirements.py
@@ -0,0 +1 @@
+paddleseg >= 2.3.0
diff --git a/modules/image/matting/modnet_resnet50vd_matting/README.md b/modules/image/matting/modnet_resnet50vd_matting/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..03ad69e6732d545861063c85a38e872ff6e60c5d
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/README.md
@@ -0,0 +1,157 @@
+# modnet_resnet50vd_matting
+
+|模型名称|modnet_resnet50vd_matting|
+| :--- | :---: |
+|类别|图像-抠图|
+|网络|modnet_resnet50vd|
+|数据集|百度自建数据集|
+|是否支持Fine-tuning|否|
+|模型大小|535MB|
+|指标|SAD112.73|
+|最新更新日期|2021-12-03|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+ - 样例结果示例(左为原图,右为效果图):
+
+
+
+
+
+- ### 模型介绍
+
+ - Matting(精细化分割/影像去背/抠图)是指借由计算前景的颜色和透明度,将前景从影像中撷取出来的技术,可用于替换背景、影像合成、视觉特效,在电影工业中被广泛地使用。影像中的每个像素会有代表其前景透明度的值,称作阿法值(Alpha),一张影像中所有阿法值的集合称作阿法遮罩(Alpha Matte),将影像被遮罩所涵盖的部分取出即可完成前景的分离。modnet_resnet50vd_matting可生成抠图结果。
+
+
+
+ - 更多详情请参考:[modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install modnet_resnet50vd_matting
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+ - ```shell
+ $ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - 通过命令行方式实现hub模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+
+- ### 2、预测代码示例
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_resnet50vd_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - 人像matting预测API,用于将输入图片中的人像分割出来。
+
+ - 参数
+
+ - image_list (list(str | numpy.ndarray)):图片输入路径或者BGR格式numpy数据。
+    - trimap_list(list(str | numpy.ndarray)):trimap输入路径或者单通道灰度图片,默认为None。
+ - visualization (bool): 是否进行可视化,默认为False。
+ - save_path (str): 当visualization为True时,保存图片的路径,默认为"modnet_resnet50vd_matting_output"。
+
+ - 返回
+
+    - result (list(numpy.ndarray)):模型分割结果。
+
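+  - The returned alpha mattes can be composited with the original image to cut out the foreground; a minimal sketch (the output filename "cutout.png" is only a placeholder):
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_resnet50vd_matting")
+    image = cv2.imread("/PATH/TO/IMAGE")
+    # predict returns one uint8 alpha matte of shape [H, W] per input image
+    alpha = model.predict([image])[0]
+    # stack BGR + alpha into a BGRA image: fully transparent where alpha == 0
+    rgba = np.concatenate([image, alpha[:, :, None]], axis=-1)
+    cv2.imwrite("cutout.png", rgba)
+    ```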
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署人像matting在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m modnet_resnet50vd_matting
+ ```
+
+ - 这样就完成了一个人像matting在线服务API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/matting/modnet_resnet50vd_matting/README_en.md b/modules/image/matting/modnet_resnet50vd_matting/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..2a6d4e463d2196d3874a8b87892312cb0dc49b31
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/README_en.md
@@ -0,0 +1,156 @@
+# modnet_resnet50vd_matting
+
+|Module Name|modnet_resnet50vd_matting|
+| :--- | :---: |
+|Category|Image Matting|
+|Network|modnet_resnet50vd|
+|Dataset|Baidu self-built dataset|
+|Support Fine-tuning|No|
+|Module Size|535MB|
+|Data Indicators|SAD104.14|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Matting is the technique of extracting the foreground from an image by estimating its color and transparency. It is widely used in the film industry for background replacement, image composition, and visual effects. Each pixel in the image has a value that represents its foreground transparency, called alpha; the set of all alpha values in an image is called the alpha matte. Extracting the part of the image covered by the matte completes the foreground separation.
+
+
+
+ - For more information, please refer to: [modnet_resnet50vd_matting](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.3/contrib/Matting)
+
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install modnet_resnet50vd_matting
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run modnet_resnet50vd_matting --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="modnet_resnet50vd_matting")
+
+ result = model.predict(["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ trimap_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for matting.
+
+ - **Parameter**
+
+ - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+    - trimap_list (list(str | numpy.ndarray)): Trimap path or trimap data, ndarray.shape is in the format \[H, W\], gray. Default is None.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "modnet_resnet50vd_matting_output" by default.
+
+ - **Return**
+
+    - result (list(numpy.ndarray)): The list of model results.
+
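+  - The returned alpha mattes can be composited with the original image to cut out the foreground; a minimal sketch (the output filename "cutout.png" is only a placeholder):
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="modnet_resnet50vd_matting")
+    image = cv2.imread("/PATH/TO/IMAGE")
+    # predict returns one uint8 alpha matte of shape [H, W] per input image
+    alpha = model.predict([image])[0]
+    # stack BGR + alpha into a BGRA image: fully transparent where alpha == 0
+    rgba = np.concatenate([image, alpha[:, :, None]], axis=-1)
+    cv2.imwrite("cutout.png", rgba)
+    ```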
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of matting.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m modnet_resnet50vd_matting
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+ import time
+ import numpy as np
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/modnet_resnet50vd_matting"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ for image in r.json()["results"]['data']:
+ data = base64_to_cv2(image)
+ image_path =str(time.time()) + ".png"
+ cv2.imwrite(image_path, data)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/matting/modnet_resnet50vd_matting/module.py b/modules/image/matting/modnet_resnet50vd_matting/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..b57c170a9e281c258fbce8102a52293d93ed0a9e
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/module.py
@@ -0,0 +1,497 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import argparse
+import ast
+from typing import Callable, Union, List, Tuple
+
+import numpy as np
+import cv2
+import scipy
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+from modnet_resnet50vd_matting.resnet import ResNet50_vd
+import modnet_resnet50vd_matting.processor as P
+
+
+@moduleinfo(
+ name="modnet_resnet50vd_matting",
+ type="CV/matting",
+ author="paddlepaddle",
+ summary="modnet_resnet50vd_matting is a matting model",
+ version="1.0.0"
+)
+class MODNetResNet50Vd(nn.Layer):
+ """
+ The MODNet implementation based on PaddlePaddle.
+
+ The original article refers to
+    Zhanghan Ke, et al. "Is a Green Screen Really Necessary for Real-Time Portrait Matting?"
+ (https://arxiv.org/pdf/2011.11961.pdf).
+
+ Args:
+        hr_channels(int, optional): The channels of the high-resolution branch. Default: 32.
+        pretrained(str, optional): The path of the pretrained model. Default: None.
+ """
+
+    def __init__(self, hr_channels: int = 32, pretrained=None):
+ super(MODNetResNet50Vd, self).__init__()
+
+ self.backbone = ResNet50_vd()
+ self.pretrained = pretrained
+
+ self.head = MODNetHead(
+ hr_channels=hr_channels, backbone_channels=self.backbone.feat_channels)
+ self.blurer = GaussianBlurLayer(1, 3)
+ self.transforms = P.Compose([P.LoadImages(), P.ResizeByShort(), P.ResizeToIntMult(), P.Normalize()])
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'modnet-resnet50_vd.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+    def preprocess(self, img: Union[str, np.ndarray], transforms: Callable, trimap: Union[str, np.ndarray] = None):
+ data = {}
+ data['img'] = img
+ if trimap is not None:
+ data['trimap'] = trimap
+ data['gt_fields'] = ['trimap']
+ data['trans_info'] = []
+ data = self.transforms(data)
+ data['img'] = paddle.to_tensor(data['img'])
+ data['img'] = data['img'].unsqueeze(0)
+ if trimap is not None:
+ data['trimap'] = paddle.to_tensor(data['trimap'])
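+            # unsqueeze to [1, 1, H, W]: add the batch and channel dimensions the network expects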
+ data['trimap'] = data['trimap'].unsqueeze((0, 1))
+
+ return data
+
+ def forward(self, inputs: dict):
+ x = inputs['img']
+ feat_list = self.backbone(x)
+ y = self.head(inputs=inputs, feat_list=feat_list)
+ return y
+
+    def predict(self, image_list: list, trimap_list: list = None, visualization: bool = False, save_path: str = "modnet_resnet50vd_matting_output"):
+        self.eval()
+        result = []
+ with paddle.no_grad():
+ for i, im_path in enumerate(image_list):
+ trimap = trimap_list[i] if trimap_list is not None else None
+ data = self.preprocess(img=im_path, transforms=self.transforms, trimap=trimap)
+ alpha_pred = self.forward(data)
+ alpha_pred = P.reverse_transform(alpha_pred, data['trans_info'])
+ alpha_pred = (alpha_pred.numpy()).squeeze()
+ alpha_pred = (alpha_pred * 255).astype('uint8')
+ alpha_pred = P.save_alpha_pred(alpha_pred, trimap)
+ result.append(alpha_pred)
+ if visualization:
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ cv2.imwrite(image_save_path, alpha_pred)
+
+ return result
+
+ @serving
+    def serving_method(self, images: list, trimaps: list = None, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [P.base64_to_cv2(image) for image in images]
+ if trimaps is not None:
+ trimap_decoder = [cv2.cvtColor(P.base64_to_cv2(trimap), cv2.COLOR_BGR2GRAY) for trimap in trimaps]
+ else:
+ trimap_decoder = None
+
+        outputs = self.predict(image_list=images_decode, trimap_list=trimap_decoder, **kwargs)
+ serving_data = [P.cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+ if args.trimap_path is not None:
+ trimap_list = [args.trimap_path]
+ else:
+ trimap_list = None
+
+ results = self.predict(image_list=[args.input_path], trimap_list=trimap_list, save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="modnet_resnet50vd_matting_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+ self.arg_input_group.add_argument('--trimap_path', type=str, default=None, help="path to trimap.")
+
+
+
+class MODNetHead(nn.Layer):
+ """
+ Segmentation head.
+ """
+    def __init__(self, hr_channels: int, backbone_channels: list):
+ super().__init__()
+
+ self.lr_branch = LRBranch(backbone_channels)
+ self.hr_branch = HRBranch(hr_channels, backbone_channels)
+ self.f_branch = FusionBranch(hr_channels, backbone_channels)
+
+    def forward(self, inputs: dict, feat_list: list) -> paddle.Tensor:
+ pred_semantic, lr8x, [enc2x, enc4x] = self.lr_branch(feat_list)
+ pred_detail, hr2x = self.hr_branch(inputs['img'], enc2x, enc4x, lr8x)
+ pred_matte = self.f_branch(inputs['img'], lr8x, hr2x)
+ return pred_matte
+
+
+
+class FusionBranch(nn.Layer):
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+ self.conv_lr4x = Conv2dIBNormRelu(
+ enc_channels[2], hr_channels, 5, stride=1, padding=2)
+
+ self.conv_f2x = Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1)
+ self.conv_f = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, int(hr_channels / 2), 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ int(hr_channels / 2),
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, lr8x: paddle.Tensor, hr2x: paddle.Tensor) -> paddle.Tensor:
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr4x = self.conv_lr4x(lr4x)
+ lr2x = F.interpolate(
+ lr4x, scale_factor=2, mode='bilinear', align_corners=False)
+
+ f2x = self.conv_f2x(paddle.concat((lr2x, hr2x), axis=1))
+ f = F.interpolate(
+ f2x, scale_factor=2, mode='bilinear', align_corners=False)
+ f = self.conv_f(paddle.concat((f, img), axis=1))
+ pred_matte = F.sigmoid(f)
+
+ return pred_matte
+
+
+class HRBranch(nn.Layer):
+ """
+ High Resolution Branch of MODNet
+ """
+
+    def __init__(self, hr_channels: int, enc_channels: list):
+ super().__init__()
+
+ self.tohr_enc2x = Conv2dIBNormRelu(
+ enc_channels[0], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc2x = Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=2, padding=1)
+
+ self.tohr_enc4x = Conv2dIBNormRelu(
+ enc_channels[1], hr_channels, 1, stride=1, padding=0)
+ self.conv_enc4x = Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1)
+
+ self.conv_hr4x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels + enc_channels[2] + 3,
+ 2 * hr_channels,
+ 3,
+ stride=1,
+ padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr2x = nn.Sequential(
+ Conv2dIBNormRelu(
+ 2 * hr_channels, 2 * hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ 2 * hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(hr_channels, hr_channels, 3, stride=1, padding=1))
+
+ self.conv_hr = nn.Sequential(
+ Conv2dIBNormRelu(
+ hr_channels + 3, hr_channels, 3, stride=1, padding=1),
+ Conv2dIBNormRelu(
+ hr_channels,
+ 1,
+ 1,
+ stride=1,
+ padding=0,
+ with_ibn=False,
+ with_relu=False))
+
+ def forward(self, img: paddle.Tensor, enc2x: paddle.Tensor, enc4x: paddle.Tensor, lr8x: paddle.Tensor) -> paddle.Tensor:
+ img2x = F.interpolate(
+ img, scale_factor=1 / 2, mode='bilinear', align_corners=False)
+ img4x = F.interpolate(
+ img, scale_factor=1 / 4, mode='bilinear', align_corners=False)
+
+ enc2x = self.tohr_enc2x(enc2x)
+ hr4x = self.conv_enc2x(paddle.concat((img2x, enc2x), axis=1))
+
+ enc4x = self.tohr_enc4x(enc4x)
+ hr4x = self.conv_enc4x(paddle.concat((hr4x, enc4x), axis=1))
+
+ lr4x = F.interpolate(
+ lr8x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr4x = self.conv_hr4x(paddle.concat((hr4x, lr4x, img4x), axis=1))
+
+ hr2x = F.interpolate(
+ hr4x, scale_factor=2, mode='bilinear', align_corners=False)
+ hr2x = self.conv_hr2x(paddle.concat((hr2x, enc2x), axis=1))
+ pred_detail = None
+ return pred_detail, hr2x
+
+
+class LRBranch(nn.Layer):
+ """
+ Low Resolution Branch of MODNet
+ """
+    def __init__(self, backbone_channels: list):
+ super().__init__()
+ self.se_block = SEBlock(backbone_channels[4], reduction=4)
+ self.conv_lr16x = Conv2dIBNormRelu(
+ backbone_channels[4], backbone_channels[3], 5, stride=1, padding=2)
+ self.conv_lr8x = Conv2dIBNormRelu(
+ backbone_channels[3], backbone_channels[2], 5, stride=1, padding=2)
+ self.conv_lr = Conv2dIBNormRelu(
+ backbone_channels[2],
+ 1,
+ 3,
+ stride=2,
+ padding=1,
+ with_ibn=False,
+ with_relu=False)
+
+ def forward(self, feat_list: list) -> List[paddle.Tensor]:
+ enc2x, enc4x, enc32x = feat_list[0], feat_list[1], feat_list[4]
+
+ enc32x = self.se_block(enc32x)
+ lr16x = F.interpolate(
+ enc32x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr16x = self.conv_lr16x(lr16x)
+ lr8x = F.interpolate(
+ lr16x, scale_factor=2, mode='bilinear', align_corners=False)
+ lr8x = self.conv_lr8x(lr8x)
+
+ pred_semantic = None
+ if self.training:
+ lr = self.conv_lr(lr8x)
+ pred_semantic = F.sigmoid(lr)
+
+ return pred_semantic, lr8x, [enc2x, enc4x]
+
+
+class IBNorm(nn.Layer):
+ """
+ Combine Instance Norm and Batch Norm into One Layer
+ """
+
+ def __init__(self, in_channels: int):
+ super().__init__()
+ self.bnorm_channels = in_channels // 2
+ self.inorm_channels = in_channels - self.bnorm_channels
+
+ self.bnorm = nn.BatchNorm2D(self.bnorm_channels)
+ self.inorm = nn.InstanceNorm2D(self.inorm_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ bn_x = self.bnorm(x[:, :self.bnorm_channels, :, :])
+ in_x = self.inorm(x[:, self.bnorm_channels:, :, :])
+
+ return paddle.concat((bn_x, in_x), 1)
+
+
+class Conv2dIBNormRelu(nn.Layer):
+ """
+ Convolution + IBNorm + Relu
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ padding: int = 0,
+                 dilation: int = 1,
+ groups: int = 1,
+ bias_attr: paddle.ParamAttr = None,
+ with_ibn: bool = True,
+ with_relu: bool = True):
+
+ super().__init__()
+
+ layers = [
+ nn.Conv2D(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride=stride,
+ padding=padding,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=bias_attr)
+ ]
+
+ if with_ibn:
+ layers.append(IBNorm(out_channels))
+
+ if with_relu:
+ layers.append(nn.ReLU())
+
+ self.layers = nn.Sequential(*layers)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ return self.layers(x)
+
+
+class SEBlock(nn.Layer):
+ """
+ SE Block Proposed in https://arxiv.org/pdf/1709.01507.pdf
+ """
+
+    def __init__(self, num_channels: int, reduction: int = 1):
+ super().__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ self.conv = nn.Sequential(
+ nn.Conv2D(
+ num_channels,
+ int(num_channels // reduction),
+ 1,
+ bias_attr=False), nn.ReLU(),
+ nn.Conv2D(
+ int(num_channels // reduction),
+ num_channels,
+ 1,
+ bias_attr=False), nn.Sigmoid())
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
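+        # squeeze: global average pool to 1x1; excite: two 1x1 convs produce per-channel gates in (0, 1)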
+ w = self.pool(x)
+ w = self.conv(w)
+ return w * x
+
+
+class GaussianBlurLayer(nn.Layer):
+ """ Add Gaussian Blur to a 4D tensors
+ This layer takes a 4D tensor of {N, C, H, W} as input.
+ The Gaussian blur will be performed in given channel number (C) splitly.
+ """
+
+ def __init__(self, channels: int, kernel_size: int):
+ """
+ Args:
+ channels (int): Channel for input tensor
+ kernel_size (int): Size of the kernel used in blurring
+ """
+
+ super(GaussianBlurLayer, self).__init__()
+ self.channels = channels
+ self.kernel_size = kernel_size
+ assert self.kernel_size % 2 != 0
+
+ self.op = nn.Sequential(
+ nn.Pad2D(int(self.kernel_size / 2), mode='reflect'),
+ nn.Conv2D(
+ channels,
+ channels,
+ self.kernel_size,
+ stride=1,
+ padding=0,
+ bias_attr=False,
+ groups=channels))
+
+ self._init_kernel()
+ self.op[1].weight.stop_gradient = True
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ """
+ Args:
+ x (paddle.Tensor): input 4D tensor
+ Returns:
+ paddle.Tensor: Blurred version of the input
+ """
+
+        if len(x.shape) != 4:
+            raise ValueError("'GaussianBlurLayer' requires a 4D tensor as input")
+        if x.shape[1] != self.channels:
+            raise ValueError(
+                "In 'GaussianBlurLayer', the required channel ({0}) is "
+                "not the same as input ({1})".format(self.channels, x.shape[1]))
+
+ return self.op(x)
+
+ def _init_kernel(self):
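+        # derive sigma from the kernel size with the same rule OpenCV uses (cv2.getGaussianKernel)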
+ sigma = 0.3 * ((self.kernel_size - 1) * 0.5 - 1) + 0.8
+
+ n = np.zeros((self.kernel_size, self.kernel_size))
+ i = int(self.kernel_size / 2)
+ n[i, i] = 1
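+        # Gaussian-filtering a unit impulse recovers the kernel itself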
+ kernel = scipy.ndimage.gaussian_filter(n, sigma)
+ kernel = kernel.astype('float32')
+ kernel = kernel[np.newaxis, np.newaxis, :, :]
+ paddle.assign(kernel, self.op[1].weight)
\ No newline at end of file
diff --git a/modules/image/matting/modnet_resnet50vd_matting/processor.py b/modules/image/matting/modnet_resnet50vd_matting/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..3ae79593f0d3dab19520c3c666ae4a06b81960dd
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/processor.py
@@ -0,0 +1,207 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import base64
+from typing import Callable, Union, List, Tuple
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddleseg.transforms import functional
+from PIL import Image
+
+
+class Compose:
+ """
+ Do transformation on input data with corresponding pre-processing and augmentation operations.
+ The shape of input data to all operations is [height, width, channels].
+ """
+
+ def __init__(self, transforms: Callable, to_rgb: bool = True):
+ if not isinstance(transforms, list):
+ raise TypeError('The transforms must be a list!')
+ self.transforms = transforms
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if 'trans_info' not in data:
+ data['trans_info'] = []
+ for op in self.transforms:
+ data = op(data)
+ if data is None:
+ return None
+
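+        # HWC -> CHW, the layout expected by Paddle convolution layers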
+ data['img'] = np.transpose(data['img'], (2, 0, 1))
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = np.transpose(data[key], (2, 0, 1))
+
+ return data
+
+
+class LoadImages:
+ """
+ Read images from image path.
+
+ Args:
+        to_rgb (bool, optional): Whether to convert the image to the RGB color space. Default: True.
+ """
+ def __init__(self, to_rgb: bool = True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, data: dict) -> dict:
+
+ if isinstance(data['img'], str):
+ data['img'] = cv2.imread(data['img'])
+
+ for key in data.get('gt_fields', []):
+ if isinstance(data[key], str):
+ data[key] = cv2.imread(data[key], cv2.IMREAD_UNCHANGED)
+ # if alpha and trimap has 3 channels, extract one.
+ if key in ['alpha', 'trimap']:
+ if len(data[key].shape) > 2:
+ data[key] = data[key][:, :, 0]
+
+ if self.to_rgb:
+ data['img'] = cv2.cvtColor(data['img'], cv2.COLOR_BGR2RGB)
+ for key in data.get('gt_fields', []):
+ if len(data[key].shape) == 2:
+ continue
+ data[key] = cv2.cvtColor(data[key], cv2.COLOR_BGR2RGB)
+
+ return data
+
+
+class ResizeByShort:
+ """
+ Resize the short side of an image to given size, and then scale the other side proportionally.
+
+ Args:
+ short_size (int): The target size of short side.
+ """
+
+    def __init__(self, short_size: int = 512):
+ self.short_size = short_size
+
+ def __call__(self, data: dict) -> dict:
+
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+ data['img'] = functional.resize_short(data['img'], self.short_size)
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize_short(data[key], self.short_size)
+ return data
+
+
+class ResizeToIntMult:
+ """
+    Resize the image to an integer multiple of a given value, e.g. 32.
+ """
+
+ def __init__(self, mult_int: int = 32):
+ self.mult_int = mult_int
+
+ def __call__(self, data: dict) -> dict:
+ data['trans_info'].append(('resize', data['img'].shape[0:2]))
+
+ h, w = data['img'].shape[0:2]
+        rw = w - w % self.mult_int
+        rh = h - h % self.mult_int
+ data['img'] = functional.resize(data['img'], (rw, rh))
+ for key in data.get('gt_fields', []):
+ data[key] = functional.resize(data[key], (rw, rh))
+
+ return data
+
+
+class Normalize:
+ """
+ Normalize an image.
+
+ Args:
+ mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5].
+ std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5].
+
+ Raises:
+ ValueError: When mean/std is not list or any value in std is 0.
+ """
+
+ def __init__(self, mean: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5), std: Union[List[float], Tuple[float]] = (0.5, 0.5, 0.5)):
+ self.mean = mean
+ self.std = std
+ if not (isinstance(self.mean, (list, tuple))
+ and isinstance(self.std, (list, tuple))):
+ raise ValueError(
+ "{}: input type is invalid. It should be list or tuple".format(
+ self))
+ from functools import reduce
+ if reduce(lambda x, y: x * y, self.std) == 0:
+ raise ValueError('{}: std is invalid!'.format(self))
+
+ def __call__(self, data: dict) -> dict:
+ mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+ std = np.array(self.std)[np.newaxis, np.newaxis, :]
+ data['img'] = functional.normalize(data['img'], mean, std)
+ if 'fg' in data.get('gt_fields', []):
+ data['fg'] = functional.normalize(data['fg'], mean, std)
+ if 'bg' in data.get('gt_fields', []):
+ data['bg'] = functional.normalize(data['bg'], mean, std)
+
+ return data
+
+
+def reverse_transform(alpha: paddle.Tensor, trans_info: List[tuple]):
+ """recover pred to origin shape"""
+ for item in trans_info[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+ alpha = F.interpolate(alpha, [h, w], mode='bilinear')
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ alpha = alpha[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return alpha
+
+def save_alpha_pred(alpha: np.ndarray, trimap: np.ndarray = None):
+    """
+    Apply the trimap constraint to an alpha prediction of shape [h, w] with values in [0, 255].
+    """
+    if isinstance(trimap, str):
+        trimap = cv2.imread(trimap, 0)
+    if trimap is not None:
+        alpha[trimap == 0] = 0
+        alpha[trimap == 255] = 255
+    alpha = alpha.astype('uint8')
+    return alpha
+
+
+def cv2_to_base64(image: np.ndarray):
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str):
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
\ No newline at end of file
diff --git a/modules/image/matting/modnet_resnet50vd_matting/resnet.py b/modules/image/matting/modnet_resnet50vd_matting/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..19abe41c8e47ca297941eb44e7ffc49e63b996da
--- /dev/null
+++ b/modules/image/matting/modnet_resnet50vd_matting/resnet.py
@@ -0,0 +1,332 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from paddleseg.models import layers
+from paddleseg.utils import utils
+
+__all__ = ["ResNet50_vd"]
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ ):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = nn.AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = layers.SyncBatchNorm(out_channels)
+ self._act_op = layers.Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu')
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation)
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True)
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+
+ ####################################################################
+ # If given dilation rate > 1, using corresponding padding.
+ # The performance drops down without the follow padding.
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+ #####################################################################
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class BasicBlock(nn.Layer):
+ """Basic residual block"""
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu')
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None)
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True)
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+ y = paddle.add(x=short, y=conv1)
+ y = F.relu(y)
+
+ return y
+
+
+class ResNet_vd(nn.Layer):
+ """
+ The ResNet_vd implementation based on PaddlePaddle.
+
+    The original article refers to Tong He, et al.,
+    "Bag of Tricks for Image Classification with Convolutional Neural Networks"
+ (https://arxiv.org/pdf/1812.01187.pdf).
+
+ """
+
+ def __init__(self,
+ input_channels: int = 3,
+ layers: int = 50,
+ output_stride: int = 32,
+ multi_grid: tuple = (1, 1, 1),
+ pretrained: str = None):
+ super(ResNet_vd, self).__init__()
+
+ self.conv1_logit = None # for gscnn shape stream
+ self.layers = layers
+ supported_layers = [18, 34, 50, 101, 152, 200]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 18:
+ depth = [2, 2, 2, 2]
+ elif layers == 34 or layers == 50:
+ depth = [3, 4, 6, 3]
+ elif layers == 101:
+ depth = [3, 4, 23, 3]
+ elif layers == 152:
+ depth = [3, 8, 36, 3]
+ elif layers == 200:
+ depth = [3, 12, 48, 3]
+ num_channels = [64, 256, 512, 1024
+ ] if layers >= 50 else [64, 64, 128, 256]
+ num_filters = [64, 128, 256, 512]
+
+ # for channels of four returned stages
+ self.feat_channels = [c * 4 for c in num_filters
+ ] if layers >= 50 else num_filters
+ self.feat_channels = [64] + self.feat_channels
+
+ dilation_dict = None
+ if output_stride == 8:
+ dilation_dict = {2: 2, 3: 4}
+ elif output_stride == 16:
+ dilation_dict = {3: 2}
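+        # output_stride 8 dilates the last two stages (indices 2 and 3); output_stride 16
+        # dilates only the last stage, trading stride-2 downsampling for dilation.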
+
+ self.conv1_1 = ConvBNLayer(
+ in_channels=input_channels,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu')
+ self.conv1_2 = ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu')
+ self.conv1_3 = ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu')
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+
+ # self.block_list = []
+ self.stage_list = []
+ if layers >= 50:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ if layers in [101, 152] and block == 2:
+ if i == 0:
+ conv_name = "res" + str(block + 2) + "a"
+ else:
+ conv_name = "res" + str(block + 2) + "b" + str(i)
+ else:
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ ###############################################################################
+ # Add dilation rate for some segmentation tasks, if dilation_dict is not None.
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+
+ # Actually block here is 'stage', and i is 'block' in 'stage'
+                    # At stage 4, expand the dilation_rate if multi_grid is given
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ ###############################################################################
+
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ dilation=dilation_rate))
+
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+ else:
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ basic_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ BasicBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block],
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0))
+ block_list.append(basic_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ self.pretrained = pretrained
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ feat_list = []
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ feat_list.append(y)
+
+ y = self.pool2d_max(y)
+
+ # A feature list saves the output feature map of each stage.
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+
+ return feat_list
+
+
+def ResNet50_vd(**args):
+ model = ResNet_vd(layers=50, **args)
+ return model
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b9814fe7bb98ca34f13b0a94741a57d365ed035c
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README.md
@@ -0,0 +1,151 @@
+# bisenet_lane_segmentation
+
+|Module Name|bisenet_lane_segmentation|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|bisenet|
+|Dataset|TuSimple|
+|Support Fine-tuning|No|
+|Module Size|9.7MB|
+|Data Indicators|ACC 96.09%|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+  - Sample results (left: original image, right: result):
+
+
+
+
+
+- ### Module Introduction
+
+  - Lane segmentation is a category of autonomous driving algorithms that can be used to assist vehicle positioning and decision-making. Early lane detection methods were based on traditional image processing, but as the technology has evolved, the scenes that lane detection must handle have become increasingly diverse, and current approaches increasingly detect where lanes exist at the semantic level. bisenet_lane_segmentation is a lightweight lane segmentation model.
+
+  - For more information, please refer to: [bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
+
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+  - paddlepaddle >= 2.2.0
+
+  - paddlehub >= 2.1.0
+
+  - paddleseg >= 2.3.0
+
+  - Python >= 3.7
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install bisenet_lane_segmentation
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+  | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
+ ```
+
+  - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="bisenet_lane_segmentation")
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+  - Prediction API for lane segmentation, used to extract the lanes from the input image.
+
+    - Parameters
+
+      - image_list (list(str | numpy.ndarray)): image paths or image data in BGR numpy format.
+      - visualization (bool): whether to visualize the results, False by default.
+      - save_path (str): the path for saving images when visualization is True, "bisenet_lane_segmentation_output" by default.
+
+    - Return
+
+      - result (list(numpy.ndarray)): the list of model segmentation results.
+
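+  - The returned masks are per-pixel class maps with the same height and width as the input image. As a minimal, illustrative sketch (assuming the default 7 lane classes; the color table and output file name are invented for this example), a mask can be blended over the source image for quick inspection:
+
+  - ```python
+    import cv2
+    import numpy as np
+    import paddlehub as hub
+
+    model = hub.Module(name="bisenet_lane_segmentation")
+    mask = model.predict(image_list=["/PATH/TO/IMAGE"])[0]  # uint8 class ids, shape [H, W]
+
+    # Tint each lane class with a distinct color, then blend with the original image.
+    img = cv2.imread("/PATH/TO/IMAGE")
+    colors = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255],
+                       [255, 255, 0], [255, 0, 255], [0, 255, 255]], dtype=np.uint8)
+    overlay = colors[np.clip(mask, 0, len(colors) - 1)]
+    blended = cv2.addWeighted(img, 0.7, overlay, 0.3, 0)
+    cv2.imwrite("lane_overlay.png", blended)
+    ```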
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of lane segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m bisenet_lane_segmentation
+ ```
+
+  - The lane segmentation online service API is now deployed; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and obtain the result
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ #print(r.json())
+ mask = base64_to_cv2(r.json()["results"]['data'][0])
+ print(mask)
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..8e6364bc34e44465d6ece095184f7eb1d8cedcd4
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/README_en.md
@@ -0,0 +1,154 @@
+# bisenet_lane_segmentation
+
+|Module Name|bisenet_lane_segmentation|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|bisenet|
+|Dataset|TuSimple|
+|Support Fine-tuning|No|
+|Module Size|9.7MB|
+|Data Indicators|ACC 96.09%|
+|Latest update date|2021-12-03|
+
+
+## I. Basic Information
+
+- ### Application Effect Display
+
+ - Sample results:
+
+
+
+
+
+- ### Module Introduction
+
+  - Lane segmentation is a category of autonomous driving algorithms that can be used to assist vehicle positioning and decision-making. Early lane detection methods were based on traditional image processing, but as the technology has evolved, the scenes that lane detection must handle have become increasingly diverse, and current approaches increasingly detect where lanes exist at the semantic level. bisenet_lane_segmentation is a lightweight model for lane segmentation.
+
+
+
+ - For more information, please refer to: [bisenet_lane_segmentation](https://github.com/PaddlePaddle/PaddleSeg)
+
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.2.0
+
+ - paddlehub >= 2.1.0
+
+ - paddleseg >= 2.3.0
+
+  - Python >= 3.7
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install bisenet_lane_segmentation
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run bisenet_lane_segmentation --input_path "/PATH/TO/IMAGE"
+ ```
+
+ - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_en/tutorial/cmd_usage.rst)
+
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ model = hub.Module(name="bisenet_lane_segmentation")
+ result = model.predict(image_list=["/PATH/TO/IMAGE"])
+ print(result)
+
+ ```
+- ### 3、API
+
+ - ```python
+ def predict(self,
+ image_list,
+ visualization,
+ save_path):
+ ```
+
+ - Prediction API for lane segmentation.
+
+ - **Parameter**
+
+    - image_list (list(str | numpy.ndarray)): Image path or image data, ndarray.shape is in the format \[H, W, C\], BGR.
+ - visualization (bool): Whether to save the recognition results as picture files, default is False.
+ - save_path (str): Save path of images, "bisenet_lane_segmentation_output" by default.
+
+ - **Return**
+
+ - result (list(numpy.ndarray)):The list of model results.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of lane segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m bisenet_lane_segmentation
+ ```
+
+  - The serving API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/bisenet_lane_segmentation"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ #print(r.json())
+ mask = base64_to_cv2(r.json()["results"]['data'][0])
+ print(mask)
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py
new file mode 100644
index 0000000000000000000000000000000000000000..868f0bcc37ed850c90c6bec0616ac4e0b929b30f
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/get_lane_coords.py
@@ -0,0 +1,156 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# this code is based on
+# https://github.com/ZJULearning/resa/blob/main/datasets/tusimple.py
+
+import cv2
+import numpy as np
+
+
+class LaneProcessor:
+ def __init__(self,
+ num_classes=2,
+ ori_shape=(720, 1280),
+ cut_height=0,
+ y_pixel_gap=10,
+ points_nums=56,
+ thresh=0.6,
+ smooth=True):
+ super(LaneProcessor, self).__init__()
+ self.num_classes = num_classes
+ self.ori_shape = ori_shape
+ self.cut_height = cut_height
+ self.y_pixel_gap = y_pixel_gap
+ self.points_nums = points_nums
+ self.thresh = thresh
+ self.smooth = smooth
+
+ def get_lane_coords(self, seg_pred):
+ lane_coords_list = []
+ for batch in range(len(seg_pred)):
+ seg = seg_pred[batch]
+ lane_coords = self.heatmap2coords(seg)
+ for i in range(len(lane_coords)):
+ lane_coords[i] = sorted(
+ lane_coords[i], key=lambda pair: pair[1])
+ lane_coords_list.append(lane_coords)
+ return lane_coords_list
+
+ def process_gap(self, coordinate):
+ if any(x > 0 for x in coordinate):
+ start = [i for i, x in enumerate(coordinate) if x > 0][0]
+ end = [
+ i for i, x in reversed(list(enumerate(coordinate))) if x > 0
+ ][0]
+ lane = coordinate[start:end + 1]
+ # The line segment is not continuous
+ if any(x < 0 for x in lane):
+ gap_start = [
+ i for i, x in enumerate(lane[:-1])
+ if x > 0 and lane[i + 1] < 0
+ ]
+ gap_end = [
+ i + 1 for i, x in enumerate(lane[:-1])
+ if x < 0 and lane[i + 1] > 0
+ ]
+ gap_id = [i for i, x in enumerate(lane) if x < 0]
+ if len(gap_start) == 0 or len(gap_end) == 0:
+ return coordinate
+ for id in gap_id:
+ for i in range(len(gap_start)):
+ if i >= len(gap_end):
+ return coordinate
+ if id > gap_start[i] and id < gap_end[i]:
+ gap_width = float(gap_end[i] - gap_start[i])
+ # line interpolation
+ lane[id] = int((id - gap_start[i]) / gap_width *
+ lane[gap_end[i]] +
+ (gap_end[i] - id) / gap_width *
+ lane[gap_start[i]])
+ if not all(x > 0 for x in lane):
+ print("Gaps still exist!")
+ coordinate[start:end + 1] = lane
+ return coordinate
+
+ def get_coords(self, heat_map):
+ dst_height = self.ori_shape[0] - self.cut_height
+ coords = np.zeros(self.points_nums)
+ coords[:] = -2
+ pointCount = 0
+ for i in range(self.points_nums):
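+            # sample rows bottom-up, one every y_pixel_gap pixels of the (cut) original image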
+ y_coord = dst_height - 10 - i * self.y_pixel_gap
+ y = int(y_coord / dst_height * heat_map.shape[0])
+ if y < 0:
+ break
+ prob_line = heat_map[y, :]
+ x = np.argmax(prob_line)
+ prob = prob_line[x]
+ if prob > self.thresh:
+ coords[i] = int(x / heat_map.shape[1] * self.ori_shape[1])
+ pointCount = pointCount + 1
+ if pointCount < 2:
+ coords[:] = -2
+ self.process_gap(coords)
+ return coords
+
+ def fix_outliers(self, coords):
+ data = [x for i, x in enumerate(coords) if x > 0]
+ index = [i for i, x in enumerate(coords) if x > 0]
+ if len(data) == 0:
+ return coords
+ diff = []
+ is_outlier = False
+ n = 1
+ x_gap = abs((data[-1] - data[0]) / (1.0 * (len(data) - 1)))
+ for idx, dt in enumerate(data):
+            if not is_outlier:
+ t = idx - 1
+ n = 1
+ if idx == 0:
+ diff.append(0)
+ else:
+ diff.append(abs(data[idx] - data[t]))
+ if abs(data[idx] - data[t]) > n * (x_gap * 1.5):
+ n = n + 1
+ is_outlier = True
+ ind = index[idx]
+ coords[ind] = -1
+ else:
+ is_outlier = False
+
+ def heatmap2coords(self, seg_pred):
+ coordinates = []
+ for i in range(self.num_classes - 1):
+ heat_map = seg_pred[i + 1]
+ if self.smooth:
+ heat_map = cv2.blur(
+ heat_map, (9, 9), borderType=cv2.BORDER_REPLICATE)
+ coords = self.get_coords(heat_map)
+ indexes = [i for i, x in enumerate(coords) if x > 0]
+ if not indexes:
+ continue
+ self.add_coords(coordinates, coords)
+
+ if len(coordinates) == 0:
+ coords = np.zeros(self.points_nums)
+ self.add_coords(coordinates, coords)
+ return coordinates
+
+ def add_coords(self, coordinates, coords):
+ sub_lanes = []
+ for j in range(self.points_nums):
+ y_lane = self.ori_shape[0] - 10 - j * self.y_pixel_gap
+ x_lane = coords[j] if coords[j] > 0 else -2
+ sub_lanes.append([x_lane, y_lane])
+ coordinates.append(sub_lanes)
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py
new file mode 100644
index 0000000000000000000000000000000000000000..8a7a481570e993810079445a7f54a70bd2e41c57
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/lane.py
@@ -0,0 +1,141 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# this code is from https://github.com/TuSimple/tusimple-benchmark/blob/master/evaluate/lane.py
+
+import json as json
+import numpy as np
+from sklearn.linear_model import LinearRegression
+
+
+class LaneEval(object):
+ lr = LinearRegression()
+ pixel_thresh = 20
+ pt_thresh = 0.85
+
+ @staticmethod
+ def get_angle(xs, y_samples):
+ xs, ys = xs[xs >= 0], y_samples[xs >= 0]
+ if len(xs) > 1:
+ LaneEval.lr.fit(ys[:, None], xs)
+ k = LaneEval.lr.coef_[0]
+ theta = np.arctan(k)
+ else:
+ theta = 0
+ return theta
+
+ @staticmethod
+ def line_accuracy(pred, gt, thresh):
+ pred = np.array([p if p >= 0 else -100 for p in pred])
+ gt = np.array([g if g >= 0 else -100 for g in gt])
+ return np.sum(np.where(np.abs(pred - gt) < thresh, 1., 0.)) / len(gt)
+
+ @staticmethod
+ def bench(pred, gt, y_samples, running_time):
+ if any(len(p) != len(y_samples) for p in pred):
+ raise Exception('Format of lanes error.')
+ if running_time > 200 or len(gt) + 2 < len(pred):
+ return 0., 0., 1.
+ angles = [
+ LaneEval.get_angle(np.array(x_gts), np.array(y_samples))
+ for x_gts in gt
+ ]
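+        # widen the pixel tolerance for slanted lanes: a fixed perpendicular threshold maps to thresh / cos(angle) along x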
+ threshs = [LaneEval.pixel_thresh / np.cos(angle) for angle in angles]
+ line_accs = []
+ fp, fn = 0., 0.
+ matched = 0.
+ for x_gts, thresh in zip(gt, threshs):
+ accs = [
+ LaneEval.line_accuracy(
+ np.array(x_preds), np.array(x_gts), thresh)
+ for x_preds in pred
+ ]
+ max_acc = np.max(accs) if len(accs) > 0 else 0.
+ if max_acc < LaneEval.pt_thresh:
+ fn += 1
+ else:
+ matched += 1
+ line_accs.append(max_acc)
+ fp = len(pred) - matched
+ if len(gt) > 4 and fn > 0:
+ fn -= 1
+ s = sum(line_accs)
+ if len(gt) > 4:
+ s -= min(line_accs)
+ return s / max(min(4.0, len(gt)),
+ 1.), fp / len(pred) if len(pred) > 0 else 0., fn / max(
+ min(len(gt), 4.), 1.)
+
+ @staticmethod
+ def bench_one_submit(pred_file, gt_file):
+ try:
+ json_pred = [
+ json.loads(line) for line in open(pred_file).readlines()
+ ]
+ except BaseException as e:
+            raise Exception('Failed to load the json file of the prediction.')
+ json_gt = [json.loads(line) for line in open(gt_file).readlines()]
+ if len(json_gt) != len(json_pred):
+ raise Exception(
+ 'We do not get the predictions of all the test tasks')
+ gts = {l['raw_file']: l for l in json_gt}
+ accuracy, fp, fn = 0., 0., 0.
+ for pred in json_pred:
+ if 'raw_file' not in pred or 'lanes' not in pred or 'run_time' not in pred:
+ raise Exception(
+ 'raw_file or lanes or run_time not in some predictions.')
+ raw_file = pred['raw_file']
+ pred_lanes = pred['lanes']
+ run_time = pred['run_time']
+ if raw_file not in gts:
+ raise Exception(
+ 'Some raw_file from your predictions do not exist in the test tasks.'
+ )
+ gt = gts[raw_file]
+ gt_lanes = gt['lanes']
+ y_samples = gt['h_samples']
+ try:
+ a, p, n = LaneEval.bench(pred_lanes, gt_lanes, y_samples,
+ run_time)
+ except BaseException as e:
+ raise Exception('Format of lanes error.')
+ accuracy += a
+ fp += p
+ fn += n
+ num = len(gts)
+ # the first return parameter is the default ranking parameter
+ return json.dumps([{
+ 'name': 'Accuracy',
+ 'value': accuracy / num,
+ 'order': 'desc'
+ }, {
+ 'name': 'FP',
+ 'value': fp / num,
+ 'order': 'asc'
+ }, {
+ 'name': 'FN',
+ 'value': fn / num,
+ 'order': 'asc'
+ }]), accuracy / num, fp / num, fn / num
+
+
+if __name__ == '__main__':
+ import sys
+
+ try:
+ if len(sys.argv) != 3:
+ raise Exception('Invalid input arguments')
+ print(LaneEval.bench_one_submit(sys.argv[1], sys.argv[2]))
+ except Exception as e:
+        print(e)
+        sys.exit(str(e))
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..6fa7fc55d2513e5bd2c4edeb78f761a8882466b2
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/lane_processor/tusimple_processor.py
@@ -0,0 +1,125 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import cv2
+import json
+import paddle.nn as nn
+
+from .lane import LaneEval
+from .get_lane_coords import LaneProcessor
+
+
+def mkdir(path):
+ sub_dir = os.path.dirname(path)
+ if not os.path.exists(sub_dir):
+ os.makedirs(sub_dir)
+
+
+class TusimpleProcessor:
+ def __init__(self,
+ num_classes=2,
+ ori_shape=(720, 1280),
+ cut_height=0,
+ thresh=0.6,
+ test_gt_json=None,
+ save_dir='output/'):
+ super(TusimpleProcessor, self).__init__()
+ self.num_classes = num_classes
+ self.dump_to_json = []
+ self.save_dir = save_dir
+ self.test_gt_json = test_gt_json
+ self.color_map = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
+ (255, 0, 255), (0, 255, 125), (50, 100, 50),
+ (100, 50, 100)]
+ self.laneProcessor = LaneProcessor(
+ num_classes=self.num_classes,
+ ori_shape=ori_shape,
+ cut_height=cut_height,
+ y_pixel_gap=10,
+ points_nums=56,
+ thresh=thresh,
+ smooth=True)
+
+ def dump_data_to_json(self,
+ output,
+ im_path,
+ run_time=0,
+ is_dump_json=True,
+ is_view=False):
+ seg_pred = output[0]
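+        # logits -> per-class probability heatmaps; lane point coordinates are then read off each class map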
+ seg_pred = nn.functional.softmax(seg_pred, axis=1)
+ seg_pred = seg_pred.numpy()
+ lane_coords_list = self.laneProcessor.get_lane_coords(seg_pred)
+
+ for batch in range(len(seg_pred)):
+ lane_coords = lane_coords_list[batch]
+ path_list = im_path[batch].split("/")
+ if is_dump_json:
+ json_pred = {}
+ json_pred['lanes'] = []
+ json_pred['run_time'] = run_time * 1000
+ json_pred['h_sample'] = []
+
+ json_pred['raw_file'] = os.path.join(*path_list[-4:])
+ for l in lane_coords:
+ if len(l) == 0:
+ continue
+ json_pred['lanes'].append([])
+ for (x, y) in l:
+ json_pred['lanes'][-1].append(int(x))
+ for (x, y) in lane_coords[0]:
+ json_pred['h_sample'].append(y)
+ self.dump_to_json.append(json.dumps(json_pred))
+
+ if is_view:
+ img = cv2.imread(im_path[batch])
+ if is_dump_json:
+ img_name = '_'.join([x for x in path_list[-4:]])
+ sub_dir = 'visual_eval'
+ else:
+ img_name = os.path.basename(im_path[batch])
+ sub_dir = 'visual_points'
+ saved_path = os.path.join(self.save_dir, sub_dir, img_name)
+ self.draw(img, lane_coords, saved_path)
+
+ def predict(self, output, im_path):
+ self.dump_data_to_json(
+ output, [im_path], is_dump_json=False, is_view=True)
+
+ def bench_one_submit(self):
+ output_file = os.path.join(self.save_dir, 'pred.json')
+ if output_file is not None:
+ mkdir(output_file)
+ with open(output_file, "w+") as f:
+ for line in self.dump_to_json:
+ print(line, end="\n", file=f)
+
+ eval_rst, acc, fp, fn = LaneEval.bench_one_submit(
+ output_file, self.test_gt_json)
+ self.dump_to_json = []
+ return acc, fp, fn, eval_rst
+
+ def draw(self, img, coords, file_path=None):
+ for i, coord in enumerate(coords):
+ for x, y in coord:
+ if x <= 0 or y <= 0:
+ continue
+ cv2.circle(img, (int(x), int(y)), 4,
+ self.color_map[i % self.num_classes], 2)
+
+ if file_path is not None:
+ mkdir(file_path)
+ cv2.imwrite(file_path, img)
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..29dcb93d36f994c831e5ee5a982bb06affc8193f
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/module.py
@@ -0,0 +1,165 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import time
+import argparse
+import os
+from typing import Union, List, Tuple
+
+import cv2
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo, runnable, serving
+import paddleseg.transforms as T
+from paddleseg.utils import logger, progbar, visualize
+from paddlehub.module.cv_module import ImageSegmentationModule
+import paddleseg.utils as utils
+from paddleseg.models import layers
+from paddleseg.models import BiSeNetV2
+
+from bisenet_lane_segmentation.processor import Crop, reverse_transform, cv2_to_base64, base64_to_cv2
+from bisenet_lane_segmentation.lane_processor.tusimple_processor import TusimpleProcessor
+
+@moduleinfo(
+ name="bisenet_lane_segmentation",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="BiSeNetLane is a lane segmentation model.",
+ version="1.0.0")
+class BiSeNetLane(nn.Layer):
+ """
+    BiSeNetLane uses BiSeNetV2 to perform lane segmentation.
+
+ Args:
+ num_classes (int): The unique number of target classes.
+ lambd (float, optional): A factor for controlling the size of semantic branch channels. Default: 0.25.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 7,
+ lambd: float = 0.25,
+ align_corners: bool = False,
+ pretrained: str = None):
+ super(BiSeNetLane, self).__init__()
+
+ self.net = BiSeNetV2(
+ num_classes=num_classes,
+ lambd=lambd,
+ align_corners=align_corners,
+ pretrained=None)
+
+ self.transforms = [Crop(up_h_off=160), T.Resize([640, 368]), T.Normalize()]
+ self.cut_height = 160
+        self.postprocessor = TusimpleProcessor(num_classes=7, cut_height=160)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ logit_list = self.net(x)
+ return logit_list
+
+ def predict(self, image_list: list, visualization: bool = False, save_path: str = "bisenet_lane_segmentation_output") -> List[np.ndarray]:
+ self.eval()
+ result = []
+ with paddle.no_grad():
+ for i, im in enumerate(image_list):
+ if isinstance(im, str):
+ im = cv2.imread(im)
+
+ ori_shape = im.shape[:2]
+ for op in self.transforms:
+ outputs = op(im)
+ im = outputs[0]
+
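+                # HWC -> NCHW tensor for the network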
+ im = np.transpose(im, (2, 0, 1))
+ im = im[np.newaxis, ...]
+ im = paddle.to_tensor(im)
+ logit = self.forward(im)[0]
+ pred = reverse_transform(logit, ori_shape, self.transforms, mode='bilinear')
+ pred = paddle.argmax(pred, axis=1, keepdim=True, dtype='int32')
+ pred = paddle.squeeze(pred[0])
+ pred = pred.numpy().astype('uint8')
+ if visualization:
+ color_map = visualize.get_color_map_list(256)
+ pred_mask = visualize.get_pseudo_color_map(pred, color_map)
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+ img_name = str(time.time()) + '.png'
+ image_save_path = os.path.join(save_path, img_name)
+ pred_mask.save(image_save_path)
+ result.append(pred)
+ return result
+
+ @serving
+    def serving_method(self, images: list, **kwargs) -> dict:
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ outputs = self.predict(image_list=images_decode, **kwargs)
+ serving_data = [cv2_to_base64(outputs[i]) for i in range(len(outputs))]
+ results = {'data': serving_data}
+
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list) -> List[np.ndarray]:
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(
+ description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ args = self.parser.parse_args(argvs)
+
+ results = self.predict(image_list=[args.input_path], save_path=args.output_dir, visualization=args.visualization)
+
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+
+ self.arg_config_group.add_argument(
+ '--output_dir', type=str, default="bisenet_lane_segmentation_output", help="The directory to save output images.")
+ self.arg_config_group.add_argument(
+ '--visualization', type=bool, default=True, help="whether to save output as images.")
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
+
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py b/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..dc1cf08804a03cef641f7620a5fa2262713cce54
--- /dev/null
+++ b/modules/image/semantic_segmentation/bisenet_lane_segmentation/processor.py
@@ -0,0 +1,185 @@
+import base64
+import collections.abc
+from itertools import combinations
+from typing import Union, List, Tuple, Callable
+
+import numpy as np
+import cv2
+import paddle
+import paddle.nn.functional as F
+
+
+def get_reverse_list(ori_shape: list, transforms: Callable) -> list:
+ """
+    Get the reverse list of the transforms.
+
+ Args:
+ ori_shape (list): Origin shape of image.
+ transforms (list): List of transform.
+
+ Returns:
+ list: List of tuple, there are two format:
+ ('resize', (h, w)) The image shape before resize,
+ ('padding', (h, w)) The image shape before padding.
+ """
+ reverse_list = []
+ h, w = ori_shape[0], ori_shape[1]
+ for op in transforms:
+ if op.__class__.__name__ in ['Resize']:
+ reverse_list.append(('resize', (h, w)))
+ h, w = op.target_size[0], op.target_size[1]
+ if op.__class__.__name__ in ['Crop']:
+ reverse_list.append(('crop', (op.up_h_off, op.down_h_off),
+ (op.left_w_off, op.right_w_off)))
+ h = h - op.up_h_off
+ h = h - op.down_h_off
+ w = w - op.left_w_off
+ w = w - op.right_w_off
+ if op.__class__.__name__ in ['ResizeByLong']:
+ reverse_list.append(('resize', (h, w)))
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ short_edge = int(round(short_edge * op.long_size / long_edge))
+ long_edge = op.long_size
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ if op.__class__.__name__ in ['ResizeByShort']:
+ reverse_list.append(('resize', (h, w)))
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ long_edge = int(round(long_edge * op.short_size / short_edge))
+ short_edge = op.short_size
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ if op.__class__.__name__ in ['Padding']:
+ reverse_list.append(('padding', (h, w)))
+ w, h = op.target_size[0], op.target_size[1]
+ if op.__class__.__name__ in ['PaddingByAspectRatio']:
+ reverse_list.append(('padding', (h, w)))
+ ratio = w / h
+ if ratio == op.aspect_ratio:
+ pass
+ elif ratio > op.aspect_ratio:
+ h = int(w / op.aspect_ratio)
+ else:
+ w = int(h * op.aspect_ratio)
+ if op.__class__.__name__ in ['LimitLong']:
+ long_edge = max(h, w)
+ short_edge = min(h, w)
+ if ((op.max_long is not None) and (long_edge > op.max_long)):
+ reverse_list.append(('resize', (h, w)))
+ long_edge = op.max_long
+ short_edge = int(round(short_edge * op.max_long / long_edge))
+ elif ((op.min_long is not None) and (long_edge < op.min_long)):
+ reverse_list.append(('resize', (h, w)))
+ long_edge = op.min_long
+ short_edge = int(round(short_edge * op.min_long / long_edge))
+ if h > w:
+ h = long_edge
+ w = short_edge
+ else:
+ w = long_edge
+ h = short_edge
+ return reverse_list
+
+
+def reverse_transform(pred: paddle.Tensor, ori_shape: list, transforms: Callable, mode: str = 'nearest') -> paddle.Tensor:
+ """recover pred to origin shape"""
+ reverse_list = get_reverse_list(ori_shape, transforms)
+ for item in reverse_list[::-1]:
+ if item[0] == 'resize':
+ h, w = item[1][0], item[1][1]
+            pred = F.interpolate(pred, (h, w), mode=mode)
+ elif item[0] == 'crop':
+ up_h_off, down_h_off = item[1][0], item[1][1]
+ left_w_off, right_w_off = item[2][0], item[2][1]
+ pred = F.pad(
+ pred, [left_w_off, right_w_off, up_h_off, down_h_off],
+ value=0,
+ mode='constant',
+ data_format="NCHW")
+ elif item[0] == 'padding':
+ h, w = item[1][0], item[1][1]
+ pred = pred[:, :, 0:h, 0:w]
+ else:
+ raise Exception("Unexpected info '{}' in im_info".format(item[0]))
+ return pred
+
+
+class Crop:
+ """
+    Crop an image from its four borders.
+
+    Args:
+        up_h_off (int, optional): The cut height from top to bottom. Default: 0.
+        down_h_off (int, optional): The cut height from bottom to top. Default: 0.
+        left_w_off (int, optional): The cut width from left to right. Default: 0.
+        right_w_off (int, optional): The cut width from right to left. Default: 0.
+ """
+
+ def __init__(self, up_h_off: int = 0, down_h_off: int = 0, left_w_off: int = 0, right_w_off: int = 0):
+ self.up_h_off = up_h_off
+ self.down_h_off = down_h_off
+ self.left_w_off = left_w_off
+ self.right_w_off = right_w_off
+
+ def __call__(self, im: np.ndarray, label: np.ndarray = None) -> Tuple[np.ndarray]:
+ if self.up_h_off < 0 or self.down_h_off < 0 or self.left_w_off < 0 or self.right_w_off < 0:
+            raise ValueError(
+                "up_h_off, down_h_off, left_w_off and right_w_off must be equal to or greater than zero"
+            )
+
+ if self.up_h_off > 0 and self.up_h_off < im.shape[0]:
+ im = im[self.up_h_off:, :, :]
+ if label is not None:
+ label = label[self.up_h_off:, :]
+
+ if self.down_h_off > 0 and self.down_h_off < im.shape[0]:
+ im = im[:-self.down_h_off, :, :]
+ if label is not None:
+ label = label[:-self.down_h_off, :]
+
+ if self.left_w_off > 0 and self.left_w_off < im.shape[1]:
+ im = im[:, self.left_w_off:, :]
+ if label is not None:
+ label = label[:, self.left_w_off:]
+
+ if self.right_w_off > 0 and self.right_w_off < im.shape[1]:
+ im = im[:, :-self.right_w_off, :]
+ if label is not None:
+ label = label[:, :-self.right_w_off]
+
+ if label is None:
+ return (im, )
+ else:
+ return (im, label)
+
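+# A minimal usage sketch: trim 10 rows from the top and 5 columns from the
+# right of an HWC BGR image `im`:
+#   crop = Crop(up_h_off=10, right_w_off=5)
+#   im_cropped, = crop(im)
+
+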
+def cv2_to_base64(image: np.ndarray) -> str:
+ """
+ Convert data from BGR to base64 format.
+ """
+ data = cv2.imencode('.png', image)[1]
+    return base64.b64encode(data.tobytes()).decode('utf8')
+
+
+def base64_to_cv2(b64str: str) -> np.ndarray:
+ """
+ Convert data from base64 to BGR format.
+ """
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8a3951ac11aed63c93fdb383f47537813ef5ea69
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README.md
@@ -0,0 +1,186 @@
+# ginet_resnet101vd_ade20k
+
+|模型名称|ginet_resnet101vd_ade20k|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet101vd|
+|数据集|ADE20K|
+|是否支持Fine-tuning|是|
+|模型大小|287MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet101vd_ade20k
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet101vd_ade20k模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
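+
+      - 例如,下面的写法(示意代码,假设 `paddlehub.vision.segmentation_transforms` 中提供 `RandomHorizontalFlip`)在缩放前加入随机水平翻转:
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        # 训练数据增强:先随机水平翻转,再缩放到固定尺寸并归一化
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```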
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+        - `pretrained`: 自己训练的模型参数路径;若为None,则加载模型提供的默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_ade20k
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。例如(假设使用0号GPU):
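+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_ade20k
+    ```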
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,使用以下几行代码即可发送预测请求,获取预测结果:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b7d0b3e0fd095c589edfbe29fbb2a19cc3524d2e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_ade20k
+
+|Module Name|ginet_resnet101vd_ade20k|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|ADE20K|
+|Fine-tuning supported or not|Yes|
+|Module Size|287MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and use it for prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_ade20k
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+    - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_ade20k model on datasets such as OpticDiscSeg by executing `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines a rich set of preprocessing methods for image segmentation data. Users can replace them according to their needs.
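+
+      - For example, the following sketch (assuming `RandomHorizontalFlip` is available in `paddlehub.vision.segmentation_transforms`) adds a random horizontal flip before resizing:
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        # Augment training data: random horizontal flip, then resize and normalize.
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```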
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * Dataset preparation can refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Path of your self-trained model parameters; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_ade20k
+ ```
+
+    - The serving API is now deployed, with the default port number 8866.
+
+    - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set. For example (assuming GPU 0 is used):
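+
+    - ```shell
+      $ export CUDA_VISIBLE_DEVICES=0
+      $ hub serving start -m ginet_resnet101vd_ade20k
+      ```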
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/layers.py
@@ -0,0 +1,345 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+ aspp_ratios (tuple): The dilation rate using in ASSP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a7aff27e9b964b069c0c2be44ab719d2298591d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_ade20k.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_ade20k",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add auxiliary loss.
+            If true, an auxiliary loss head is applied to the third-stage feature of the backbone. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512; otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 150,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
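+        # Shape sketch: x is (N, C, H, W).
+        # conv_theta projects x onto num_node maps  -> B: (N, num_node, H*W).
+        # conv_phi reduces channels to num_state    -> x_reduce: (N, H*W, num_state).
+        # V = (B @ x_reduce) / (H*W) is the averaged per-node state, transposed
+        # to (N, num_state, num_node) before entering the graph layer.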
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+        self.channel_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
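+        # Attention between Nc class nodes (`word`) and Nn visual nodes (`vis_node`):
+        # energy is (N, Nc, Nn); attention_vis normalizes over the visual nodes and
+        # attention_word over the class nodes. class_out pools visual features into
+        # class nodes; node_out maps class features back onto visual nodes.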
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..e3e031f0e239a2d8e965596579ed16a5501b324f
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_ade20k/resnet.py
@@ -0,0 +1,136 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_ade20k.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
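+        # Stages 3 and 4 (block indices 2 and 3) replace striding with dilated
+        # convolutions (rates 2 and 4), keeping the output stride at 8; stage 4
+        # rates are further scaled by `multi_grid`.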
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..faa1a537b2e96f2af75ac81a9d6e5247fbe84379
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_cityscapes
+
+|模型名称|ginet_resnet101vd_cityscapes|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet101vd|
+|数据集|Cityscapes|
+|是否支持Fine-tuning|是|
+|模型大小|286MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+ - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet101vd_cityscapes
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1.预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet101vd_cityscapes模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
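+
+      - 例如,下面的写法(示意代码,假设 `paddlehub.vision.segmentation_transforms` 中提供 `RandomHorizontalFlip`)在缩放前加入随机水平翻转:
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        # 训练数据增强:先随机水平翻转,再缩放到固定尺寸并归一化
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```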
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+        - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+        - `pretrained`: 自己训练的模型参数路径;若为None,则加载模型提供的默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_cityscapes
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量,否则不用设置。例如(假设使用0号GPU):
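+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_cityscapes
+    ```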
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端后,使用以下几行代码即可发送预测请求,获取预测结果:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..2e09ff0c9121c1531b8f4892a3ae8b492b87019b
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_cityscapes
+
+|Module Name|ginet_resnet101vd_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|286MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to finetune the pre-trained model and use it for prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_cityscapes
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+    - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_cityscapes model on datasets such as OpticDiscSeg by executing `python train.py`.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: The data augmentation module defines a rich set of preprocessing methods for image segmentation data. Users can replace them according to their needs.
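+
+      - For example, the following sketch (assuming `RandomHorizontalFlip` is available in `paddlehub.vision.segmentation_transforms`) adds a random horizontal flip before resizing:
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        # Augment training data: random horizontal flip, then resize and normalize.
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```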
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * Dataset preparation can refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Path of your self-trained model parameters; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_cityscapes
+ ```
+
+    - The serving API is now deployed, with the default port number 8866.
+
+    - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it need not be set. For example (assuming GPU 0 is used):
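+
+    - ```shell
+      $ export CUDA_VISIBLE_DEVICES=0
+      $ hub serving start -m ginet_resnet101vd_cityscapes
+      ```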
+
+- ### Step 2: Send a predictive request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/layers.py
@@ -0,0 +1,345 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+ aspp_ratios (tuple): The dilation rate using in ASSP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..e135d4ab484a4bd9c7c81e6905d527680fe69a04
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/module.py
@@ -0,0 +1,308 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_cityscapes.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add the auxiliary loss.
+            If true, an auxiliary loss branch is added on the stage-3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
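+        # Learnable class-embedding table: one 300-d vector per class,
+        # projected to 256-d graph-node features by fc1/fc2 in forward().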
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
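+        # B assigns each pixel to num_node graph nodes; x_reduce embeds each
+        # pixel into num_state dims. Their product, normalized by H*W, yields
+        # node states V, which the graph layer refines before D projects the
+        # result back onto the pixel grid.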
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
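+        # conv1 mixes information across the node axis (nodes become Conv1D
+        # channels after the transpose); conv2 then mixes across the state axis.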
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..6104fa44ac2286e3636960631768599e2467c336
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_cityscapes/resnet.py
@@ -0,0 +1,136 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_cityscapes.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        # paddle.add + F.relu replaces the legacy fused elementwise_add(act='relu').
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
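+        # Stages 3 and 4 (indices 2 and 3) trade stride for dilation, keeping
+        # the backbone's output stride at 8 for dense prediction.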
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..41f95d112f885e3e5decb5854b35a71a99eba452
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_voc
+
+|Module Name|ginet_resnet101vd_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|286MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+  - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - This module shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction tasks.
+  - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install ginet_resnet101vd_voc
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1.Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
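+
+  - `result` contains one segmentation result per input image; when `visualization=True`, the visualized masks are also saved under the default `save_path` ('seg_result', see the Args below).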
+
+- ### 2.How to Start Fine-tuning
+
+  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet101vd_voc model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Code steps
+
+    - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can replace them with their own as needed.
+
+    - Step2: Download and use the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+      - `transforms`: Data preprocessing methods.
+      - `mode`: Select the data mode; the options are `train`, `test` and `val`. Default is `train`.
+
+      - The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
+ ```
+      - `name`: Name of the pre-trained model.
+      - `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
+
+    - Step4: Choose the optimization strategy and training configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
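+
+      - `PolynomialDecay` lowers the learning rate from 0.01 toward `end_lr=0.0001` over `decay_steps=1000` steps with exponent `power=0.9`; `Trainer` then trains for 10 epochs with batch size 4, logging every 10 steps and saving a checkpoint every 4 epochs.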
+
+
+  - Model prediction
+
+    - When fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+  - After the parameters are configured correctly, run the script with `python predict.py`.
+
+  - **Args**
+    * `images`: Path of the original image, or a BGR-format image;
+    * `visualization`: Whether to visualize the results. Default is True;
+    * `save_path`: Path to save the results. Default is 'seg_result'.
+
+  **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m ginet_resnet101vd_voc
+    ```
+
+  - This deploys an online image segmentation API service, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
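+
+  - For example (a minimal sketch; the device index `0` is only an illustration):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m ginet_resnet101vd_voc
+    ```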
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # Send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..1bfc41ddd29da74e1df9da24cc23e0c65cf2a02f
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet101vd_voc
+
+|Module Name|ginet_resnet101vd_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet101vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|286MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet101vd_voc
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet101vd_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: This module defines a rich set of preprocessing methods for image segmentation data; users can replace them with their own as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet101vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained model; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+      trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+    - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet101vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet101vd_voc
+ ```
+
+  - The serving API is now deployed, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet101vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
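+
+  - The decoded `mask` is an ordinary OpenCV image; it can, for example, be written to disk with `cv2.imwrite('mask.png', mask)` (the file name here is arbitrary).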
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+    def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+        x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..19422e3e70d829be67d62256403812df93811e7e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet101vd_voc.resnet import ResNet101_vd
+
+
+@moduleinfo(
+ name="ginet_resnet101vd_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet101 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet101(nn.Layer):
+ """
+ The GINetResNet101 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add the auxiliary loss.
+            If true, an auxiliary loss branch is added on the stage-3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in the base forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+ backbone_indices: Tuple[int]=(0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet101, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+
+ self.backbone = ResNet101_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias=False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+ self.channle_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..4014d4f8932ba9e81cd5afb8ca81a73863197151
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet101vd_voc/resnet.py
@@ -0,0 +1,136 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet101vd_voc.layers as L
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        # paddle.add + F.relu replaces the legacy fused elementwise_add(act='relu').
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet101_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet101_vd, self).__init__()
+ depth = [3, 4, 23, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..341563f32cf13647472b2c0e7a8fd38f4d83adaa
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README.md
@@ -0,0 +1,186 @@
+# ginet_resnet50vd_ade20k
+
+|Module Name|ginet_resnet50vd_ade20k|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|ADE20K|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+  - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - This module shows how to use PaddleHub to fine-tune the pre-trained model and complete prediction tasks.
+  - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+  - ```shell
+    $ hub install ginet_resnet50vd_ade20k
+    ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1.Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.How to Start Fine-tuning
+
+  - After installing PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet50vd_ade20k model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Code steps
+
+    - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` module defines a rich set of preprocessing methods for image segmentation data; users can replace them with their own as needed.
+
+    - Step2: Download and use the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+      - `transforms`: Data preprocessing methods.
+      - `mode`: Select the data mode; the options are `train`, `test` and `val`. Default is `train`.
+
+      - The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+ ```
+      - `name`: Name of the pre-trained model.
+      - `pretrained`: Whether to load a self-trained model; if None, the provided default parameters are loaded.
+
+    - Step4: Choose the optimization strategy and training configuration
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+  - Model prediction
+
+    - When fine-tuning completes, the model that performed best on the validation set is saved under `${CHECKPOINT_DIR}/best_model`, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model for prediction. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+  - After the parameters are configured correctly, run the script with `python predict.py`.
+
+  - **Args**
+    * `images`: Path of the original image, or a BGR-format image;
+    * `visualization`: Whether to visualize the results. Default is True;
+    * `save_path`: Path to save the results. Default is 'seg_result'.
+
+  **NOTE:** For prediction, the selected module, checkpoint_dir and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+  - ```shell
+    $ hub serving start -m ginet_resnet50vd_ade20k
+    ```
+
+  - This deploys an online image segmentation API service, with the default port number 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    def base64_to_cv2(b64str):
+        data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # Send the HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
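+
+  - The `results` field of the response carries one encoded mask per submitted image; `base64_to_cv2` decodes it back into an ndarray for further processing or saving.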
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9c1a26daaecc5b22e622146d67b2664700fca74
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_ade20k
+
+|Module Name|ginet_resnet50vd_ade20k|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|ADE20K|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+ - We will show how to use PaddleHub to finetune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_ade20k
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2.Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start fine-tuning the ginet_resnet50vd_ade20k model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: This module defines a rich set of preprocessing methods for image segmentation data; users can replace them with their own as needed.
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+      * The dataset preparation code can be found in [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset and extracts it to the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_ade20k', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained model; if it is None, the provided default parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+    - When fine-tuning is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_ade20k', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+    * `visualization`: Whether to save the segmentation results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_ade20k
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_ade20k"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
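+
+  - The decoded `mask` is an ordinary OpenCV image array, so it can be saved or post-processed directly. A usage sketch (the output filename is arbitrary):
+
+    ```python
+    cv2.imwrite('seg_mask.png', mask)
+    ```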
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/layers.py
@@ -0,0 +1,345 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
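+        # conv1 is created with padding=0 whenever dilation > 1 (see ConvBNLayer),
+        # so pad the feature map manually here to keep its spatial size unchanged.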
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..79ce4d0f070472b989c5a83b6f2542bd66f550fc
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_ade20k.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_ade20k",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss.
+            If True, an auxiliary head is attached to the third backbone stage. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in base_forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 150,
+                 backbone_indices: Tuple[int] = (0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
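+        # Map the learnable (nclass, 300) class-node embeddings through two FC
+        # layers to 256-d node states and broadcast them over the batch.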
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
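+        # theta branch: one spatial assignment map per graph node.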
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
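+        # Aggregate pixel features into per-node states V, normalized by the
+        # number of spatial positions.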
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
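+        # Let class nodes (inp) and visual nodes (V) interact on the graph, then
+        # project the updated node states back onto the spatial feature map.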
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+        self.channel_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d6e376ddca8c01569f1f20d0e25ec3e9fa513922
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_ade20k/resnet.py
@@ -0,0 +1,137 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_ade20k.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = paddle.add(x=short, y=conv1)
+        y = F.relu(y)
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
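+        # Replace striding with dilation in the last two stages (rates 2 and 4)
+        # to keep the output stride at 8; stage 4 also applies multi-grid rates.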
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..849f47627fa1e5c3c2150188981e9aff32737ae8
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_cityscapes
+
+|Module Name|ginet_resnet50vd_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - This example shows how to use PaddleHub to fine-tune the pre-trained model and complete the prediction.
+  - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_cityscapes
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、Fine-tune
+
+  - After completing the installation of PaddlePaddle and PaddleHub, run `python train.py` to start fine-tuning the ginet_resnet50vd_cityscapes model on the OpticDiscSeg dataset. The content of `train.py` is as follows:
+
+  - Steps:
+
+    - Step1: Define the data preprocessing method
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` data augmentation module provides a rich set of preprocessing methods for image segmentation data; users can replace them according to their needs, as sketched below.
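+
+      - A training pipeline with random flipping can be sketched as follows (it assumes `RandomHorizontalFlip` is available in `paddlehub.vision.segmentation_transforms`; adjust to the ops your installed version provides):
+
+      - ```python
+        # Sketch of a customized pipeline; RandomHorizontalFlip is assumed to exist
+        # in this module and flips the image and its mask together.
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), RandomHorizontalFlip(), Normalize()])
+        ```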
+
+    - Step2: Download and use the dataset
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+      - `transforms`: data preprocessing methods.
+      - `mode`: Select the data mode; the options are `train`, `test`, and `val`. Default is `train`.
+
+      - For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it into the `$HOME/.paddlehub/dataset` directory.
+
+    - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+      - `name`: the name of the pre-trained model.
+      - `pretrained`: the path of a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+    - Model prediction
+
+      - When fine-tuning is completed, the model with the best performance on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen for fine-tuning. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+  - After the parameters are configured correctly, run the script `python predict.py`.
+
+ - **Args**
+    * `images`: path of the original image, or a BGR-format image;
+    * `visualization`: whether to visualize the result, default is True;
+    * `save_path`: path for saving the result, default is 'seg_result'.
+
+    **NOTE:** For prediction, the selected module, checkpoint_dir, and dataset must be the same as those used for fine-tuning.
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image segmentation service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_cityscapes
+ ```
+
+  - The image segmentation serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+    # Send an HTTP request
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
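+
+  - The decoded `mask` is a regular OpenCV image array and can be written straight to disk, for example (usage sketch; the filename is arbitrary):
+
+    ```python
+    cv2.imwrite('seg_mask.png', mask)
+    ```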
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..b265ee908f2476008405d2f548f8f029a81775a0
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_cityscapes
+
+|Module Name|ginet_resnet50vd_cityscapes|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|Cityscapes|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - We will show how to use PaddleHub to fine-tune the pre-trained model and complete the prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environment Dependencies
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_cityscapes
+ ```
+
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can start using the ginet_resnet50vd_cityscapes model to fine-tune on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - The `segmentation_transforms` data augmentation module provides a rich set of preprocessing methods for segmentation data; users can replace them according to their needs, as illustrated below.
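+
+      - As an illustration, the pipeline can be extended with random mirroring (a sketch; `RandomHorizontalFlip` is assumed to be exported by `paddlehub.vision.segmentation_transforms`, so check your installed version):
+
+      - ```python
+        # Sketch of a customized pipeline (RandomHorizontalFlip is an assumption;
+        # it flips the image and its mask together).
+        from paddlehub.vision.segmentation_transforms import Compose, Resize, RandomHorizontalFlip, Normalize
+
+        transform = Compose([Resize(target_size=(512, 512)), RandomHorizontalFlip(), Normalize()])
+        ```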
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+      * `mode`: Select the data mode; the options are `train`, `test`, and `val`. Default is `train`.
+
+      * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it into the `$HOME/.paddlehub/dataset` directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', num_classes=2, pretrained=None)
+ ```
+      - `name`: model name.
+      - `pretrained`: the path of a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+      - When fine-tuning is completed, the model with the best performance on the validation set is saved in the `${CHECKPOINT_DIR}/best_model` directory. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_cityscapes', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+    * `visualization`: Whether to save the segmentation results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_cityscapes
+ ```
+
+  - The serving API is now deployed, and the default port number is 8866.
+
+  - **NOTE:** If you want to use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_cityscapes"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
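+
+  - Since `mask` decodes to a plain OpenCV image array, it can be saved directly, e.g. (usage sketch; the filename is arbitrary):
+
+    ```python
+    cv2.imwrite('seg_mask.png', mask)
+    ```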
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/layers.py
@@ -0,0 +1,345 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+ """In cpu environment nn.SyncBatchNorm does not have kernel so use nn.BatchNorm2D instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
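+        # conv1 is created with padding=0 whenever dilation > 1 (see ConvBNLayer),
+        # so pad the feature map manually here to keep its spatial size unchanged.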
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+ in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1dac751bca852b3ee9ae247248b19c878d44365e
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_cityscapes.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_cityscapes",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add an auxiliary loss.
+            If True, an auxiliary head is attached to the third backbone stage. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the feature
+            is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in base_forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 19,
+                 backbone_indices: Tuple[int] = (0, 1, 2, 3),
+ enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+ self.jpu = jpu
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+    def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
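+        # Map the learnable (nclass, 300) class-node embeddings through two FC
+        # layers to 256-d node states and broadcast them over the batch.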
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+    def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
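+        # theta branch: one spatial assignment map per graph node.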
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
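+        # Aggregate pixel features into per-node states V, normalized by the
+        # number of spatial positions.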
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
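+        # Let class nodes (inp) and visual nodes (V) interact on the graph, then
+        # project the updated node states back onto the spatial feature map.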
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+        self.channel_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..d526b26991ff72083d7431971608b8a489f60df9
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_cityscapes/resnet.py
@@ -0,0 +1,137 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_cityscapes.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = F.relu(paddle.add(x=short, y=conv1))
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
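+        # use dilations of 2 and 4 in the last two stages (instead of striding)
+        # so the backbone keeps an output stride of 8 for dense prediction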
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e0f1d605c5f8f87c1ad56d6c12b3a1384a514720
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_voc
+
+|模型名称|ginet_resnet50vd_voc|
+| :--- | :---: |
+|类别|图像-图像分割|
+|网络|ginet_resnet50vd|
+|数据集|PascalVOC2012|
+|是否支持Fine-tuning|是|
+|模型大小|214MB|
+|指标|-|
+|最新更新日期|2021-12-14|
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+
+  - 样例结果示例:
+
+
+
+
+- ### 模型介绍
+
+ - 本示例将展示如何使用PaddleHub对预训练模型进行finetune并完成预测任务。
+ - 更多详情请参考:[ginet](https://arxiv.org/pdf/2009.06160)
+
+## 二、安装
+
+- ### 1、环境依赖
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、安装
+
+ - ```shell
+ $ hub install ginet_resnet50vd_voc
+ ```
+
+ - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+## 三、模型API预测
+
+- ### 1、预测代码示例
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、如何开始Fine-tune
+
+ - 在完成安装PaddlePaddle与PaddleHub后,通过执行`python train.py`即可开始使用ginet_resnet50vd_voc模型对OpticDiscSeg数据集进行Fine-tune。 `train.py`内容如下:
+
+ - 代码步骤
+
+ - Step1: 定义数据预处理方式
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+ - `segmentation_transforms` 数据增强模块定义了丰富的针对图像分割数据的预处理方式,用户可按照需求替换自己需要的数据预处理方式。
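+
+  - 例如,可在训练时加入随机水平翻转(以下仅为示意,假设 `RandomHorizontalFlip` 在 `segmentation_transforms` 中可用):
+
+    - ```python
+      from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+      # 训练时先随机水平翻转,再缩放到 512x512 并归一化
+      transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+      ```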
+
+ - Step2: 下载数据集并使用
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ - `transforms`: 数据预处理方式。
+    - `mode`: 选择数据模式,可选项有 `train`, `test`, `val`, 默认为`train`。
+
+ - 数据集的准备代码可以参考 [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py)。`hub.datasets.OpticDiscSeg()`会自动从网络下载数据集并解压到用户目录下`$HOME/.paddlehub/dataset`目录。
+
+ - Step3: 加载预训练模型
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: 选择预训练模型的名字。
+    - `pretrained`: 是否加载自己训练的模型,若为None,则加载提供的模型默认参数。
+
+ - Step4: 选择优化策略和运行配置
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - 模型预测
+
+ - 当完成Fine-tune后,Fine-tune过程在验证集上表现最优的模型会被保存在`${CHECKPOINT_DIR}/best_model`目录下,其中`${CHECKPOINT_DIR}`目录为Fine-tune时所选择的保存checkpoint的目录。我们使用该模型来进行预测。predict.py脚本如下:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - 参数配置正确后,请执行脚本`python predict.py`。
+
+ - **Args**
+ * `images`:原始图像路径或BGR格式图片;
+ * `visualization`: 是否可视化,默认为True;
+ * `save_path`: 保存结果的路径,默认保存路径为'seg_result'。
+
+ **NOTE:** 进行预测时,所选择的module,checkpoint_dir,dataset必须和Fine-tune所用的一样。
+
+## 四、服务部署
+
+- PaddleHub Serving可以部署一个在线图像分割服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+ - 运行启动命令:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_voc
+ ```
+
+ - 这样就完成了一个图像分割服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量(如 `export CUDA_VISIBLE_DEVICES=0`),否则不用设置。
+
+- ### 第二步:发送预测请求
+
+ - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ # 发送HTTP请求
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
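+
+  - 返回的`mask`是普通的OpenCV图像,可按如下方式保存或与原图叠加查看(仅为示意,假设`mask`与原图尺寸一致):
+
+   ```python
+   # 保存分割结果,并与原图按 6:4 加权叠加以便快速查看
+   cv2.imwrite('seg_mask.png', mask)
+   overlay = cv2.addWeighted(org_im, 0.6, mask, 0.4, 0)
+   cv2.imwrite('seg_overlay.png', overlay)
+   ```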
+
+## 五、更新历史
+
+* 1.0.0
+
+ 初始发布
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..71bba22353984fa84150ed687c9432db6ba0da65
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/README_en.md
@@ -0,0 +1,185 @@
+# ginet_resnet50vd_voc
+
+|Module Name|ginet_resnet50vd_voc|
+| :--- | :---: |
+|Category|Image Segmentation|
+|Network|ginet_resnet50vd|
+|Dataset|PascalVOC2012|
+|Fine-tuning supported or not|Yes|
+|Module Size|214MB|
+|Data indicators|-|
+|Latest update date|2021-12-14|
+
+## I. Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - This module shows how to use PaddleHub to fine-tune the pre-trained model and run prediction.
+ - For more information, please refer to: [ginet](https://arxiv.org/pdf/2009.06160)
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 2.0.0
+
+ - paddlehub >= 2.0.0
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install ginet_resnet50vd_voc
+ ```
+
+ - In case of any problems during installation, please refer to:[Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
+
+
+## III. Module API Prediction
+
+- ### 1、Prediction Code Example
+
+
+ - ```python
+ import cv2
+ import paddle
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ result = model.predict(images=[img], visualization=True)
+ ```
+
+- ### 2、Fine-tune and Encapsulation
+
+  - After completing the installation of PaddlePaddle and PaddleHub, you can run `python train.py` to start fine-tuning the ginet_resnet50vd_voc model on datasets such as OpticDiscSeg.
+
+ - Steps:
+
+ - Step1: Define the data preprocessing method
+
+ - ```python
+ from paddlehub.vision.segmentation_transforms import Compose, Resize, Normalize
+
+ transform = Compose([Resize(target_size=(512, 512)), Normalize()])
+ ```
+
+      - `segmentation_transforms`: the data augmentation module defines many preprocessing methods for image segmentation data; users can replace them according to their needs, as in the sketch below.
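+
+      - For example, random horizontal flipping can be added at training time (a sketch only, assuming `RandomHorizontalFlip` is available in `segmentation_transforms`):
+
+      - ```python
+        from paddlehub.vision.segmentation_transforms import Compose, RandomHorizontalFlip, Resize, Normalize
+
+        # randomly flip, then resize to 512x512 and normalize
+        transform = Compose([RandomHorizontalFlip(), Resize(target_size=(512, 512)), Normalize()])
+        ```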
+
+ - Step2: Download the dataset
+
+ - ```python
+ from paddlehub.datasets import OpticDiscSeg
+
+ train_reader = OpticDiscSeg(transform, mode='train')
+
+ ```
+ * `transforms`: data preprocessing methods.
+
+ * `mode`: Select the data mode, the options are `train`, `test`, `val`. Default is `train`.
+
+    * For dataset preparation, refer to [opticdiscseg.py](../../paddlehub/datasets/opticdiscseg.py). `hub.datasets.OpticDiscSeg()` automatically downloads the dataset from the network and decompresses it to the `$HOME/.paddlehub/dataset` directory under the user's home directory.
+
+ - Step3: Load the pre-trained model
+
+ - ```python
+ import paddlehub as hub
+
+ model = hub.Module(name='ginet_resnet50vd_voc', num_classes=2, pretrained=None)
+ ```
+ - `name`: model name.
+      - `pretrained`: Whether to load a self-trained checkpoint; if None, the provided pretrained parameters are loaded.
+
+ - Step4: Optimization strategy
+
+ - ```python
+ import paddle
+ from paddlehub.finetune.trainer import Trainer
+
+ scheduler = paddle.optimizer.lr.PolynomialDecay(learning_rate=0.01, decay_steps=1000, power=0.9, end_lr=0.0001)
+ optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
+ trainer = Trainer(model, optimizer, checkpoint_dir='test_ckpt_img_seg', use_gpu=True)
+ trainer.train(train_reader, epochs=10, batch_size=4, log_interval=10, save_interval=4)
+ ```
+
+
+ - Model prediction
+
+      - When Fine-tune is completed, the model with the best performance on the validation set will be saved in the `${CHECKPOINT_DIR}/best_model` directory, where `${CHECKPOINT_DIR}` is the checkpoint directory chosen during Fine-tune. We use this model to make predictions. The `predict.py` script is as follows:
+
+ ```python
+ import paddle
+ import cv2
+ import paddlehub as hub
+
+ if __name__ == '__main__':
+ model = hub.Module(name='ginet_resnet50vd_voc', pretrained='/PATH/TO/CHECKPOINT')
+ img = cv2.imread("/PATH/TO/IMAGE")
+ model.predict(images=[img], visualization=True)
+ ```
+
+ - **Args**
+ * `images`: Image path or ndarray data with format [H, W, C], BGR.
+ * `visualization`: Whether to save the recognition results as picture files.
+ * `save_path`: Save path of the result, default is 'seg_result'.
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image segmentation.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+
+ - ```shell
+ $ hub serving start -m ginet_resnet50vd_voc
+ ```
+
+    - This deploys the image segmentation API service, with the default port number 8866.
+
+    - **NOTE:** If the GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable (e.g. `export CUDA_VISIBLE_DEVICES=0`) before starting the service; otherwise it need not be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ import numpy as np
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+ def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+        data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+ org_im = cv2.imread('/PATH/TO/IMAGE')
+ data = {'images':[cv2_to_base64(org_im)]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/ginet_resnet50vd_voc"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+ mask = base64_to_cv2(r.json()["results"][0])
+ ```
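+
+  - The returned `mask` is an ordinary OpenCV image; a minimal follow-up sketch (assuming `mask` has the same size as the original image) to save it and overlay it on the input:
+
+   ```python
+   # save the mask, then blend it with the original image for quick inspection
+   cv2.imwrite('seg_mask.png', mask)
+   overlay = cv2.addWeighted(org_im, 0.6, mask, 0.4, 0)
+   cv2.imwrite('seg_overlay.png', overlay)
+   ```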
+
+## V. Release Note
+
+- 1.0.0
+
+ First release
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e46219fd671ed9834795c9881292eed787b990d
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/layers.py
@@ -0,0 +1,345 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.layer import activation
+from paddle.nn import Conv2D, AvgPool2D
+
+
+def SyncBatchNorm(*args, **kwargs):
+    """In a CPU environment, nn.SyncBatchNorm has no kernel, so nn.BatchNorm2D is used instead"""
+ if paddle.get_device() == 'cpu':
+ return nn.BatchNorm2D(*args, **kwargs)
+ else:
+ return nn.SyncBatchNorm(*args, **kwargs)
+
+
+class ConvBNLayer(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ stride: int = 1,
+ dilation: int = 1,
+ groups: int = 1,
+ is_vd_mode: bool = False,
+ act: str = None,
+ name: str = None):
+ super(ConvBNLayer, self).__init__()
+
+ self.is_vd_mode = is_vd_mode
+ self._pool2d_avg = AvgPool2D(
+ kernel_size=2, stride=2, padding=0, ceil_mode=True)
+ self._conv = Conv2D(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2 if dilation == 1 else 0,
+ dilation=dilation,
+ groups=groups,
+ bias_attr=False)
+
+ self._batch_norm = SyncBatchNorm(out_channels)
+ self._act_op = Activation(act=act)
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ if self.is_vd_mode:
+ inputs = self._pool2d_avg(inputs)
+ y = self._conv(inputs)
+ y = self._batch_norm(y)
+ y = self._act_op(y)
+
+ return y
+
+
+class BottleneckBlock(nn.Layer):
+ """Residual bottleneck block"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ dilation: int = 1,
+ name: str = None):
+ super(BottleneckBlock, self).__init__()
+
+ self.conv0 = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ act='relu',
+ name=name + "_branch2a")
+
+ self.dilation = dilation
+
+ self.conv1 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ dilation=dilation,
+ name=name + "_branch2b")
+ self.conv2 = ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ act=None,
+ name=name + "_branch2c")
+
+ if not shortcut:
+ self.short = ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels * 4,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first or stride == 1 else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ if self.dilation > 1:
+ padding = self.dilation
+ y = F.pad(y, [padding, padding, padding, padding])
+
+ conv1 = self.conv1(y)
+ conv2 = self.conv2(conv1)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+
+ y = paddle.add(x=short, y=conv2)
+ y = F.relu(y)
+ return y
+
+
+class SeparableConvBNReLU(nn.Layer):
+ """Depthwise Separable Convolution."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(SeparableConvBNReLU, self).__init__()
+ self.depthwise_conv = ConvBN(
+ in_channels,
+ out_channels=in_channels,
+ kernel_size=kernel_size,
+ padding=padding,
+ groups=in_channels,
+ **kwargs)
+        self.pointwise_conv = ConvBNReLU(
+            in_channels, out_channels, kernel_size=1, groups=1)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self.depthwise_conv(x)
+        x = self.pointwise_conv(x)
+ return x
+
+
+class ConvBN(nn.Layer):
+ """Basic conv bn layer"""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBN, self).__init__()
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ return x
+
+
+class ConvBNReLU(nn.Layer):
+ """Basic conv bn relu layer."""
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int,
+ padding: str = 'same',
+ **kwargs: dict):
+ super(ConvBNReLU, self).__init__()
+
+ self._conv = Conv2D(
+ in_channels, out_channels, kernel_size, padding=padding, **kwargs)
+ self._batch_norm = SyncBatchNorm(out_channels)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ x = self._conv(x)
+ x = self._batch_norm(x)
+ x = F.relu(x)
+ return x
+
+
+class Activation(nn.Layer):
+ """
+ The wrapper of activations.
+
+ Args:
+ act (str, optional): The activation name in lowercase. It must be one of ['elu', 'gelu',
+ 'hardshrink', 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid',
+ 'softmax', 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax',
+ 'hsigmoid']. Default: None, means identical transformation.
+
+ Returns:
+ A callable object of Activation.
+
+ Raises:
+ KeyError: When parameter `act` is not in the optional range.
+
+ Examples:
+
+ from paddleseg.models.common.activation import Activation
+
+ relu = Activation("relu")
+ print(relu)
+        # <class 'paddle.nn.layer.activation.ReLU'>
+
+ sigmoid = Activation("sigmoid")
+ print(sigmoid)
+        # <class 'paddle.nn.layer.activation.Sigmoid'>
+
+ not_exit_one = Activation("not_exit_one")
+ # KeyError: "not_exit_one does not exist in the current dict_keys(['elu', 'gelu', 'hardshrink',
+ # 'tanh', 'hardtanh', 'prelu', 'relu', 'relu6', 'selu', 'leakyrelu', 'sigmoid', 'softmax',
+ # 'softplus', 'softshrink', 'softsign', 'tanhshrink', 'logsigmoid', 'logsoftmax', 'hsigmoid'])"
+ """
+
+ def __init__(self, act: str = None):
+ super(Activation, self).__init__()
+
+ self._act = act
+ upper_act_names = activation.__dict__.keys()
+ lower_act_names = [act.lower() for act in upper_act_names]
+ act_dict = dict(zip(lower_act_names, upper_act_names))
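+        # lowercase name -> exported class name, e.g. 'relu' -> 'ReLU',
+        # 'leakyrelu' -> 'LeakyReLU', built from paddle.nn.layer.activation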
+
+ if act is not None:
+ if act in act_dict.keys():
+ act_name = act_dict[act]
+                self.act_func = getattr(activation, act_name)()
+ else:
+ raise KeyError("{} does not exist in the current {}".format(
+ act, act_dict.keys()))
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+
+ if self._act is not None:
+ return self.act_func(x)
+ else:
+ return x
+
+
+class ASPPModule(nn.Layer):
+ """
+ Atrous Spatial Pyramid Pooling.
+
+ Args:
+        aspp_ratios (tuple): The dilation rates used in the ASPP module.
+ in_channels (int): The number of input channels.
+ out_channels (int): The number of output channels.
+ align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature
+ is even, e.g. 1024x512, otherwise it is True, e.g. 769x769.
+ use_sep_conv (bool, optional): If using separable conv in ASPP module. Default: False.
+ image_pooling (bool, optional): If augmented with image-level features. Default: False
+ """
+
+ def __init__(self,
+ aspp_ratios: tuple,
+ in_channels: int,
+ out_channels: int,
+ align_corners: bool,
+                 use_sep_conv: bool = False,
+ image_pooling: bool = False):
+ super().__init__()
+
+ self.align_corners = align_corners
+ self.aspp_blocks = nn.LayerList()
+
+ for ratio in aspp_ratios:
+ if use_sep_conv and ratio > 1:
+ conv_func = SeparableConvBNReLU
+ else:
+ conv_func = ConvBNReLU
+
+ block = conv_func(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1 if ratio == 1 else 3,
+ dilation=ratio,
+ padding=0 if ratio == 1 else ratio)
+ self.aspp_blocks.append(block)
+
+ out_size = len(self.aspp_blocks)
+
+ if image_pooling:
+ self.global_avg_pool = nn.Sequential(
+ nn.AdaptiveAvgPool2D(output_size=(1, 1)),
+ ConvBNReLU(in_channels, out_channels, kernel_size=1, bias_attr=False))
+ out_size += 1
+ self.image_pooling = image_pooling
+
+ self.conv_bn_relu = ConvBNReLU(
+ in_channels=out_channels * out_size,
+ out_channels=out_channels,
+ kernel_size=1)
+
+ self.dropout = nn.Dropout(p=0.1) # drop rate
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
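+        # apply each (dilated) branch to x, resize every output back to x's
+        # spatial size, then fuse the branches by channel concat + 1x1 conv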
+ outputs = []
+ for block in self.aspp_blocks:
+ y = block(x)
+ y = F.interpolate(
+ y,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(y)
+
+ if self.image_pooling:
+ img_avg = self.global_avg_pool(x)
+ img_avg = F.interpolate(
+ img_avg,
+ x.shape[2:],
+ mode='bilinear',
+ align_corners=self.align_corners)
+ outputs.append(img_avg)
+
+ x = paddle.concat(outputs, axis=1)
+ x = self.conv_bn_relu(x)
+ x = self.dropout(x)
+
+ return x
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..fed27ebf3a07794343c5841dc5c31b51e46f6544
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/module.py
@@ -0,0 +1,309 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import Union, List, Tuple
+
+import paddle
+from paddle import nn
+import paddle.nn.functional as F
+import numpy as np
+from paddlehub.module.module import moduleinfo
+import paddlehub.vision.segmentation_transforms as T
+from paddlehub.module.cv_module import ImageSegmentationModule
+from paddleseg.utils import utils
+from paddleseg.models import layers
+
+from ginet_resnet50vd_voc.resnet import ResNet50_vd
+
+
+@moduleinfo(
+ name="ginet_resnet50vd_voc",
+ type="CV/semantic_segmentation",
+ author="paddlepaddle",
+ author_email="",
+ summary="GINetResnet50 is a segmentation model.",
+ version="1.0.0",
+ meta=ImageSegmentationModule)
+class GINetResNet50(nn.Layer):
+ """
+ The GINetResNet50 implementation based on PaddlePaddle.
+ The original article refers to
+ Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph interaction network for scene parsing." In European Conference on Computer Vision, pp. 34-51. Springer, Cham, 2020.
+ (https://arxiv.org/pdf/2009.06160).
+ Args:
+ num_classes (int): The unique number of target classes.
+ backbone_indices (tuple, optional): Values in the tuple indicate the indices of output of backbone.
+        enable_auxiliary_loss (bool, optional): A bool value that indicates whether to add the auxiliary loss.
+            If true, an auxiliary loss head is attached to the stage-3 backbone feature. Default: True.
+        align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of the
+            feature map is even, e.g. 1024x512; otherwise it should be True, e.g. 769x769. Default: True.
+        jpu (bool, optional): Whether to use the JPU unit in base_forward. Default: True.
+ pretrained (str, optional): The path or url of pretrained model. Default: None.
+ """
+
+ def __init__(self,
+ num_classes: int = 21,
+                 backbone_indices: Tuple[int] = (0, 1, 2, 3),
+                 enable_auxiliary_loss: bool = True,
+ align_corners: bool = True,
+ jpu: bool = True,
+ pretrained: str = None):
+ super(GINetResNet50, self).__init__()
+ self.nclass = num_classes
+ self.aux = enable_auxiliary_loss
+
+ self.backbone = ResNet50_vd()
+ self.backbone_indices = backbone_indices
+ self.align_corners = align_corners
+ self.transforms = T.Compose([T.Normalize()])
+
+ self.jpu = layers.JPU([512, 1024, 2048], width=512) if jpu else None
+ self.head = GIHead(in_channels=2048, nclass=num_classes)
+
+ if self.aux:
+ self.auxlayer = layers.AuxLayer(
+ 1024, 1024 // 4, num_classes, bias_attr=False)
+
+ if pretrained is not None:
+ model_dict = paddle.load(pretrained)
+ self.set_dict(model_dict)
+ print("load custom parameters success")
+
+ else:
+ checkpoint = os.path.join(self.directory, 'model.pdparams')
+ model_dict = paddle.load(checkpoint)
+ self.set_dict(model_dict)
+ print("load pretrained parameters success")
+
+ def transform(self, img: Union[np.ndarray, str]) -> Union[np.ndarray, str]:
+ return self.transforms(img)
+
+ def base_forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ feat_list = self.backbone(x)
+ c1, c2, c3, c4 = [feat_list[i] for i in self.backbone_indices]
+
+ if self.jpu:
+ return self.jpu(c1, c2, c3, c4)
+ else:
+ return c1, c2, c3, c4
+
+    def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ _, _, h, w = x.shape
+ _, _, c3, c4 = self.base_forward(x)
+
+ logit_list = []
+ x, _ = self.head(c4)
+ logit_list.append(x)
+
+ if self.aux:
+ auxout = self.auxlayer(c3)
+
+ logit_list.append(auxout)
+
+ return [
+ F.interpolate(
+ logit, (h, w),
+ mode='bilinear',
+ align_corners=self.align_corners) for logit in logit_list
+ ]
+
+
+class GIHead(nn.Layer):
+ """The Graph Interaction Network head."""
+
+ def __init__(self, in_channels: int, nclass: int):
+ super().__init__()
+ self.nclass = nclass
+ inter_channels = in_channels // 4
+ self.inp = paddle.zeros(shape=(nclass, 300), dtype='float32')
+ self.inp = paddle.create_parameter(
+ shape=self.inp.shape,
+ dtype=str(self.inp.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.inp))
+
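+        # self.inp holds one learnable 300-d embedding per class; fc1/fc2
+        # project these class nodes into the 256-d state space of the graph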
+ self.fc1 = nn.Sequential(
+ nn.Linear(300, 128), nn.BatchNorm1D(128), nn.ReLU())
+ self.fc2 = nn.Sequential(
+ nn.Linear(128, 256), nn.BatchNorm1D(256), nn.ReLU())
+ self.conv5 = layers.ConvBNReLU(
+ in_channels,
+ inter_channels,
+ 3,
+ padding=1,
+ bias_attr=False,
+ stride=1)
+
+ self.gloru = GlobalReasonUnit(
+ in_channels=inter_channels,
+ num_state=256,
+ num_node=84,
+ nclass=nclass)
+ self.conv6 = nn.Sequential(
+ nn.Dropout(0.1), nn.Conv2D(inter_channels, nclass, 1))
+
+ def forward(self, x: paddle.Tensor) -> List[paddle.Tensor]:
+ B, C, H, W = x.shape
+ inp = self.inp.detach()
+
+ inp = self.fc1(inp)
+ inp = self.fc2(inp).unsqueeze(axis=0).transpose((0, 2, 1))\
+ .expand((B, 256, self.nclass))
+
+ out = self.conv5(x)
+
+ out, se_out = self.gloru(out, inp)
+ out = self.conv6(out)
+ return out, se_out
+
+
+class GlobalReasonUnit(nn.Layer):
+ """
+ The original paper refers to:
+ Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks" (https://arxiv.org/abs/1811.12814)
+ """
+
+ def __init__(self, in_channels: int, num_state: int = 256, num_node: int = 84, nclass: int = 59):
+ super().__init__()
+ self.num_state = num_state
+ self.conv_theta = nn.Conv2D(
+ in_channels, num_node, kernel_size=1, stride=1, padding=0)
+ self.conv_phi = nn.Conv2D(
+ in_channels, num_state, kernel_size=1, stride=1, padding=0)
+ self.graph = GraphLayer(num_state, num_node, nclass)
+ self.extend_dim = nn.Conv2D(
+ num_state, in_channels, kernel_size=1, bias_attr=False)
+
+ self.bn = layers.SyncBatchNorm(in_channels)
+
+ def forward(self, x: paddle.Tensor, inp: paddle.Tensor) -> List[paddle.Tensor]:
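+        # conv_theta produces num_node soft assignment maps (B); pooling the
+        # reduced pixel features with bmm(B, x_reduce) yields node states V,
+        # normalized by the number of spatial positions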
+ B = self.conv_theta(x)
+ sizeB = B.shape
+ B = B.reshape((sizeB[0], sizeB[1], -1))
+
+ sizex = x.shape
+ x_reduce = self.conv_phi(x)
+ x_reduce = x_reduce.reshape((sizex[0], -1, sizex[2] * sizex[3]))\
+ .transpose((0, 2, 1))
+
+ V = paddle.bmm(B, x_reduce).transpose((0, 2, 1))
+ V = paddle.divide(
+ V, paddle.to_tensor([sizex[2] * sizex[3]], dtype='float32'))
+
+ class_node, new_V = self.graph(inp, V)
+ D = B.reshape((sizeB[0], -1, sizeB[2] * sizeB[3])).transpose((0, 2, 1))
+ Y = paddle.bmm(D, new_V.transpose((0, 2, 1)))
+ Y = Y.transpose((0, 2, 1)).reshape((sizex[0], self.num_state, \
+ sizex[2], -1))
+ Y = self.extend_dim(Y)
+ Y = self.bn(Y)
+ out = Y + x
+
+ return out, class_node
+
+
+class GraphLayer(nn.Layer):
+ def __init__(self, num_state: int, num_node: int, num_class: int):
+ super().__init__()
+ self.vis_gcn = GCN(num_state, num_node)
+ self.word_gcn = GCN(num_state, num_class)
+ self.transfer = GraphTransfer(num_state)
+ self.gamma_vis = paddle.zeros([num_node])
+ self.gamma_word = paddle.zeros([num_class])
+ self.gamma_vis = paddle.create_parameter(
+ shape=self.gamma_vis.shape,
+ dtype=str(self.gamma_vis.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_vis))
+ self.gamma_word = paddle.create_parameter(
+ shape=self.gamma_word.shape,
+ dtype=str(self.gamma_word.numpy().dtype),
+ default_initializer=paddle.nn.initializer.Assign(self.gamma_word))
+
+ def forward(self, inp: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ inp = self.word_gcn(inp)
+ new_V = self.vis_gcn(vis_node)
+ class_node, vis_node = self.transfer(inp, new_V)
+
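+        # gamma_vis / gamma_word are zero-initialized learnable gates, so the
+        # transferred graph features are blended in gradually during training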
+ class_node = self.gamma_word * inp + class_node
+ new_V = self.gamma_vis * vis_node + new_V
+ return class_node, new_V
+
+
+class GCN(nn.Layer):
+ def __init__(self, num_state: int = 128, num_node: int = 64, bias: bool = False):
+ super().__init__()
+ self.conv1 = nn.Conv1D(
+ num_node,
+ num_node,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ )
+ self.relu = nn.ReLU()
+ self.conv2 = nn.Conv1D(
+ num_state,
+ num_state,
+ kernel_size=1,
+ padding=0,
+ stride=1,
+ groups=1,
+ bias_attr=bias)
+
+ def forward(self, x: paddle.Tensor) -> paddle.Tensor:
+ h = self.conv1(x.transpose((0, 2, 1))).transpose((0, 2, 1))
+ h = h + x
+ h = self.relu(h)
+ h = self.conv2(h)
+ return h
+
+
+class GraphTransfer(nn.Layer):
+ """Transfer vis graph to class node, transfer class node to vis feature"""
+
+ def __init__(self, in_dim: int):
+ super().__init__()
+        self.channel_in = in_dim
+ self.query_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.key_conv = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
+ self.value_conv_vis = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.value_conv_word = nn.Conv1D(
+ in_channels=in_dim, out_channels=in_dim, kernel_size=1)
+ self.softmax_vis = nn.Softmax(axis=-1)
+ self.softmax_word = nn.Softmax(axis=-2)
+
+ def forward(self, word: paddle.Tensor, vis_node: paddle.Tensor) -> List[paddle.Tensor]:
+ m_batchsize, C, Nc = word.shape
+ m_batchsize, C, Nn = vis_node.shape
+
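+        # energy: (B, Nc, Nn) class-to-node affinities; softmax over the node
+        # axis (then transpose) gives attention_vis (B, Nn, Nc) for pooling
+        # node values into class features, while softmax over the class axis
+        # gives attention_word (B, Nc, Nn) for projecting class values to nodes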
+ proj_query = self.query_conv(word).reshape((m_batchsize, -1, Nc))\
+ .transpose((0, 2, 1))
+ proj_key = self.key_conv(vis_node).reshape((m_batchsize, -1, Nn))
+
+ energy = paddle.bmm(proj_query, proj_key)
+ attention_vis = self.softmax_vis(energy).transpose((0, 2, 1))
+ attention_word = self.softmax_word(energy)
+
+ proj_value_vis = self.value_conv_vis(vis_node).reshape((m_batchsize, -1,
+ Nn))
+ proj_value_word = self.value_conv_word(word).reshape((m_batchsize, -1,
+ Nc))
+
+ class_out = paddle.bmm(proj_value_vis, attention_vis)
+ node_out = paddle.bmm(proj_value_word, attention_word)
+ return class_out, node_out
\ No newline at end of file
diff --git a/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..79f648ef9f3381b41852a8010381a6087d6b7f72
--- /dev/null
+++ b/modules/image/semantic_segmentation/ginet_resnet50vd_voc/resnet.py
@@ -0,0 +1,137 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+import ginet_resnet50vd_voc.layers as L
+
+
+class BasicBlock(nn.Layer):
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ stride: int,
+ shortcut: bool = True,
+ if_first: bool = False,
+ name: str = None):
+ super(BasicBlock, self).__init__()
+ self.stride = stride
+ self.conv0 = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=stride,
+ act='relu',
+ name=name + "_branch2a")
+ self.conv1 = L.ConvBNLayer(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ act=None,
+ name=name + "_branch2b")
+
+ if not shortcut:
+ self.short = L.ConvBNLayer(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ stride=1,
+ is_vd_mode=False if if_first else True,
+ name=name + "_branch1")
+
+ self.shortcut = shortcut
+
+ def forward(self, inputs: paddle.Tensor) -> paddle.Tensor:
+ y = self.conv0(inputs)
+ conv1 = self.conv1(y)
+
+ if self.shortcut:
+ short = inputs
+ else:
+ short = self.short(inputs)
+        y = F.relu(paddle.add(x=short, y=conv1))
+
+ return y
+
+
+class ResNet50_vd(nn.Layer):
+ def __init__(self,
+ multi_grid: tuple = (1, 2, 4)):
+ super(ResNet50_vd, self).__init__()
+ depth = [3, 4, 6, 3]
+ num_channels = [64, 256, 512, 1024]
+ num_filters = [64, 128, 256, 512]
+ self.feat_channels = [c * 4 for c in num_filters]
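+        # use dilations of 2 and 4 in the last two stages (instead of striding)
+        # so the backbone keeps an output stride of 8 for dense prediction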
+ dilation_dict = {2: 2, 3: 4}
+ self.conv1_1 = L.ConvBNLayer(
+ in_channels=3,
+ out_channels=32,
+ kernel_size=3,
+ stride=2,
+ act='relu',
+ name="conv1_1")
+ self.conv1_2 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=32,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_2")
+ self.conv1_3 = L.ConvBNLayer(
+ in_channels=32,
+ out_channels=64,
+ kernel_size=3,
+ stride=1,
+ act='relu',
+ name="conv1_3")
+ self.pool2d_max = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
+ self.stage_list = []
+
+ for block in range(len(depth)):
+ shortcut = False
+ block_list = []
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+ dilation_rate = dilation_dict[
+ block] if dilation_dict and block in dilation_dict else 1
+ if block == 3:
+ dilation_rate = dilation_rate * multi_grid[i]
+ bottleneck_block = self.add_sublayer(
+ 'bb_%d_%d' % (block, i),
+ L.BottleneckBlock(
+ in_channels=num_channels[block]
+ if i == 0 else num_filters[block] * 4,
+ out_channels=num_filters[block],
+ stride=2 if i == 0 and block != 0
+ and dilation_rate == 1 else 1,
+ shortcut=shortcut,
+ if_first=block == i == 0,
+ name=conv_name,
+ dilation=dilation_rate))
+ block_list.append(bottleneck_block)
+ shortcut = True
+ self.stage_list.append(block_list)
+
+    def forward(self, inputs: paddle.Tensor) -> List[paddle.Tensor]:
+ y = self.conv1_1(inputs)
+ y = self.conv1_2(y)
+ y = self.conv1_3(y)
+ y = self.pool2d_max(y)
+ feat_list = []
+ for stage in self.stage_list:
+ for block in stage:
+ y = block(y)
+ feat_list.append(y)
+ return feat_list
\ No newline at end of file