diff --git a/modules/image/classification/esnet_x0_25_imagenet/README.md b/modules/image/classification/esnet_x0_25_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a87a9ee530d6a211fe91c7504faafc8e4f68168e --- /dev/null +++ b/modules/image/classification/esnet_x0_25_imagenet/README.md @@ -0,0 +1,133 @@ +# esnet_x0_25_imagenet + +|Module Name|esnet_x0_25_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|ESNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|10 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Building on ShuffleNetV2, it combines the strengths of MobileNetV3, GhostNet, and PPLCNet into a network that is faster and more accurate on ARM devices. Because of this performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) in PaddleDetection uses it as the backbone; paired with a stronger object detection algorithm, it set a new SOTA for object detection models on ARM devices. This module is the ESNet model with the scale parameter set to x0.25. + + +## II. Installation + +- ### 1. Environmental Dependence + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install esnet_x0_25_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command Line Prediction + + - ```shell + $ hub run esnet_x0_25_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification module from the command line; for more, see [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="esnet_x0_25_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + 
top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and is in BGR color space;
+ - paths (list\[str\]): image paths;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys are 'class_ids' (class indices), 'scores' (confidence) and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image classification service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m esnet_x0_25_imagenet + ``` + + - This deploys the online image classification service; the default port is 8866. + + - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/esnet_x0_25_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install esnet_x0_25_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/esnet_x0_25_imagenet/model.py b/modules/image/classification/esnet_x0_25_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..a2384403f29d18f1602c24233a1c4d6dc9df713d --- /dev/null +++ b/modules/image/classification/esnet_x0_25_imagenet/model.py @@ -0,0 +1,506 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +import math +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import concat +from paddle import ParamAttr +from paddle import reshape +from paddle import split +from paddle import transpose +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn import MaxPool2D +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + +MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]} + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." 
+ return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." + return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # the list of patterns that were matched and replaced: + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward stop. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specified in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specified in pattern('{pattern}'). The index should be >= 0 and < {len(target_layer)}." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +def channel_shuffle(x, groups): + batch_size, num_channels, height, width = x.shape[0:4] + channels_per_group = num_channels // groups + x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width]) + x = transpose(x=x, perm=[0, 2, 1, 3, 4]) + x = reshape(x=x, shape=[batch_size, num_channels, height, width]) + return x + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True): + super().__init__() + self.conv = Conv2D(in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = 
BatchNorm(out_channels, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.if_act = if_act + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + if self.if_act: + x = self.hardswish(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class ESBlock1(TheseusLayer): + + def __init__(self, in_channels, out_channels): + super().__init__() + self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1) + self.dw_1 = ConvBNLayer(in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=1, + groups=out_channels // 2, + if_act=False) + self.se = SEModule(out_channels) + + self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1) + + def forward(self, x): + x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1) + x2 = self.pw_1_1(x2) + x3 = self.dw_1(x2) + x3 = concat([x2, x3], axis=1) + x3 = self.se(x3) + x3 = self.pw_1_2(x3) + x = concat([x1, x3], axis=1) + return channel_shuffle(x, 2) + + +class ESBlock2(TheseusLayer): + + def __init__(self, in_channels, out_channels): + super().__init__() + + # branch1 + self.dw_1 = ConvBNLayer(in_channels=in_channels, + out_channels=in_channels, + kernel_size=3, + stride=2, + groups=in_channels, + 
if_act=False) + self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1) + # branch2 + self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1) + self.dw_2 = ConvBNLayer(in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=2, + groups=out_channels // 2, + if_act=False) + self.se = SEModule(out_channels // 2) + self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1) + self.concat_dw = ConvBNLayer(in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + groups=out_channels) + self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x): + x1 = self.dw_1(x) + x1 = self.pw_1(x1) + x2 = self.pw_2_1(x) + x2 = self.dw_2(x2) + x2 = self.se(x2) + x2 = self.pw_2_2(x2) + x = concat([x1, x2], axis=1) + x = self.concat_dw(x) + x = self.concat_pw(x) + return x + + +class ESNet(TheseusLayer): + + def __init__(self, + stages_pattern, + class_num=1000, + scale=1.0, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_num = class_num + self.class_expand = class_expand + stage_repeats = [3, 7, 3] + stage_out_channels = [ + -1, 24, make_divisible(116 * scale), + make_divisible(232 * scale), + make_divisible(464 * scale), 1024 + ] + + self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2) + self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1) + + block_list = [] + for stage_id, num_repeat in enumerate(stage_repeats): + for i in range(num_repeat): + if i == 0: + block = ESBlock2(in_channels=stage_out_channels[stage_id + 1], + out_channels=stage_out_channels[stage_id + 2]) + else: + block = ESBlock1(in_channels=stage_out_channels[stage_id + 2], + out_channels=stage_out_channels[stage_id + 2]) + 
block_list.append(block) + self.blocks = nn.Sequential(*block_list) + + self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=stage_out_channels[-1], + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + self.fc = Linear(self.class_expand, self.class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + x = self.max_pool(x) + x = self.blocks(x) + x = self.conv2(x) + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def ESNet_x0_25(pretrained=False, use_ssld=False, **kwargs): + """ + ESNet_x0_25 + Args: + pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise. + If str, means the path of the pretrained model. + use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True. + Returns: + model: nn.Layer. Specific `ESNet_x0_25` model depends on args. + """ + model = ESNet(scale=0.25, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs) + return model diff --git a/modules/image/classification/esnet_x0_25_imagenet/module.py b/modules/image/classification/esnet_x0_25_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..2c2edaab788a0ac410978528303c45cd86c95f76 --- /dev/null +++ b/modules/image/classification/esnet_x0_25_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import ESNet_x0_25 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="esnet_x0_25_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class Esnet_x0_25_Imagenet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'ESNet_x0_25.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_25_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = ESNet_x0_25() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/esnet_x0_25_imagenet/processor.py b/modules/image/classification/esnet_x0_25_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/esnet_x0_25_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/esnet_x0_25_imagenet/utils.py b/modules/image/classification/esnet_x0_25_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/esnet_x0_25_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace a value in a dict or list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl)) + assert len(ks) > 0, ('length of keys should be larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if ks[0] not in dl: + print('A new field ({}) detected!'.format(ks[0])) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/esnet_x0_5_imagenet/README.md b/modules/image/classification/esnet_x0_5_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f620be394670f0cfe102dd626f1033c22f7be466 --- /dev/null +++ b/modules/image/classification/esnet_x0_5_imagenet/README.md @@ -0,0 +1,133 @@ +# esnet_x0_5_imagenet + +|Module Name|esnet_x0_5_imagenet| +| :--- | :---: | +|Category|Image - Image classification| +|Network|ESNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|12 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - ESNet (Enhanced ShuffleNet) is a lightweight network developed in-house by Baidu. Building on ShuffleNetV2, it combines the strengths of MobileNetV3, GhostNet, and PPLCNet into a network that is both faster and more accurate on ARM devices. Because of this performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), released by PaddleDetection, uses it as the backbone; paired with a stronger object detection algorithm, it set a new SOTA for object detection models on ARM devices. This module is the ESNet model with the scale parameter set to x0.5. + + +## II. Installation + +- ### 1. Environment dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install esnet_x0_5_imagenet + ``` + -
If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-line prediction + + - ```shell + $ hub run esnet_x0_5_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This runs the classification module from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction code example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="esnet_x0_5_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence), and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m esnet_x0_5_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - With the server configured, the following few lines of code send a prediction request and get the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/esnet_x0_5_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + Initial release + + - ```shell + $ hub install esnet_x0_5_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/esnet_x0_5_imagenet/model.py b/modules/image/classification/esnet_x0_5_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..4e6bd8c7b1dc4c3207ab2ad20113861b94d5af16 --- /dev/null +++ b/modules/image/classification/esnet_x0_5_imagenet/model.py @@ -0,0 +1,506 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +import math +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import concat +from paddle import ParamAttr +from paddle import reshape +from paddle import split +from paddle import transpose +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn import MaxPool2D +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + +MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]} + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." 
+ return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." + return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] (the patterns that were matched and replaced) + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward stop. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Identity. + layer_index (str, optional): The index of target layer to be set to Identity in parent_layer. Defaults to None. + + Returns: + bool: True if successful, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to describe the layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successful, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +def channel_shuffle(x, groups): + batch_size, num_channels, height, width = x.shape[0:4] + channels_per_group = num_channels // groups + x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width]) + x = transpose(x=x, perm=[0, 2, 1, 3, 4]) + x = reshape(x=x, shape=[batch_size, num_channels, height, width]) + return x + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True): + super().__init__() + self.conv = Conv2D(in_channels=in_channels, + out_channels=out_channels, + kernel_size=kernel_size, + stride=stride, + padding=(kernel_size - 1) // 2, + groups=groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = 
BatchNorm(out_channels, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.if_act = if_act + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + if self.if_act: + x = self.hardswish(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class ESBlock1(TheseusLayer): + + def __init__(self, in_channels, out_channels): + super().__init__() + self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1) + self.dw_1 = ConvBNLayer(in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=1, + groups=out_channels // 2, + if_act=False) + self.se = SEModule(out_channels) + + self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1) + + def forward(self, x): + x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1) + x2 = self.pw_1_1(x2) + x3 = self.dw_1(x2) + x3 = concat([x2, x3], axis=1) + x3 = self.se(x3) + x3 = self.pw_1_2(x3) + x = concat([x1, x3], axis=1) + return channel_shuffle(x, 2) + + +class ESBlock2(TheseusLayer): + + def __init__(self, in_channels, out_channels): + super().__init__() + + # branch1 + self.dw_1 = ConvBNLayer(in_channels=in_channels, + out_channels=in_channels, + kernel_size=3, + stride=2, + groups=in_channels, + 
if_act=False) + self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1) + # branch2 + self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1) + self.dw_2 = ConvBNLayer(in_channels=out_channels // 2, + out_channels=out_channels // 2, + kernel_size=3, + stride=2, + groups=out_channels // 2, + if_act=False) + self.se = SEModule(out_channels // 2) + self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1) + self.concat_dw = ConvBNLayer(in_channels=out_channels, + out_channels=out_channels, + kernel_size=3, + groups=out_channels) + self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1) + + def forward(self, x): + x1 = self.dw_1(x) + x1 = self.pw_1(x1) + x2 = self.pw_2_1(x) + x2 = self.dw_2(x2) + x2 = self.se(x2) + x2 = self.pw_2_2(x2) + x = concat([x1, x2], axis=1) + x = self.concat_dw(x) + x = self.concat_pw(x) + return x + + +class ESNet(TheseusLayer): + + def __init__(self, + stages_pattern, + class_num=1000, + scale=1.0, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_num = class_num + self.class_expand = class_expand + stage_repeats = [3, 7, 3] + stage_out_channels = [ + -1, 24, make_divisible(116 * scale), + make_divisible(232 * scale), + make_divisible(464 * scale), 1024 + ] + + self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2) + self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1) + + block_list = [] + for stage_id, num_repeat in enumerate(stage_repeats): + for i in range(num_repeat): + if i == 0: + block = ESBlock2(in_channels=stage_out_channels[stage_id + 1], + out_channels=stage_out_channels[stage_id + 2]) + else: + block = ESBlock1(in_channels=stage_out_channels[stage_id + 2], + out_channels=stage_out_channels[stage_id + 2]) + 
block_list.append(block) + self.blocks = nn.Sequential(*block_list) + + self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=stage_out_channels[-1], + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + self.fc = Linear(self.class_expand, self.class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + x = self.max_pool(x) + x = self.blocks(x) + x = self.conv2(x) + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def ESNet_x0_5(pretrained=False, use_ssld=False, **kwargs): + """ + ESNet_x0_5 + Args: + pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise. + If str, means the path of the pretrained model. + use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True. + Returns: + model: nn.Layer. Specific `ESNet_x0_5` model depends on args. + """ + model = ESNet(scale=0.5, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs) + return model diff --git a/modules/image/classification/esnet_x0_5_imagenet/module.py b/modules/image/classification/esnet_x0_5_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..0abb6c0f5dbf0e83ee51ea730ed7cb16c4a6c6b7 --- /dev/null +++ b/modules/image/classification/esnet_x0_5_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import ESNet_x0_5 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="esnet_x0_5_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class Esnet_x0_5_Imagenet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'ESNet_x0_5.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_5_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = ESNet_x0_5() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/esnet_x0_5_imagenet/processor.py b/modules/image/classification/esnet_x0_5_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/esnet_x0_5_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ResizeImage: " + "both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (w, h) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (w, h) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/esnet_x0_5_imagenet/utils.py b/modules/image/classification/esnet_x0_5_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/esnet_x0_5_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace a value in a dict or list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl)) + assert len(ks) > 0, ('length of keys should be larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if ks[0] not in dl: + print('A new field ({}) detected!'.format(ks[0])) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) 
+ such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/levit_128_imagenet/README.md b/modules/image/classification/levit_128_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5a1bbedfcee657fba445bd53ad12e41489b79afb --- /dev/null +++ b/modules/image/classification/levit_128_imagenet/README.md @@ -0,0 +1,132 @@ +# levit_128_imagenet + +|Module Name|levit_128_imagenet| +| :--- | :---: | +|Category|Image - image classification| +|Network|LeViT| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|54 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + +- ### Module Introduction + + - LeViT is a hybrid neural network for image classification, designed for fast inference. Its design takes into account how the model performs on different hardware platforms, so it better reflects real-world application scenarios. Through extensive experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture, and proposed an attention-based method for integrating the positional encoding used in Transformers. This module uses the LeViT128 model configuration. For details, see the [paper](https://arxiv.org/abs/2104.01136). + + +## II. Installation + +- ### 1. Environmental Dependence + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install levit_128_imagenet + ``` + - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command Line Prediction + + - ```shell + $ hub run levit_128_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification module from the command line. For more information, see [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="levit_128_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): number of top predictions to return. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores) and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m levit_128_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To run prediction on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - Once the server is configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/levit_128_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install levit_128_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/levit_128_imagenet/model.py b/modules/image/classification/levit_128_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..2cf87d515201935a705b77737137ed2d4567fd40 --- /dev/null +++ b/modules/image/classification/levit_128_imagenet/model.py @@ -0,0 +1,450 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# Code was based on https://github.com/facebookresearch/LeViT +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant +from paddle.nn.initializer import TruncatedNormal +from paddle.regularizer import L2Decay + +from .vision_transformer import Identity +from .vision_transformer import ones_ +from .vision_transformer import trunc_normal_ +from .vision_transformer import zeros_ + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + nums = attention_bias_idxs.shape[0] + for idx in range(nums): + gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx]) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + gather = paddle.concat(gather_list) + return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + + def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000): + super().__init__() + self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + if bn_weight_init == 0: + zeros_(bn.weight) + else: + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', 
nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(), + Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(), + Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(), + Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32") + y = y.divide(paddle.full_like(y, 1 - self.drop)) + return paddle.add(x, y) + else: + return paddle.add(x, self.m(x)) + + +class Attention(nn.Layer): + + def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') 
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, C]) + end1, end2 = x.shape[1], x.shape[2] + x = x[:, 0:end1:self.stride, 0:end2:self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = 
attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list(itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, 
self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_num = class_num + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, + do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, + ), drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN(h, ed, bn_weight_init=0), + ), drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample(*embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + 
self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN(h, embed_dim[i + 1], bn_weight_init=0), + ), drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + if distillation: + self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + + x = paddle.reshape(x, [-1, self.embed_dim[-1]]) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_num, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_num=class_num, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 
'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + + +def LeViT_128(**kwargs): + model = model_factory(**specification['LeViT_128'], class_num=1000, distillation=False) + return model diff --git a/modules/image/classification/levit_128_imagenet/module.py b/modules/image/classification/levit_128_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..1ed4aba85900e8051e912730892aee68f4e21bf0 --- /dev/null +++ b/modules/image/classification/levit_128_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import LeViT_128 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="levit_128_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class LeViT_128_ImageNet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'LeViT_128.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'LeViT_128_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = LeViT_128() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. + batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. 
+ ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/levit_128_imagenet/processor.py b/modules/image/classification/levit_128_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/levit_128_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = 
cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w 
* percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/levit_128_imagenet/utils.py b/modules/image/classification/levit_128_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/levit_128_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/levit_128s_imagenet/README.md b/modules/image/classification/levit_128s_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..34a1b82fa2df9fc6f641d467838e6d58727a9bfe --- /dev/null +++ b/modules/image/classification/levit_128s_imagenet/README.md @@ -0,0 +1,132 @@ +# levit_128s_imagenet + +|Module Name|levit_128s_imagenet| +| :--- | :---: | +|Category|Image - Image classification| +|Network|LeViT| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|45 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + +- ### Module Introduction + + - LeViT is a fast-inference hybrid neural network for image classification. Its design takes into account how the model performs on different hardware platforms, so it better reflects real-world deployment scenarios. Through extensive experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture, and proposed an attention-based method for integrating positional information encoding into the Transformer. This module uses the LeViT128s model configuration; for details, see the [paper](https://arxiv.org/abs/2104.01136). + + +## II. Installation + +- ### 1. Environment Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install levit_128s_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-line Prediction + + - ```shell + $ hub run levit_128s_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification module from the command line. For more information, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="levit_128s_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k predictions. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence) and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m levit_128s_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a Prediction Request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/levit_128s_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install levit_128s_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/levit_128s_imagenet/model.py b/modules/image/classification/levit_128s_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..7a1b8467183fc7146ea739331b6e58456812a867 --- /dev/null +++ b/modules/image/classification/levit_128s_imagenet/model.py @@ -0,0 +1,450 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# Code was based on https://github.com/facebookresearch/LeViT +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant +from paddle.nn.initializer import TruncatedNormal +from paddle.regularizer import L2Decay + +from .vision_transformer import Identity +from .vision_transformer import ones_ +from .vision_transformer import trunc_normal_ +from .vision_transformer import zeros_ + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + nums = attention_bias_idxs.shape[0] + for idx in range(nums): + gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx]) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + gather = paddle.concat(gather_list) + return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + + def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000): + super().__init__() + self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + if bn_weight_init == 0: + zeros_(bn.weight) + else: + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', 
nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(), + Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(), + Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(), + Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32") + y = y.divide(paddle.full_like(y, 1 - self.drop)) + return paddle.add(x, self.m(x) * y) + else: + return paddle.add(x, self.m(x)) + + +class Attention(nn.Layer): + + def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') 
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, C]) + end1, end2 = x.shape[1], x.shape[2] + x = x[:, 0:end1:self.stride, 0:end2:self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = 
attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list(itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, 
self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_num = class_num + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, + do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, + ), drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN(h, ed, bn_weight_init=0), + ), drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample(*embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + 
self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN(h, embed_dim[i + 1], bn_weight_init=0), + ), drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + if distillation: + self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + + x = paddle.reshape(x, [-1, self.embed_dim[-1]]) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_num, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_num=class_num, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 
'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + + +def LeViT_128S(**kwargs): + model = model_factory(**specification['LeViT_128S'], class_num=1000, distillation=False) + return model diff --git a/modules/image/classification/levit_128s_imagenet/module.py b/modules/image/classification/levit_128s_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..9476fecfabe92d16da7098f4ca873b3fdbfff5f2 --- /dev/null +++ b/modules/image/classification/levit_128s_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import LeViT_128S +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="levit_128s_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class LeViT_128S_ImageNet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'LeViT_128S.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'LeViT_128S_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = LeViT_128S() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. + batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. 
+ ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/levit_128s_imagenet/processor.py b/modules/image/classification/levit_128s_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/levit_128s_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = 
cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape, ) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ResizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w 
* percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as subtract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/levit_128s_imagenet/utils.py b/modules/image/classification/levit_128s_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/levit_128s_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict or list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl)) + assert len(ks) > 0, ('length of keys should be larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if ks[0] not in dl: + print('A new field ({}) detected!'.format(ks[0])) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) 
+ such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/levit_192_imagenet/README.md b/modules/image/classification/levit_192_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c3e86eea1909e7497e3356b41397416d77af044a --- /dev/null +++ b/modules/image/classification/levit_192_imagenet/README.md @@ -0,0 +1,132 @@ +# levit_192_imagenet + +|Module Name|levit_192_imagenet| +| :--- | :---: | +|Category|Image classification| +|Network|LeViT| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|64 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + +- ### Module Introduction + + - LeViT is a fast-inference hybrid neural network for image classification. Its design takes into account how the model performs on different hardware platforms, so it better reflects real-world deployment scenarios. Through extensive experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture, and proposed an attention-based method for integrating positional encoding into the Transformer. This module uses the LeViT192 model configuration. For details, see the [paper](https://arxiv.org/abs/2104.01136). + + +## II. Installation + +- ### 1. Environmental Dependence + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install levit_192_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[MacOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command Line Prediction + + - ```shell + $ hub run levit_192_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This calls the classification module from the command line; for more, see [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="levit_192_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and is in BGR color space;
+ - paths (list\[str\]): paths of the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m levit_192_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** If you predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - Once the server is configured, the following lines of code send a prediction request and fetch the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/levit_192_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install levit_192_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/levit_192_imagenet/model.py b/modules/image/classification/levit_192_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..104d5f0669dba227d8574e9a25daeed62e4f23fa --- /dev/null +++ b/modules/image/classification/levit_192_imagenet/model.py @@ -0,0 +1,450 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# Code was based on https://github.com/facebookresearch/LeViT +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant +from paddle.nn.initializer import TruncatedNormal +from paddle.regularizer import L2Decay + +from .vision_transformer import Identity +from .vision_transformer import ones_ +from .vision_transformer import trunc_normal_ +from .vision_transformer import zeros_ + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + nums = attention_bias_idxs.shape[0] + for idx in range(nums): + gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx]) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + gather = paddle.concat(gather_list) + return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + + def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000): + super().__init__() + self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + if bn_weight_init == 0: + zeros_(bn.weight) + else: + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', 
nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(), + Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(), + Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(), + Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32") + y = y.divide(paddle.full_like(y, 1 - self.drop)) + return paddle.add(x, self.m(x) * y) + else: + return paddle.add(x, self.m(x)) + + +class Attention(nn.Layer): + + def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') 
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, C]) + end1, end2 = x.shape[1], x.shape[2] + x = x[:, 0:end1:self.stride, 0:end2:self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = 
attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list(itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, 
self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_num = class_num + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, + do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, + ), drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN(h, ed, bn_weight_init=0), + ), drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample(*embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + 
self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN(h, embed_dim[i + 1], bn_weight_init=0), + ), drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + if distillation: + self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + + x = paddle.reshape(x, [-1, self.embed_dim[-1]]) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_num, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_num=class_num, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 
'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + + +def LeViT_192(**kwargs): + model = model_factory(**specification['LeViT_192'], class_num=1000, distillation=False) + return model diff --git a/modules/image/classification/levit_192_imagenet/module.py b/modules/image/classification/levit_192_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..1e982e824dc7c0e99310ae73f63950e4e3bf0c7c --- /dev/null +++ b/modules/image/classification/levit_192_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import LeViT_192 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="levit_192_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class LeViT_192_ImageNet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'LeViT_192.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'LeViT_192_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = LeViT_192() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. + batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. 
+ ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/levit_192_imagenet/processor.py b/modules/image/classification/levit_192_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/levit_192_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = 
cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w 
* percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: To use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, +
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/levit_192_imagenet/utils.py b/modules/image/classification/levit_192_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/levit_192_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/levit_256_imagenet/README.md b/modules/image/classification/levit_256_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fefc5bebd522cd9199c1626764be47122e3fc761 --- /dev/null +++ b/modules/image/classification/levit_256_imagenet/README.md @@ -0,0 +1,132 @@ +# levit_256_imagenet + +|Module Name|levit_256_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|LeViT| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|109 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + +- ### Module Introduction + + - LeViT is a fast-inference hybrid neural network for image classification. Its design takes into account how the model performs on different hardware platforms, so it better reflects real, widely deployed scenarios. Through extensive experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture and proposed an attention-based method for integrating positional encoding into the Transformer. This module uses the LeViT256 model configuration; for details, see the [paper](https://arxiv.org/abs/2104.01136). + + +## II. Installation + +- ### 1. Environment Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install levit_256_imagenet + ``` + - If you run into problems during installation, see: [Beginner's guide to installing on Windows](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Beginner's guide to installing on Linux](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[Beginner's guide to installing on macOS](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-Line Prediction + + - ```shell + $ hub run levit_256_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="levit_256_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths of the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): number of top predictions to return. + + - **Returns** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores), and 'label_names' (class names) + + +## IV. Service Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m levit_256_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a Prediction Request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/levit_256_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Notes + +* 1.0.0 + + Initial release + + - ```shell + $ hub install levit_256_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/levit_256_imagenet/model.py b/modules/image/classification/levit_256_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..66b5cd8d040627c419d7df47f20eddd52760ff5a --- /dev/null +++ b/modules/image/classification/levit_256_imagenet/model.py @@ -0,0 +1,450 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# Code was based on https://github.com/facebookresearch/LeViT +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant +from paddle.nn.initializer import TruncatedNormal +from paddle.regularizer import L2Decay + +from .vision_transformer import Identity +from .vision_transformer import ones_ +from .vision_transformer import trunc_normal_ +from .vision_transformer import zeros_ + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + nums = attention_bias_idxs.shape[0] + for idx in range(nums): + gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx]) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + gather = paddle.concat(gather_list) + return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + + def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000): + super().__init__() + self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + if bn_weight_init == 0: + zeros_(bn.weight) + else: + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', 
nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(), + Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(), + Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(), + Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32") + y = y.divide(paddle.full_like(y, 1 - self.drop)) + return paddle.add(x, y) + else: + return paddle.add(x, self.m(x)) + + +class Attention(nn.Layer): + + def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') 
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, C]) + end1, end2 = x.shape[1], x.shape[2] + x = x[:, 0:end1:self.stride, 0:end2:self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = 
attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list(itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, 
self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_num = class_num + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, + do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, + ), drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN(h, ed, bn_weight_init=0), + ), drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample(*embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + 
self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN(h, embed_dim[i + 1], bn_weight_init=0), + ), drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + if distillation: + self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + + x = paddle.reshape(x, [-1, self.embed_dim[-1]]) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_num, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_num=class_num, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 
'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + + +def LeViT_256(**kwargs): + model = model_factory(**specification['LeViT_256'], class_num=1000, distillation=False) + return model diff --git a/modules/image/classification/levit_256_imagenet/module.py b/modules/image/classification/levit_256_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..161cc02c0c69a2a7ebd901f1f783d35d87e0668d --- /dev/null +++ b/modules/image/classification/levit_256_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import LeViT_256 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="levit_256_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class LeViT_256_ImageNet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'LeViT_256.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'LeViT_256_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = LeViT_256() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. + batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. 
+        '''
+        postprocess_func = Topk(top_k, self.label_path)
+        inputs = []
+        results = []
+        paddle.disable_static()
+        place = 'gpu:0' if use_gpu else 'cpu'
+        place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+        batch_data = []
+        for idx, imagedata in enumerate(inputs):
+            for process in self.preprocess_funcs:
+                imagedata = process(imagedata)
+            batch_data.append(imagedata)
+            if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+                batch_tensor = paddle.to_tensor(batch_data)
+                out = self.model(batch_tensor)
+                if isinstance(out, list):
+                    out = out[0]
+                if isinstance(out, dict) and "logits" in out:
+                    out = out["logits"]
+                if isinstance(out, dict) and "output" in out:
+                    out = out["output"]
+                result = postprocess_func(out)
+                results.extend(result)
+                batch_data.clear()
+        return results
+
+    @runnable
+    def run_cmd(self, argvs: list):
+        """
+        Run as a command.
+        """
+        self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+                                              prog='hub run {}'.format(self.name),
+                                              usage='%(prog)s',
+                                              add_help=True)
+
+        self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+        self.arg_config_group = self.parser.add_argument_group(
+            title="Config options", description="Run configuration for controlling module behavior, not required.")
+        self.add_module_config_arg()
+        self.add_module_input_arg()
+        self.args = self.parser.parse_args(argvs)
+        results = self.classification(paths=[self.args.input_path],
+                                      use_gpu=self.args.use_gpu,
+                                      batch_size=self.args.batch_size,
+                                      top_k=self.args.top_k)
+        return results
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/levit_256_imagenet/processor.py b/modules/image/classification/levit_256_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/levit_256_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = 
cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w 
* percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+    """
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def __call__(self, img):
+        if not isinstance(img, Image.Image):
+            img = np.ascontiguousarray(img)
+            img = Image.fromarray(img)
+        img = super()._apply_image(img)
+        if isinstance(img, Image.Image):
+            img = np.asarray(img)
+        return img
+
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+
+class Topk(object):
+
+    def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, (int, ))
+        self.class_id_map = self.parse_class_id_map(class_id_map_file)
+        self.topk = topk
+
+    def parse_class_id_map(self, class_id_map_file):
+        if class_id_map_file is None:
+            return None
+        if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+            return None
+
+        try:
+            class_id_map = {}
+            with open(class_id_map_file, "r") as fin:
+                lines = fin.readlines()
+                for line in lines:
+                    partition = line.split("\n")[0].partition(" ")
+                    class_id_map[int(partition[0])] = str(partition[-1])
+        except Exception as ex:
+            print(ex)
+            class_id_map = None
+        return class_id_map
+
+    def __call__(self, x, file_names=None, multilabel=False):
+        assert isinstance(x, paddle.Tensor)
+        if file_names is not None:
+            assert x.shape[0] == len(file_names)
+        x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+        x = x.numpy()
+        y = []
+        for idx, probs in enumerate(x):
+            index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+                probs >= 0.5)[0].astype("int32")
+            clas_id_list = []
+            score_list = []
+            label_name_list = []
+            for i in index:
+                clas_id_list.append(i.item())
+                score_list.append(probs[i].item())
+                if self.class_id_map is not None:
+                    label_name_list.append(self.class_id_map[i.item()])
+            result = {
+                "class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/levit_256_imagenet/utils.py b/modules/image/classification/levit_256_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/levit_256_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [
+                'topk=2',
+                'VALID.transforms.1.ResizeImage.resize_short=300'
+            ]
+    Returns:
+        config(dict): replaced config
+    """
+    if options is not None:
+        for opt in options:
+            assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a = "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+            key, value = pair
+            keys = key.split('.')
+            override(config, keys, value)
+    return config
+
+
+def get_config(fname, overrides=None, show=False):
+    """
+    Read config from file
+    """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+    config = parse_config(fname)
+    override_config(config, overrides)
+    return config
diff --git a/modules/image/classification/levit_384_imagenet/README.md b/modules/image/classification/levit_384_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..45149034bf566a80d67c855ca3a0fb69435ab47e
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_384_imagenet
+
+|Module Name|levit_384_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|LeViT|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|225 MB|
+|Latest update date|2022-04-02|
+|Metric|Acc|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - LeViT is a hybrid neural network for fast-inference image classification. Its design takes into account how the model performs on different hardware platforms, so it better reflects real-world application scenarios. Through extensive experiments, the authors found a better way to combine convolutional neural networks with the Transformer architecture, and proposed an attention-based method for integrating positional encoding into the Transformer. This module uses the LeViT384 configuration. For details, see the [paper](https://arxiv.org/abs/2104.01136).
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0  | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install levit_384_imagenet
+    ```
+  - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command-Line Prediction
+
+  - ```shell
+    $ hub run levit_384_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This calls the classification module from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    classifier = hub.Module(name="levit_384_imagenet")
+    result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+    # or
+    # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3. API
+
+
+  - ```python
+    def classification(images=None,
+                       paths=None,
+                       batch_size=1,
+                       use_gpu=False,
+                       top_k=1):
+    ```
+    - Classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+      - paths (list\[str\]): paths of the images;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use the GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): number of top predictions to return.
+
+    - **Returns**
+
+      - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores) and 'label_names' (class names)
+
+
+## IV. Service Deployment
+
+- PaddleHub Serving can deploy an online image recognition service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m levit_384_imagenet
+    ```
+
+  - This deploys an online image recognition service; the default port is 8866.
+
+  - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a Prediction Request
+
+  - With the server configured, the following few lines of code send a prediction request and fetch the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/levit_384_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release History
+
+* 1.0.0
+
+  Initial release
+
+  - ```shell
+    $ hub install levit_384_imagenet==1.0.0
+    ```
diff --git a/modules/image/classification/levit_384_imagenet/model.py b/modules/image/classification/levit_384_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..c1b3bf68f486dd3209860fe71614a1319e4f6bdb
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/model.py
@@ -0,0 +1,450 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# Code was based on https://github.com/facebookresearch/LeViT +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import Constant +from paddle.nn.initializer import TruncatedNormal +from paddle.regularizer import L2Decay + +from .vision_transformer import Identity +from .vision_transformer import ones_ +from .vision_transformer import trunc_normal_ +from .vision_transformer import zeros_ + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + nums = attention_bias_idxs.shape[0] + for idx in range(nums): + gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx]) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + gather = paddle.concat(gather_list) + return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + + def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000): + super().__init__() + self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + if bn_weight_init == 0: + zeros_(bn.weight) + else: + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', 
nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(), + Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(), + Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(), + Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32") + y = y.divide(paddle.full_like(y, 1 - self.drop)) + return paddle.add(x, y) + else: + return paddle.add(x, self.m(x)) + + +class Attention(nn.Layer): + + def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') 
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, C]) + end1, end2 = x.shape[1], x.shape[2] + x = x[:, 0:end1:self.stride, 0:end2:self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = 
attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list(itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, 
self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_num = class_num + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, + do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, + ), drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN(h, ed, bn_weight_init=0), + ), drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample(*embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + 
self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN(h, embed_dim[i + 1], bn_weight_init=0), + ), drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + if distillation: + self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + + x = paddle.reshape(x, [-1, self.embed_dim[-1]]) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_num, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_num=class_num, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 
'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + + +def LeViT_384(**kwargs): + model = model_factory(**specification['LeViT_384'], class_num=1000, distillation=False) + return model diff --git a/modules/image/classification/levit_384_imagenet/module.py b/modules/image/classification/levit_384_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..790a66d5fe19ef0554e0dfddc4b143f89af22085 --- /dev/null +++ b/modules/image/classification/levit_384_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import LeViT_384 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="levit_384_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class LeViT_384_ImageNet: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'LeViT_384.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'LeViT_384_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = LeViT_384() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. + batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. 
+ ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/levit_384_imagenet/processor.py b/modules/image/classification/levit_384_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/levit_384_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = 
cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w 
* percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/levit_384_imagenet/utils.py b/modules/image/classification/levit_384_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/levit_384_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a =" + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only a = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) is not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/README.md b/modules/image/classification/pplcnet_x0_25_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1444f3e4776bd2518afee67726bd484eb37c9ab1 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_25_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x0_25_imagenet + +|Module Name|pplcnet_x0_25_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|PPLCNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|5 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - PP-LCNet is a backbone network designed by Baidu specifically for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, ultimately surpassing existing SOTA models by a large margin. This module is the PP-LCNet model with the scale parameter set to x0.25. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf). + +## II. Installation + +- ### 1. Environment Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install pplcnet_x0_25_imagenet + ``` + - If you run into problems during installation, see: [Beginner's guide to installing on Windows](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Beginner's guide to installing on Linux](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Model API Prediction + +- ### 1. Command-Line Prediction + + - ```shell + $ hub run pplcnet_x0_25_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification model from the command line. For more information, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x0_25_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths of the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): number of top predictions to return. + + - **Returns** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores) and 'label_names' (class names) + + +## IV. Service Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the start command: + - ```shell + $ hub serving start -m pplcnet_x0_25_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed. + +- ### Step 2: Send a Prediction Request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pplcnet_x0_25_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release History + +* 1.0.0 + + Initial release + + - ```shell + $ hub install pplcnet_x0_25_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/model.py b/modules/image/classification/pplcnet_x0_25_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..071131b1b54563ae04a37a9a47df6cf678f8f7e3 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_25_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] (the patterns that were matched and replaced) + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward propagation stop. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x0_25(**kwargs): + model = PPLCNet(scale=0.25, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/module.py b/modules/image/classification/pplcnet_x0_25_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..a4f9878636cce57503a6aa0db115122d958155f7 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_25_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x0_25 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x0_25_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x0_5: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_25.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_25_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x0_25() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/processor.py b/modules/image/classification/pplcnet_x0_25_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_25_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+    """
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def __call__(self, img):
+        if not isinstance(img, Image.Image):
+            img = np.ascontiguousarray(img)
+            img = Image.fromarray(img)
+        img = super()._apply_image(img)
+        if isinstance(img, Image.Image):
+            img = np.asarray(img)
+        return img
+
+
+def base64_to_cv2(b64str):
+    data = base64.b64decode(b64str.encode('utf8'))
+    # np.fromstring is deprecated for binary input; np.frombuffer is its replacement.
+    data = np.frombuffer(data, np.uint8)
+    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+    return data
+
+
+class Topk(object):
+
+    def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, (int, ))
+        self.class_id_map = self.parse_class_id_map(class_id_map_file)
+        self.topk = topk
+
+    def parse_class_id_map(self, class_id_map_file):
+        if class_id_map_file is None:
+            return None
+        if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+            return None
+
+        try:
+            class_id_map = {}
+            with open(class_id_map_file, "r") as fin:
+                lines = fin.readlines()
+                for line in lines:
+                    partition = line.split("\n")[0].partition(" ")
+                    class_id_map[int(partition[0])] = str(partition[-1])
+        except Exception as ex:
+            print(ex)
+            class_id_map = None
+        return class_id_map
+
+    def __call__(self, x, file_names=None, multilabel=False):
+        assert isinstance(x, paddle.Tensor)
+        if file_names is not None:
+            assert x.shape[0] == len(file_names)
+        x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+        x = x.numpy()
+        y = []
+        for idx, probs in enumerate(x):
+            index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+                probs >= 0.5)[0].astype("int32")
+            clas_id_list = []
+            score_list = []
+            label_name_list = []
+            for i in index:
+                clas_id_list.append(i.item())
+                score_list.append(probs[i].item())
+                if self.class_id_map is not None:
+                    label_name_list.append(self.class_id_map[i.item()])
+            result = {
+                "class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/utils.py b/modules/image/classification/pplcnet_x0_25_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_25_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [
+                'topk=2',
+                'VALID.transforms.1.ResizeImage.resize_short=300'
+            ]
+    Returns:
+        config(dict): replaced config
+    """
+    if options is not None:
+        for opt in options:
+            assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a = to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+            key, value = pair
+            keys = key.split('.')
+            override(config, keys, value)
+    return config
+
+
+def get_config(fname, overrides=None, show=False):
+    """
+    Read config from file
+    """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+    config = parse_config(fname)
+    override_config(config, overrides)
+    return config
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/README.md b/modules/image/classification/pplcnet_x0_35_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..021c52b8eb650b4001a55a2cf393ce061ad011a4
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_35_imagenet
+
+|Module Name|pplcnet_x0_35_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|6 MB|
+|Latest update date|2022-04-02|
+|Metric|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a backbone network that Baidu designed specifically for Intel CPU devices and their MKLDNN acceleration library. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, and ends up substantially outperforming existing SOTA models. This module is the PP-LCNet model with the scale parameter set to x0.35. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install pplcnet_x0_35_imagenet
+    ```
+  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+  - ```shell
+    $ hub run pplcnet_x0_35_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This calls the classification module from the command line. For more, see the [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    classifier = hub.Module(name="pplcnet_x0_35_imagenet")
+    result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+    # or
+    # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3. API
+
+
+  - ```python
+    def classification(images=None,
+                       paths=None,
+                       batch_size=1,
+                       use_gpu=False,
+                       top_k=1):
+    ```
+    - Classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+      - paths (list\[str\]): paths to the images;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k predictions.
+
+    - **Return**
+
+      - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence) and 'label_names' (class names)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image recognition service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x0_35_imagenet
+    ```
+
+  - This deploys an online image recognition service; the default port is 8866.
+
+  - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a Prediction Request
+
+  - With the server configured, the following few lines of code send a prediction request and fetch the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x0_35_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Notes
+
+* 1.0.0
+
+  Initial release
+
+  - ```shell
+    $ hub install pplcnet_x0_35_imagenet==1.0.0
+    ```
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/model.py b/modules/image/classification/pplcnet_x0_35_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..85580ae9f6c73ab79b0e371398e343384cc459ab
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+
+        Examples:
+
+            from paddle import nn
+            import paddleclas
+
+            def rep_func(layer: nn.Layer, pattern: str):
+                new_layer = nn.Conv2D(
+                    in_channels=layer._in_channels,
+                    out_channels=layer._out_channels,
+                    kernel_size=5,
+                    padding=2
+                )
+                return new_layer
+
+            net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # the list of patterns that were matched and replaced:
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+        """
+
+        if not isinstance(layer_name_pattern, list):
+            layer_name_pattern = [layer_name_pattern]
+
+        hit_layer_pattern_list = []
+        for pattern in layer_name_pattern:
+            # parse pattern to find target layer and its parent
+            layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+            if not layer_list:
+                continue
+            sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+            sub_layer = layer_list[-1]["layer"]
+            sub_layer_name = layer_list[-1]["name"]
+            sub_layer_index = layer_list[-1]["index"]
+
+            new_sub_layer = handle_func(sub_layer, pattern)
+
+            if sub_layer_index:
+                getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+            else:
+                setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+            hit_layer_pattern_list.append(pattern)
+        return hit_layer_pattern_list
+
+    def stop_after(self, stop_layer_name: str) -> bool:
+        """stop forward and backward after 'stop_layer_name'.
+
+        Args:
+            stop_layer_name (str): The name of the layer after which forward and backward are stopped.
+
+        Returns:
+            bool: 'True' if successful, 'False' otherwise.
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x0_35(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=0.35, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/module.py b/modules/image/classification/pplcnet_x0_35_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..acd31f0261c2ae7af7014c8fdc15a061b5d44128 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_35_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x0_35 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x0_35_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x0_35: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_35.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_35_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x0_35() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/processor.py b/modules/image/classification/pplcnet_x0_35_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_35_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/utils.py b/modules/image/classification/pplcnet_x0_35_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_35_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
+            such as: [
+                'topk=2',
+                'VALID.transforms.1.ResizeImage.resize_short=300'
+            ]
+    Returns:
+        config(dict): replaced config
+    """
+    if options is not None:
+        for opt in options:
+            assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a = "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+            key, value = pair
+            keys = key.split('.')
+            override(config, keys, value)
+    return config
+
+
+def get_config(fname, overrides=None, show=False):
+    """
+    Read config from file
+    """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+    config = parse_config(fname)
+    override_config(config, overrides)
+    return config
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/README.md b/modules/image/classification/pplcnet_x0_5_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3efd7cd06b4d3177e509c8e61c2ddd05bec080bf
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_5_imagenet
+
+|Module Name|pplcnet_x0_5_imagenet|
+| :--- | :---: |
+|Category|Image classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|7 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a backbone network designed by Baidu specifically for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, and ends up substantially outperforming existing SOTA models. This module is the PP-LCNet model with the scale parameter set to x0.5. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install pplcnet_x0_5_imagenet
+    ```
+  - If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+  - ```shell
+    $ hub run pplcnet_x0_5_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This calls the classification module from the command line. For more, see [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    classifier = hub.Module(name="pplcnet_x0_5_imagenet")
+    result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+    # or
+    # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3. API
+
+
+  - ```python
+    def classification(images=None,
+                       paths=None,
+                       batch_size=1,
+                       use_gpu=False,
+                       top_k=1):
+    ```
+    - Classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+      - paths (list\[str\]): paths to the images;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k prediction results.
+
+    - **Return**
+
+      - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence) and 'label_names' (class names)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image recognition service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x0_5_imagenet
+    ```
+
+  - This deploys an online image recognition service; the default port is 8866.
+
+  - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following few lines of code send a prediction request and fetch the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x0_5_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+  - ```shell
+    $ hub install pplcnet_x0_5_imagenet==1.0.0
+    ```
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/model.py b/modules/image/classification/pplcnet_x0_5_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..8c6a399bc83fc29a7a33df79112bcf66e07146b5
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer} + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of layer that stop forward and backward after this layer. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=0.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/module.py b/modules/image/classification/pplcnet_x0_5_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..05ac64722efd510096c6c88a63fb56b65e3055a3 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_5_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x0_5 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x0_5_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x0_5: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_5.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_5_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x0_5() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/processor.py b/modules/image/classification/pplcnet_x0_5_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_5_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.fromstring(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/utils.py b/modules/image/classification/pplcnet_x0_5_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_5_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a =" + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only a = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) is not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/README.md b/modules/image/classification/pplcnet_x0_75_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..72c8c072d86af617eda317d9a03cd83978153763 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_75_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x0_75_imagenet + +|Model Name|pplcnet_x0_75_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|PPLCNet| +|Dataset|ImageNet-2012| +|Fine-tuning Supported|No| +|Model Size|9 MB| +|Latest Update Date|2022-04-02| +|Metric|Acc| + + +## I. Basic Information + + + +- ### Model Introduction + + - PP-LCNet is a backbone network that Baidu designed specifically for Intel CPU devices and their acceleration library, MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, and ultimately surpasses existing SOTA models by a wide margin. This module is the PP-LCNet model with the scale parameter set to x0.75. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf). + +## II. Installation + +- ### 1. Environment Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install pplcnet_x0_75_imagenet + ``` + - If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) |
[MacOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-Line Prediction + + - ```shell + $ hub run pplcnet_x0_75_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification model from the command line. For more details, see [PaddleHub command-line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x0_75_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Returns** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores), and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m pplcnet_x0_75_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed. + +- ### Step 2: Send a Prediction Request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pplcnet_x0_75_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Notes + +* 1.0.0 + + Initial release + + - ```shell + $ hub install pplcnet_x0_75_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/model.py b/modules/image/classification/pplcnet_x0_75_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..df546e13b47c0a9a3c64dc44b46ffbcdf326e7fd --- /dev/null +++ b/modules/image/classification/pplcnet_x0_75_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer} + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of layer that stop forward and backward after this layer. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=0.75, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/module.py b/modules/image/classification/pplcnet_x0_75_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..7ce6c2eaca491c21266d87307110f032d92e007a --- /dev/null +++ b/modules/image/classification/pplcnet_x0_75_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x0_75 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x0_75_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x0_75: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_75.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_75_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x0_75() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/processor.py b/modules/image/classification/pplcnet_x0_75_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_75_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/utils.py b/modules/image/classification/pplcnet_x0_75_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x0_75_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
+ such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only a = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/README.md b/modules/image/classification/pplcnet_x1_0_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..22dc1b235ddca2121d42042861ada9e618fcb0ab --- /dev/null +++ b/modules/image/classification/pplcnet_x1_0_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x1_0_imagenet + +|Module Name|pplcnet_x1_0_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|PPLCNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|11 MB| +|Latest update date|2022-04-02| +|Metric|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - PP-LCNet is a backbone network designed by Baidu specifically for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone can further improve model performance without increasing inference time, ultimately surpassing existing SOTA models by a wide margin. This module is the PP-LCNet model with the scale parameter set to x1.0. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf). + +## II. Installation + +- ### 1. Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install pplcnet_x1_0_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-line Prediction + + - ```shell + $ hub run pplcnet_x1_0_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification model from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x1_0_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] and uses the BGR color space;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m pplcnet_x1_0_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send an HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pplcnet_x1_0_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + Initial release + + - ```shell + $ hub install pplcnet_x1_0_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/model.py b/modules/image/classification/pplcnet_x1_0_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..a69f326d8d58263bcf02c2857db3d85bf738cf7b --- /dev/null +++ b/modules/image/classification/pplcnet_x1_0_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer} + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of layer that stop forward and backward after this layer. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Identity. + layer_index (str, optional): The index of target layer to be set to Identity in parent_layer. Defaults to None. + + Returns: + bool: True if successful, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to describe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successful, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specified in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specified in pattern('{pattern}'). The index should < {len(target_layer)} and >= 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=1.0, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/module.py b/modules/image/classification/pplcnet_x1_0_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..3119f49bb466d5b53a76c374e6cdc8b8cbde03db --- /dev/null +++ b/modules/image/classification/pplcnet_x1_0_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x1_0 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x1_0_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x1_0: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x1_0.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x1_0_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x1_0() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/processor.py b/modules/image/classification/pplcnet_x1_0_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x1_0_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/utils.py b/modules/image/classification/pplcnet_x1_0_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x1_0_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/README.md b/modules/image/classification/pplcnet_x1_5_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..eb8342effacea9ca4b3296002c1aa2577cf87ad3 --- /dev/null +++ b/modules/image/classification/pplcnet_x1_5_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x1_5_imagenet + +|Module Name|pplcnet_x1_5_imagenet| +| :--- | :---: | +|Category|Image classification| +|Network|PPLCNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|17 MB| +|Latest update date|2022-04-02| +|Data indicators|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - PP-LCNet is a backbone network that Baidu designed specifically for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, ultimately outperforming existing SOTA models by a large margin. This module is the PP-LCNet model with the scale parameter set to x1.5. For more information about the model structure, see the [paper](https://arxiv.org/pdf/2109.15099.pdf). + +## II. Installation + +- ### 1. Environment Dependencies + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install pplcnet_x1_5_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command-Line Prediction + + - ```shell + $ hub run pplcnet_x1_5_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - Invokes the classification model from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x1_5_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] in the BGR color space;
+ - paths (list\[str\]): 图片的路径;
+ - batch\_size (int): batch 的大小;
+ - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+ - top\_k (int): 返回预测结果的前 k 个。 + + - **返回** + + - res (list\[dict\]): 分类结果,列表的每一个元素均为字典,其中 key 包括'class_ids'(种类索引), 'scores'(置信度) 和 'label_names'(种类名称) + + +## 四、服务部署 + +- PaddleHub Serving可以部署一个图像识别的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m pplcnet_x1_5_imagenet + ``` + + - 这样就完成了一个图像识别的在线服务的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"\} + url = "http://127.0.0.1:8866/predict/pplcnet_x1_5_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` + + +## 五、更新历史 + +* 1.0.0 + + 初始发布 + + - ```shell + $ hub install pplcnet_x1_5_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/model.py b/modules/image/classification/pplcnet_x1_5_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..085bb5668a15d4783c4e8a7b412dd4c0a0b1610c --- /dev/null +++ b/modules/image/classification/pplcnet_x1_5_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + List[str]: The patterns whose target layers were found and replaced by the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward stop. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + List[str]: The patterns whose layers have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Identity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Identity. + layer_index (str, optional): The index of target layer to be set to Identity in parent_layer. Defaults to None. + + Returns: + bool: True if successful, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to describe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successful, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specified in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specified in pattern('{pattern}'). The index should be < {len(target_layer)} and >= 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=1.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/module.py b/modules/image/classification/pplcnet_x1_5_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..25f258db9b6b8cad10a63497431e28dcd67ddd2c --- /dev/null +++ b/modules/image/classification/pplcnet_x1_5_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x1_5 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x1_5_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x1_5: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x1_5.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x1_5_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x1_5() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/processor.py b/modules/image/classification/pplcnet_x1_5_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x1_5_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
+ size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ResizeImage: " + "both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (w, h) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (w, h) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/utils.py b/modules/image/classification/pplcnet_x1_5_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x1_5_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace a value in a dict or list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl)) + assert len(ks) > 0, ('length of keys should be larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if ks[0] not in dl: + print('A new field ({}) detected!'.format(ks[0])) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/README.md b/modules/image/classification/pplcnet_x2_0_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..61c681d008383dfa19a26d36145a0d6c890ecaa8 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_0_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x2_0_imagenet + +|Module Name|pplcnet_x2_0_imagenet| +| :--- | :---: | +|Category|Image - Image Classification| +|Network|PPLCNet| +|Dataset|ImageNet-2012| +|Fine-tuning supported|No| +|Module Size|24 MB| +|Latest update date|2022-04-02| +|Metric|Acc| + + +## I. Basic Information + + + +- ### Module Introduction + + - PP-LCNet is a backbone network designed by Baidu specifically for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, and ends up surpassing existing SOTA models by a large margin. This module is the PP-LCNet model with the scale parameter set to x2.0; for more on the model architecture, see the [paper](https://arxiv.org/pdf/2109.15099.pdf). + +## II. Installation + +- ### 1. Environmental Dependence + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2. Installation + + - ```shell + $ hub install pplcnet_x2_0_imagenet + ``` + - If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[macOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III. Module API Prediction + +- ### 1. Command Line Prediction + + - ```shell + $ hub run pplcnet_x2_0_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - This invokes the classification module from the command line; for more, see [PaddleHub command line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2. Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x2_0_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3. API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - Classification API. + - **Parameters** + + - images (list\[numpy.ndarray\]): image data; each image has shape \[H, W, C\] in BGR color space;
+ - paths (list\[str\]): paths to the image files;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): number of top predictions to return. + + - **Return** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores) and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m pplcnet_x2_0_imagenet + ``` + + - This deploys the online image recognition service; the default port is 8866. + + - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - With the server configured, the following few lines of code send a prediction request and retrieve the result: + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send an HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pplcnet_x2_0_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install pplcnet_x2_0_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/model.py b/modules/image/classification/pplcnet_x2_0_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..a3fd8364ae59dbf085d50c597d5c650d6a7a7d73 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_0_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward stop. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Identity. + layer_index (str, optional): The index of target layer to be set to Identity in parent_layer. Defaults to None. + + Returns: + bool: True if successful, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to describe the layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successful, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specified in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specified in pattern('{pattern}'). The index should be < {len(target_layer)} and >= 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=2.0, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/module.py b/modules/image/classification/pplcnet_x2_0_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..d67d80800fa1953288c09ffc67a69cd85dd85b83 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_0_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x2_0 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x2_0_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x2_0: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x2_0.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x2_0_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x2_0() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classification results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images is None and paths is None: + print('No image provided. Please input an image or an image path.') + return + + if images is not None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths is not None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/processor.py b/modules/image/classification/pplcnet_x2_0_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_0_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ReisizeImage for '\ + 'both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (h, w) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (h, w) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as substract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please input a legal path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, + 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/utils.py b/modules/image/classification/pplcnet_x2_0_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_0_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict of list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict") + assert len(ks) > 0, ('lenght of keys should larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if not ks[0] in dl: + print('A new filed ({}) detected!'.format(ks[0], dl)) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a =" + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only a = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) is not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/README.md b/modules/image/classification/pplcnet_x2_5_imagenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a7099ebce1a3c914bdbc612a9761bfe0e3965b64 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_5_imagenet/README.md @@ -0,0 +1,132 @@ +# pplcnet_x2_5_imagenet + +|模型名称|pplcnet_x2_5_imagenet| +| :--- | :---: | +|类别|图像-图像分类| +|网络|PPLCNet| +|数据集|ImageNet-2012| +|是否支持Fine-tuning|否| +|模型大小|34 MB| +|最新更新日期|2022-04-02| +|数据指标|Acc| + + +## 一、模型基本信息 + + + +- ### 模型介绍 + + - PP-LCNet是百度针对Intel CPU 设备以及其加速库 MKLDNN 设计的特定骨干网络 ,比起其他的轻量级的 SOTA 模型,该骨干网络可以在不增加推理时间的情况下,进一步提升模型的性能,最终大幅度超越现有的 SOTA 模型。该模型为模型规模参数scale为x2.5下的PP-LCNet模型,关于模型结构的更多信息,可参考[论文](https://arxiv.org/pdf/2109.15099.pdf)。 + +## 二、安装 + +- ### 1、环境依赖 + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + +- ### 2、安装 + + - ```shell + $ hub install pplcnet_x2_5_imagenet + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | 
[零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run pplcnet_x2_5_imagenet --input_path "/PATH/TO/IMAGE" + ``` + - 通过命令行方式实现分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、预测代码示例 + + - ```python + import paddlehub as hub + import cv2 + + classifier = hub.Module(name="pplcnet_x2_5_imagenet") + result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = classifier.classification(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + + - ```python + def classification(images=None, + paths=None, + batch_size=1, + use_gpu=False, + top_k=1): + ``` + - 分类接口API。 + - **参数** + + - images (list\[numpy.ndarray\]): 图片数据,每一个图片数据的shape 均为 \[H, W, C\],颜色空间为 BGR;
+ - paths (list\[str\]): paths to the images;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use the GPU; **if using the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results. + + - **Returns** + + - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence scores) and 'label_names' (class names) + + +## IV. Server Deployment + +- PaddleHub Serving can deploy an online image recognition service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m pplcnet_x2_5_imagenet + ``` + + - This deploys an online image recognition service; the default port is 8866. + + - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - Once the server is configured, the following few lines of code send a prediction request and retrieve the result + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send the HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/pplcnet_x2_5_imagenet" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print the prediction results + print(r.json()["results"]) + ``` + + +## V. Release Notes + +* 1.0.0 + + Initial release + + - ```shell + $ hub install pplcnet_x2_5_imagenet==1.0.0 + ``` diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/model.py b/modules/image/classification/pplcnet_x2_5_imagenet/model.py new file mode 100644 index 0000000000000000000000000000000000000000..b1395770144db6cadd6f1cb121d07b966e30e02a --- /dev/null +++ b/modules/image/classification/pplcnet_x2_5_imagenet/model.py @@ -0,0 +1,478 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +from typing import Any +from typing import Callable +from typing import Dict +from typing import List +from typing import Tuple +from typing import Union + +import paddle +import paddle.nn as nn +from paddle import ParamAttr +from paddle.nn import AdaptiveAvgPool2D +from paddle.nn import BatchNorm +from paddle.nn import Conv2D +from paddle.nn import Dropout +from paddle.nn import Linear +from paddle.nn.initializer import KaimingNormal +from paddle.regularizer import L2Decay + + +class Identity(nn.Layer): + + def __init__(self): + super(Identity, self).__init__() + + def forward(self, inputs): + return inputs + + +class TheseusLayer(nn.Layer): + + def __init__(self, *args, **kwargs): + super(TheseusLayer, self).__init__() + self.res_dict = {} + self.res_name = self.full_name() + self.pruner = None + self.quanter = None + + def _return_dict_hook(self, layer, input, output): + res_dict = {"output": output} + # 'list' is needed to avoid error raised by popping self.res_dict + for res_key in list(self.res_dict): + # clear the res_dict because the forward process may change according to input + res_dict[res_key] = self.res_dict.pop(res_key) + return res_dict + + def init_res(self, stages_pattern, return_patterns=None, return_stages=None): + if return_patterns and return_stages: + msg = f"The 'return_patterns' would be ignored when 'return_stages' is set." + return_stages = None + + if return_stages is True: + return_patterns = stages_pattern + # return_stages is int or bool + if type(return_stages) is int: + return_stages = [return_stages] + if isinstance(return_stages, list): + if max(return_stages) > len(stages_pattern) or min(return_stages) < 0: + msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}." 
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)] + return_patterns = [stages_pattern[i] for i in return_stages] + + if return_patterns: + self.update_res(return_patterns) + + def replace_sub(self, *args, **kwargs) -> None: + msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead." + raise DeprecationWarning(msg) + + def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]], + handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]: + """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'. + + Args: + layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'. + handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed. + + Returns: + Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'. 
+ + Examples: + + from paddle import nn + import paddleclas + + def rep_func(layer: nn.Layer, pattern: str): + new_layer = nn.Conv2D( + in_channels=layer._in_channels, + out_channels=layer._out_channels, + kernel_size=5, + padding=2 + ) + return new_layer + + net = paddleclas.MobileNetV1() + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + print(res) + # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv'] (the patterns that were matched and replaced) + """ + + if not isinstance(layer_name_pattern, list): + layer_name_pattern = [layer_name_pattern] + + hit_layer_pattern_list = [] + for pattern in layer_name_pattern: + # parse pattern to find target layer and its parent + layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) + if not layer_list: + continue + sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self + + sub_layer = layer_list[-1]["layer"] + sub_layer_name = layer_list[-1]["name"] + sub_layer_index = layer_list[-1]["index"] + + new_sub_layer = handle_func(sub_layer, pattern) + + if sub_layer_index: + getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer + else: + setattr(sub_layer_parent, sub_layer_name, new_sub_layer) + + hit_layer_pattern_list.append(pattern) + return hit_layer_pattern_list + + def stop_after(self, stop_layer_name: str) -> bool: + """stop forward and backward after 'stop_layer_name'. + + Args: + stop_layer_name (str): The name of the layer after which forward and backward are stopped. + + Returns: + bool: 'True' if successful, 'False' otherwise. 
+ """ + + layer_list = parse_pattern_str(stop_layer_name, self) + if not layer_list: + return False + + parent_layer = self + for layer_dict in layer_list: + name, index = layer_dict["name"], layer_dict["index"] + if not set_identity(parent_layer, name, index): + msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." + return False + parent_layer = layer_dict["layer"] + + return True + + def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]: + """update the result(s) to be returned. + + Args: + return_patterns (Union[str, List[str]]): The name of layer to return output. + + Returns: + Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully. + """ + + # clear res_dict that could have been set + self.res_dict = {} + + class Handler(object): + + def __init__(self, res_dict): + # res_dict is a reference + self.res_dict = res_dict + + def __call__(self, layer, pattern): + layer.res_dict = self.res_dict + layer.res_name = pattern + if hasattr(layer, "hook_remove_helper"): + layer.hook_remove_helper.remove() + layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook) + return layer + + handle_func = Handler(self.res_dict) + + hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func) + + if hasattr(self, "hook_remove_helper"): + self.hook_remove_helper.remove() + self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook) + + return hit_layer_pattern_list + + +def save_sub_res_hook(layer, input, output): + layer.res_dict[layer.res_name] = output + + +def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool: + """set the layer specified by layer_name and layer_index to Indentity. + + Args: + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. 
+ layer_name (str): The name of target layer to be set to Indentity. + layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + + Returns: + bool: True if successfully, False otherwise. + """ + + stop_after = False + for sub_layer_name in parent_layer._sub_layers: + if stop_after: + parent_layer._sub_layers[sub_layer_name] = Identity() + continue + if sub_layer_name == layer_name: + stop_after = True + + if layer_index and stop_after: + stop_after = False + for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True + + return stop_after + + +def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: + """parse the string type pattern. + + Args: + pattern (str): The pattern to discribe layer. + parent_layer (nn.Layer): The root layer relative to the pattern. + + Returns: + Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order: + [ + {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist}, + {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist}, + ... + ] + """ + + pattern_list = pattern.split(".") + if not pattern_list: + msg = f"The pattern('{pattern}') is illegal. Please check and retry." 
+ return None + + layer_list = [] + while len(pattern_list) > 0: + if '[' in pattern_list[0]: + target_layer_name = pattern_list[0].split('[')[0] + target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + else: + target_layer_name = pattern_list[0] + target_layer_index = None + + target_layer = getattr(parent_layer, target_layer_name, None) + + if target_layer is None: + msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')." + return None + + if target_layer_index and target_layer: + if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." + return None + + target_layer = target_layer[target_layer_index] + + layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index}) + + pattern_list = pattern_list[1:] + parent_layer = target_layer + return layer_list + + +MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]} + +# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se. 
+# k: kernel_size +# in_c: input channel number in depthwise block +# out_c: output channel number in depthwise block +# s: stride in depthwise block +# use_se: whether to use SE block + +NET_CONFIG = { + "blocks2": + #k, in_c, out_c, s, use_se + [[3, 16, 32, 1, False]], + "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]], + "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]], + "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], + [5, 256, 256, 1, False], [5, 256, 256, 1, False]], + "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]] +} + + +def make_divisible(v, divisor=8, min_value=None): + if min_value is None: + min_value = divisor + new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) + if new_v < 0.9 * v: + new_v += divisor + return new_v + + +class ConvBNLayer(TheseusLayer): + + def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1): + super().__init__() + + self.conv = Conv2D(in_channels=num_channels, + out_channels=num_filters, + kernel_size=filter_size, + stride=stride, + padding=(filter_size - 1) // 2, + groups=num_groups, + weight_attr=ParamAttr(initializer=KaimingNormal()), + bias_attr=False) + + self.bn = BatchNorm(num_filters, + param_attr=ParamAttr(regularizer=L2Decay(0.0)), + bias_attr=ParamAttr(regularizer=L2Decay(0.0))) + self.hardswish = nn.Hardswish() + + def forward(self, x): + x = self.conv(x) + x = self.bn(x) + x = self.hardswish(x) + return x + + +class DepthwiseSeparable(TheseusLayer): + + def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False): + super().__init__() + self.use_se = use_se + self.dw_conv = ConvBNLayer(num_channels=num_channels, + num_filters=num_channels, + filter_size=dw_size, + stride=stride, + num_groups=num_channels) + if use_se: + self.se = SEModule(num_channels) + self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1) + + 
def forward(self, x): + x = self.dw_conv(x) + if self.use_se: + x = self.se(x) + x = self.pw_conv(x) + return x + + +class SEModule(TheseusLayer): + + def __init__(self, channel, reduction=4): + super().__init__() + self.avg_pool = AdaptiveAvgPool2D(1) + self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0) + self.relu = nn.ReLU() + self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0) + self.hardsigmoid = nn.Hardsigmoid() + + def forward(self, x): + identity = x + x = self.avg_pool(x) + x = self.conv1(x) + x = self.relu(x) + x = self.conv2(x) + x = self.hardsigmoid(x) + x = paddle.multiply(x=identity, y=x) + return x + + +class PPLCNet(TheseusLayer): + + def __init__(self, + stages_pattern, + scale=1.0, + class_num=1000, + dropout_prob=0.2, + class_expand=1280, + return_patterns=None, + return_stages=None): + super().__init__() + self.scale = scale + self.class_expand = class_expand + + self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2) + + self.blocks2 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"]) + ]) + + self.blocks3 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"]) + ]) + + self.blocks4 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"]) + ]) + + self.blocks5 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + 
num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"]) + ]) + + self.blocks6 = nn.Sequential(*[ + DepthwiseSeparable(num_channels=make_divisible(in_c * scale), + num_filters=make_divisible(out_c * scale), + dw_size=k, + stride=s, + use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"]) + ]) + + self.avg_pool = AdaptiveAvgPool2D(1) + + self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale), + out_channels=self.class_expand, + kernel_size=1, + stride=1, + padding=0, + bias_attr=False) + + self.hardswish = nn.Hardswish() + self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") + self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) + + self.fc = Linear(self.class_expand, class_num) + + super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages) + + def forward(self, x): + x = self.conv1(x) + + x = self.blocks2(x) + x = self.blocks3(x) + x = self.blocks4(x) + x = self.blocks5(x) + x = self.blocks6(x) + + x = self.avg_pool(x) + x = self.last_conv(x) + x = self.hardswish(x) + x = self.dropout(x) + x = self.flatten(x) + x = self.fc(x) + return x + + +def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs): + model = PPLCNet(scale=2.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs) + return model diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/module.py b/modules/image/classification/pplcnet_x2_5_imagenet/module.py new file mode 100644 index 0000000000000000000000000000000000000000..479cf4a61b46dd14b64e5915df6e8144f40cdf1c --- /dev/null +++ b/modules/image/classification/pplcnet_x2_5_imagenet/module.py @@ -0,0 +1,154 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import argparse +import copy +import os + +import cv2 +import numpy as np +import paddle +from skimage.io import imread +from skimage.transform import rescale +from skimage.transform import resize + +import paddlehub as hub +from .model import PPLCNet_x2_5 +from .processor import base64_to_cv2 +from .processor import create_operators +from .processor import Topk +from .utils import get_config +from paddlehub.module.module import moduleinfo +from paddlehub.module.module import runnable +from paddlehub.module.module import serving + + +@moduleinfo(name="pplcnet_x2_5_imagenet", + type="cv/classification", + author="paddlepaddle", + author_email="", + summary="", + version="1.0.0") +class PPLcNet_x2_5: + + def __init__(self): + self.config = get_config(os.path.join(self.directory, 'PPLCNet_x2_5.yaml'), show=False) + self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt') + self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x2_5_pretrained.pdparams') + self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path + self.model = PPLCNet_x2_5() + param_state_dict = paddle.load(self.pretrain_path) + self.model.set_dict(param_state_dict) + self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"]) + + def classification(self, + images: list = None, + paths: list = None, + batch_size: int = 1, + use_gpu: bool = False, + top_k: int = 1): + ''' + Args: + images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR. + paths (list[str]): The paths of images. 
+ batch_size (int): batch size. + use_gpu (bool): Whether to use gpu. + top_k (int): Return top k results. + + Returns: + res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'. + ''' + postprocess_func = Topk(top_k, self.label_path) + inputs = [] + results = [] + paddle.disable_static() + place = 'gpu:0' if use_gpu else 'cpu' + place = paddle.set_device(place) + if images == None and paths == None: + print('No image provided. Please input an image or a image path.') + return + + if images != None: + for image in images: + image = image[:, :, ::-1] + inputs.append(image) + + if paths != None: + for path in paths: + image = cv2.imread(path)[:, :, ::-1] + inputs.append(image) + + batch_data = [] + for idx, imagedata in enumerate(inputs): + for process in self.preprocess_funcs: + imagedata = process(imagedata) + batch_data.append(imagedata) + if len(batch_data) >= batch_size or idx == len(inputs) - 1: + batch_tensor = paddle.to_tensor(batch_data) + out = self.model(batch_tensor) + if isinstance(out, list): + out = out[0] + if isinstance(out, dict) and "logits" in out: + out = out["logits"] + if isinstance(out, dict) and "output" in out: + out = out["output"] + result = postprocess_func(out) + results.extend(result) + batch_data.clear() + return results + + @runnable + def run_cmd(self, argvs: list): + """ + Run as a command. + """ + self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name), + prog='hub run {}'.format(self.name), + usage='%(prog)s', + add_help=True) + + self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. 
Required") + self.arg_config_group = self.parser.add_argument_group( + title="Config options", description="Run configuration for controlling module behavior, not required.") + self.add_module_config_arg() + self.add_module_input_arg() + self.args = self.parser.parse_args(argvs) + results = self.classification(paths=[self.args.input_path], + use_gpu=self.args.use_gpu, + batch_size=self.args.batch_size, + top_k=self.args.top_k) + return results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.classification(images=images_decode, **kwargs) + return results + + def add_module_config_arg(self): + """ + Add the command config options. + """ + self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not") + + self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size') + self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.') + + def add_module_input_arg(self): + """ + Add the command input options. + """ + self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.") diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/processor.py b/modules/image/classification/pplcnet_x2_5_imagenet/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_5_imagenet/processor.py @@ -0,0 +1,374 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +from __future__ import unicode_literals + +import base64 +import inspect +import math +import os +import random +import sys +from functools import partial + +import cv2 +import numpy as np +import paddle +import paddle.nn.functional as F +import six +from paddle.vision.transforms import ColorJitter as RawColorJitter +from PIL import Image + + +def create_operators(params, class_num=None): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + current_module = sys.modules[__name__] + for operator in params: + assert isinstance(operator, dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op_func = getattr(current_module, op_name) + if "class_num" in inspect.getfullargspec(op_func).args: + param.update({"class_num": class_num}) + op = op_func(**param) + ops.append(op) + + return ops + + +class UnifiedResize(object): + + def __init__(self, interpolation=None, backend="cv2"): + _cv2_interp_from_str = { + 'nearest': cv2.INTER_NEAREST, + 'bilinear': cv2.INTER_LINEAR, + 'area': cv2.INTER_AREA, + 'bicubic': cv2.INTER_CUBIC, + 'lanczos': cv2.INTER_LANCZOS4 + } + _pil_interp_from_str = { + 'nearest': Image.NEAREST, + 'bilinear': Image.BILINEAR, + 'bicubic': Image.BICUBIC, + 'box': Image.BOX, + 'lanczos': 
Image.LANCZOS, + 'hamming': Image.HAMMING + } + + def _pil_resize(src, size, resample): + pil_img = Image.fromarray(src) + pil_img = pil_img.resize(size, resample) + return np.asarray(pil_img) + + if backend.lower() == "cv2": + if isinstance(interpolation, str): + interpolation = _cv2_interp_from_str[interpolation.lower()] + # compatible with opencv < version 4.4.0 + elif interpolation is None: + interpolation = cv2.INTER_LINEAR + self.resize_func = partial(cv2.resize, interpolation=interpolation) + elif backend.lower() == "pil": + if isinstance(interpolation, str): + interpolation = _pil_interp_from_str[interpolation.lower()] + self.resize_func = partial(_pil_resize, resample=interpolation) + else: + self.resize_func = cv2.resize + + def __call__(self, src, size): + return self.resize_func(src, size) + + +class OperatorParamError(ValueError): + """ OperatorParamError + """ + pass + + +class DecodeImage(object): + """ decode image """ + + def __init__(self, to_rgb=True, to_np=False, channel_first=False): + self.to_rgb = to_rgb + self.to_np = to_np # to numpy + self.channel_first = channel_first # only enabled when to_np is True + + def __call__(self, img): + if six.PY2: + assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage" + else: + assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage" + data = np.frombuffer(img, dtype='uint8') + img = cv2.imdecode(data, 1) + if self.to_rgb: + assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape) + img = img[:, :, ::-1] + + if self.channel_first: + img = img.transpose((2, 0, 1)) + + return img + + +class ResizeImage(object): + """ resize image """ + + def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"): + if resize_short is not None and resize_short > 0: + self.resize_short = resize_short + self.w = None + self.h = None + elif size is not None: + self.resize_short = None + self.w = size if type(size) is int else size[0] + self.h = 
size if type(size) is int else size[1] + else: + raise OperatorParamError("invalid params for ResizeImage: both 'size' and 'resize_short' are None") + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + img_h, img_w = img.shape[:2] + if self.resize_short is not None: + percent = float(self.resize_short) / min(img_w, img_h) + w = int(round(img_w * percent)) + h = int(round(img_h * percent)) + else: + w = self.w + h = self.h + return self._resize_func(img, (w, h)) + + +class CropImage(object): + """ crop image """ + + def __init__(self, size): + if type(size) is int: + self.size = (size, size) + else: + self.size = size # (w, h) + + def __call__(self, img): + w, h = self.size + img_h, img_w = img.shape[:2] + w_start = (img_w - w) // 2 + h_start = (img_h - h) // 2 + + w_end = w_start + w + h_end = h_start + h + return img[h_start:h_end, w_start:w_end, :] + + +class RandCropImage(object): + """ random crop image """ + + def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"): + if type(size) is int: + self.size = (size, size) # (w, h) + else: + self.size = size + + self.scale = [0.08, 1.0] if scale is None else scale + self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio + + self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend) + + def __call__(self, img): + size = self.size + scale = self.scale + ratio = self.ratio + + aspect_ratio = math.sqrt(random.uniform(*ratio)) + w = 1. * aspect_ratio + h = 1. 
/ aspect_ratio + + img_h, img_w = img.shape[:2] + + bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2)) + scale_max = min(scale[1], bound) + scale_min = min(scale[0], bound) + + target_area = img_w * img_h * random.uniform(scale_min, scale_max) + target_size = math.sqrt(target_area) + w = int(target_size * w) + h = int(target_size * h) + + i = random.randint(0, img_w - w) + j = random.randint(0, img_h - h) + + img = img[j:j + h, i:i + w, :] + + return self._resize_func(img, size) + + +class RandFlipImage(object): + """ random flip image + flip_code: + 1: Flipped Horizontally + 0: Flipped Vertically + -1: Flipped Horizontally & Vertically + """ + + def __init__(self, flip_code=1): + assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]" + self.flip_code = flip_code + + def __call__(self, img): + if random.randint(0, 1) == 1: + return cv2.flip(img, self.flip_code) + else: + return img + + +class NormalizeImage(object): + """ normalize image such as subtract mean, divide std + """ + + def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): + if isinstance(scale, str): + scale = eval(scale) + assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
+ self.channel_num = channel_num + self.output_dtype = 'float16' if output_fp16 else 'float32' + self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) + self.order = order + mean = mean if mean is not None else [0.485, 0.456, 0.406] + std = std if std is not None else [0.229, 0.224, 0.225] + + shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) + self.mean = np.array(mean).reshape(shape).astype('float32') + self.std = np.array(std).reshape(shape).astype('float32') + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage" + + img = (img.astype('float32') * self.scale - self.mean) / self.std + + if self.channel_num == 4: + img_h = img.shape[1] if self.order == 'chw' else img.shape[0] + img_w = img.shape[2] if self.order == 'chw' else img.shape[1] + pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) + img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate( + (img, pad_zeros), axis=2)) + return img.astype(self.output_dtype) + + +class ToCHWImage(object): + """ convert hwc image to chw image + """ + + def __init__(self): + pass + + def __call__(self, img): + from PIL import Image + if isinstance(img, Image.Image): + img = np.array(img) + + return img.transpose((2, 0, 1)) + + +class ColorJitter(RawColorJitter): + """ColorJitter. 
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def __call__(self, img): + if not isinstance(img, Image.Image): + img = np.ascontiguousarray(img) + img = Image.fromarray(img) + img = super()._apply_image(img) + if isinstance(img, Image.Image): + img = np.asarray(img) + return img + + +def base64_to_cv2(b64str): + data = base64.b64decode(b64str.encode('utf8')) + data = np.frombuffer(data, np.uint8) + data = cv2.imdecode(data, cv2.IMREAD_COLOR) + return data + + +class Topk(object): + + def __init__(self, topk=1, class_id_map_file=None): + assert isinstance(topk, (int, )) + self.class_id_map = self.parse_class_id_map(class_id_map_file) + self.topk = topk + + def parse_class_id_map(self, class_id_map_file): + if class_id_map_file is None: + return None + if not os.path.exists(class_id_map_file): + print( + "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!" + ) + return None + + try: + class_id_map = {} + with open(class_id_map_file, "r") as fin: + lines = fin.readlines() + for line in lines: + partition = line.split("\n")[0].partition(" ") + class_id_map[int(partition[0])] = str(partition[-1]) + except Exception as ex: + print(ex) + class_id_map = None + return class_id_map + + def __call__(self, x, file_names=None, multilabel=False): + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x) + x = x.numpy() + y = [] + for idx, probs in enumerate(x): + index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where( + probs >= 0.5)[0].astype("int32") + clas_id_list = [] + score_list = [] + label_name_list = [] + for i in index: + clas_id_list.append(i.item()) + score_list.append(probs[i].item()) + if self.class_id_map is not None: + label_name_list.append(self.class_id_map[i.item()]) + result = { + "class_ids": clas_id_list, 
"scores": np.around(score_list, decimals=5).tolist(), + } + if file_names is not None: + result["file_name"] = file_names[idx] + if label_name_list is not None: + result["label_names"] = label_name_list + y.append(result) + return y diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/utils.py b/modules/image/classification/pplcnet_x2_5_imagenet/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537 --- /dev/null +++ b/modules/image/classification/pplcnet_x2_5_imagenet/utils.py @@ -0,0 +1,129 @@ +# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import argparse +import copy +import os + +import yaml + +__all__ = ['get_config'] + + +class AttrDict(dict): + + def __getattr__(self, key): + return self[key] + + def __setattr__(self, key, value): + if key in self.__dict__: + self.__dict__[key] = value + else: + self[key] = value + + def __deepcopy__(self, content): + return copy.deepcopy(dict(self)) + + +def create_attr_dict(yaml_config): + from ast import literal_eval + for key, value in yaml_config.items(): + if type(value) is dict: + yaml_config[key] = value = AttrDict(value) + if isinstance(value, str): + try: + value = literal_eval(value) + except BaseException: + pass + if isinstance(value, AttrDict): + create_attr_dict(yaml_config[key]) + else: + yaml_config[key] = value + + +def parse_config(cfg_file): + """Load a config file into AttrDict""" + with open(cfg_file, 'r') as fopen: + yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader)) + create_attr_dict(yaml_config) + return yaml_config + + +def override(dl, ks, v): + """ + Recursively replace dict or list + Args: + dl(dict or list): dict or list to be replaced + ks(list): list of keys + v(str): value to be replaced + """ + + def str2num(v): + try: + return eval(v) + except Exception: + return v + + assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl)) + assert len(ks) > 0, ('length of keys should be larger than 0') + if isinstance(dl, list): + k = str2num(ks[0]) + if len(ks) == 1: + assert k < len(dl), ('index({}) out of range({})'.format(k, dl)) + dl[k] = str2num(v) + else: + override(dl[k], ks[1:], v) + else: + if len(ks) == 1: + # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl)) + if ks[0] not in dl: + print('A new field ({}) detected!'.format(ks[0])) + dl[ks[0]] = str2num(v) + else: + override(dl[ks[0]], ks[1:], v) + + +def override_config(config, options=None): + """ + Recursively override the config + Args: + config(dict): dict to be replaced + options(list): list of pairs(key0.key1.idx.key2=value) + 
such as: [ + 'topk=2', + 'VALID.transforms.1.ResizeImage.resize_short=300' + ] + Returns: + config(dict): replaced config + """ + if options is not None: + for opt in options: + assert isinstance(opt, str), ("option({}) should be a str".format(opt)) + assert "=" in opt, ("option({}) should contain a = " + "to distinguish between key and value".format(opt)) + pair = opt.split('=') + assert len(pair) == 2, ("there can be only one = in the option") + key, value = pair + keys = key.split('.') + override(config, keys, value) + return config + + +def get_config(fname, overrides=None, show=False): + """ + Read config from file + """ + assert os.path.exists(fname), ('config file({}) does not exist'.format(fname)) + config = parse_config(fname) + override_config(config, overrides) + return config diff --git a/modules/text/text_generation/ernie_tiny/README.md b/modules/text/text_generation/ernie_tiny/README.md new file mode 100644 index 0000000000000000000000000000000000000000..15c6543286655543a7e1345d3b1fdf7394c6b8ef --- /dev/null +++ b/modules/text/text_generation/ernie_tiny/README.md @@ -0,0 +1,126 @@ +# ernie_tiny + +|Module Name|ernie_tiny| +| :--- | :---: | +|Category|Image - Image generation| +|Network|SPADEGenerator| +|Dataset|coco_stuff| +|Fine-tuning supported or not|No| +|Module Size|74MB| +|Latest update date|2021-12-14| +|Data indicators|-| + + +## I.Basic Information + +- ### Application Effect Display + - Sample results: +

+ +
+ +- ### Module Introduction + + - This module uses Pix2PixHD, a pixel-level style transfer network that generates photo-style images from input semantic segmentation labels. To address the loss of label semantic information caused by the model's normalization layers, a SPADE (Spatially-Adaptive + Normalization) module is added to the Pix2PixHD generator network. It uses two convolution layers to preserve the spatial dimensions of the scale and bias parameters learned for normalization, which improves the quality of the generated images. Semantic label images can be obtained from the [coco_stuff dataset](https://github.com/nightrome/cocostuff), or you can generate custom images with [this project in the PaddleGAN repo](https://github.com/PaddlePaddle/PaddleGAN/blob/87537ad9d4eeda17eaa5916c6a585534ab989ea8/docs/zh_CN/tutorials/photopen.md). + + + +## II.Installation + +- ### 1、Environmental Dependence + - ppgan + +- ### 2、Installation + + - ```shell + $ hub install photopen + ``` + - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md) + +## III.Module API Prediction + +- ### 1、Command line Prediction + + - ```shell + # Read from a file + $ hub run photopen --input_path "/PATH/TO/IMAGE" + ``` + - To call the image generation module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、Prediction Code Example + + - ```python + import paddlehub as hub + + module = hub.Module(name="photopen") + input_path = ["/PATH/TO/IMAGE"] + # Read from a file + module.photo_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True) + ``` + +- ### 3、API + + - ```python + photo_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True): + ``` + - Image transfer generation API. + + - **Parameters** + + - images (list\[numpy.ndarray\]): image data, ndarray.shape is \[H, W, C\];
+ - paths (list\[str\]): image paths;
+ - output\_dir (str): path for saving results;
+ - use\_gpu (bool): use GPU or not;
+ - visualization(bool): whether to save the results to a local folder + + +## IV.Server Deployment + +- PaddleHub Serving can deploy an online image transfer generation service. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m photopen + ``` + + - The online image transfer generation service API is now deployed, and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. + +- ### Step 2: Send a prediction request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result + + - ```python + import requests + import json + import cv2 + import base64 + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send an HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/photopen" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print prediction results + print(r.json()["results"]) + +## V.Release Note + +* 1.0.0 + + First release + + - ```shell + $ hub install ernie_tiny==1.1.0 + ``` diff --git a/modules/text/text_generation/ernie_tiny/README_en.md b/modules/text/text_generation/ernie_tiny/README_en.md new file mode 100644 index 0000000000000000000000000000000000000000..373348799089cc370f90335e6290c0ce38a8a11c --- /dev/null +++ b/modules/text/text_generation/ernie_tiny/README_en.md @@ -0,0 +1,171 @@ +# ernie_tiny + +|Module Name|ernie_tiny| +| :--- | :---: | +|Category|object detection| +|Network|faster_rcnn| +|Dataset|COCO2017| +|Fine-tuning supported or not|No| +|Module Size|161MB| +|Latest update date|2021-03-15| +|Data indicators|-| + + +## I.Basic Information + +- ### Application Effect Display + - Sample results: +

+ +
+

+ +- ### Module Introduction + + - Faster_RCNN is a two-stage detector that consists of feature extraction, proposal, classification and refinement processes. This module is trained on the COCO2017 dataset and can be used for object detection. + + +## II.Installation + +- ### 1、Environmental Dependence + + - paddlepaddle >= 1.6.2 + + - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst) + +- ### 2、Installation + + - ```shell + $ hub install faster_rcnn_resnet50_fpn_coco2017 + ``` + - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md) + +## III.Module API Prediction + +- ### 1、Command line Prediction + + - ```shell + $ hub run faster_rcnn_resnet50_fpn_coco2017 --input_path "/PATH/TO/IMAGE" + ``` + - If you want to call the Hub module through the command line, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、Prediction Code Example + + - ```python + import paddlehub as hub + import cv2 + + object_detector = hub.Module(name="faster_rcnn_resnet50_fpn_coco2017") + result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')]) + # or + # result = object_detector.object_detection(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def object_detection(paths=None, + images=None, + batch_size=1, + use_gpu=False, + output_dir='detection_result', + score_thresh=0.5, + visualization=True) + ``` + + - Detection API: detects the positions of all objects in an image + + - **Parameters** + + - paths (list[str]): image paths; + - images (list\[numpy.ndarray\]): image data, ndarray.shape is in the format [H, W, C], BGR; + - batch_size (int): the size of batch; + - use_gpu (bool): use GPU or not; **set the 
CUDA_VISIBLE_DEVICES environment variable first if you are using GPU** + - output_dir (str): save path of images; + - score\_thresh (float): confidence threshold;
+ - visualization (bool): Whether to save the results as picture files; + + **NOTE:** choose one of paths and images to provide the input data + + - **Return** + + - res (list\[dict\]): results + - data (list): detection results, each element in the list is a dict + - confidence (float): the confidence of the result + - label (str): label + - left (int): the upper left corner x coordinate of the detection box + - top (int): the upper left corner y coordinate of the detection box + - right (int): the lower right corner x coordinate of the detection box + - bottom (int): the lower right corner y coordinate of the detection box + - save\_path (str, optional): output path for saving results + + + - ```python + def save_inference_model(dirname, + model_filename=None, + params_filename=None, + combined=True) + ``` + - Save the model to a specific path + + - **Parameters** + + - dirname: output dir for saving model + - model\_filename: filename for saving model + - params\_filename: filename for saving parameters + - combined: whether to save parameters into one file + + +## IV.Server Deployment + +- PaddleHub Serving can deploy an online service of object detection. + +- ### Step 1: Start PaddleHub Serving + + - Run the startup command: + - ```shell + $ hub serving start -m faster_rcnn_resnet50_fpn_coco2017 + ``` + + - The serving API is now deployed and the default port number is 8866. + + - **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set. 
+ +- ### Step 2: Send a prediction request + + - With a configured server, use the following lines of code to send the prediction request and obtain the result + + - ```python + import requests + import json + import cv2 + import base64 + + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tobytes()).decode('utf8') + + # Send an HTTP request + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/faster_rcnn_resnet50_fpn_coco2017" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # Print prediction results + print(r.json()["results"]) + ``` + + +## V.Release Note + +* 1.0.0 + + First release + +* 1.0.1 + + Fix the problem of reading numpy + - ```shell + $ hub install ernie_tiny==1.1.0 + ```